Work is moving fast and furious in the Livermore Computing Complex at Lawrence Livermore National Laboratory (LLNL), where siting and installation for Sierra, the Lab’s next advanced technology high-performance supercomputer, is kicking into high gear.
Trucks began delivering racks and hardware over the summer for what will eventually be a machine with a peak performance of 125 petaFLOPS (quadrillion floating-point operations per second), projected to provide four to six times the sustained performance of the Lab’s current workhorse system, Sequoia. Sierra is scheduled for acceptance in fiscal year 2018, and the effort has involved hundreds of Laboratory and vendor employees.
“It’s a major effort,” said Bronis de Supinski, Livermore Computing’s chief technology officer and head of Livermore Lab’s Advanced Technology (AT) systems. “It involves people on my team, making sure ahead of time, that we’ve really specified what we need correctly, and working closely with IBM to make sure they understand what our requirements are across the full range of aspects of the system: system software and tools, compilers, support for performance analysis and debugging tools. Most importantly, they’re working directly with the application teams to make sure they’re ready.”
It’s taken years of preparation to get to this point. Sierra rose out of the U.S. Department of Energy’s Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) partnership, which is culminating in the delivery of large-scale, high-performance supercomputers at each of the three national laboratories. Livermore selected IBM to deliver Sierra to serve the requirements of the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) program.
“IBM analyzed our benchmark applications, showed us how the system would perform well for them, and how we would be able to achieve similar performance for our real applications,” de Supinski said. “Another factor was that we had a high probability, given our estimates of the risks associated with that proposal, of meeting our scheduling requirements.”
While Lab scientists have seen positive indications from their early access systems, de Supinski said that until Sierra is on the floor and running stockpile stewardship program applications, which could take up to two years, they won’t be certain how powerful the machine will be or how well it will work for them.
Each Sierra node will feature two IBM Power 9 processors and four NVIDIA Volta GPUs. The Power 9s will provide a large amount of memory bandwidth from the chips to Sierra’s DDR4 main memory, and the Lab’s workload will benefit from second-generation NVLink, which forms a high-speed connection between the CPUs and GPUs, de Supinski said.
As Livermore’s first extreme-scale CPU/GPU system, Sierra has presented challenges to Lab computer scientists in porting codes, identifying what data to make available on GPUs and moving data between the GPUs and CPUs to optimize the machine’s capability. Through the Sierra Center of Excellence, Livermore Lab code developers and computer scientists have been collaborating with on-site IBM and NVIDIA employees to port applications.
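The trade-off behind deciding what data to place on the GPUs can be sketched with a simple cost model: offloading a piece of work only helps if the time spent copying data to the GPU, computing there, and copying results back is less than the time to compute on the CPU. The Python sketch below is illustrative only. The function names, kernel timings, data sizes and link speeds are all hypothetical placeholders, not Sierra measurements or any Lab tool.

```python
# Back-of-the-envelope model of the CPU/GPU offload trade-off.
# Every number here is a hypothetical placeholder, not a Sierra measurement.

def transfer_seconds(num_bytes: float, link_bytes_per_s: float) -> float:
    """Time to move num_bytes across the CPU-GPU link in one direction."""
    return num_bytes / link_bytes_per_s


def offload_pays_off(cpu_seconds: float, gpu_seconds: float,
                     bytes_in: float, bytes_out: float,
                     link_bytes_per_s: float) -> bool:
    """True if copy-in + GPU compute + copy-out beats computing on the CPU."""
    total_gpu = (transfer_seconds(bytes_in, link_bytes_per_s)
                 + gpu_seconds
                 + transfer_seconds(bytes_out, link_bytes_per_s))
    return total_gpu < cpu_seconds


GB = 1e9
slow_link = 8 * GB    # assumed 8 GB/s CPU-GPU link
fast_link = 64 * GB   # assumed 64 GB/s CPU-GPU link

# Hypothetical kernel: 2.0 s on the CPU, 0.25 s on the GPU, 8 GB in, 8 GB out.
print(offload_pays_off(2.0, 0.25, 8 * GB, 8 * GB, slow_link))  # False: 1 + 0.25 + 1 = 2.25 s
print(offload_pays_off(2.0, 0.25, 8 * GB, 8 * GB, fast_link))  # True: 0.125 + 0.25 + 0.125 = 0.5 s
```

In this toy model the same kernel is worth offloading over the faster link but not over the slower one, which is why both the choice of data kept on the GPUs and the speed of the CPU-GPU connection matter when porting codes.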
The LLNL team also is working closely with IBM to install the system at the same time that IBM and Oak Ridge National Laboratory are installing the Summit system at Oak Ridge.
“It’s been really good for us to be working closely with all of our partners, and we find the relationship with Oak Ridge National Laboratory getting a system at the same time is helping us understand issues as they arise and solve them quickly,” de Supinski said. “It’s really been a real group effort to site this system and meet the aggressive schedule we’ve had in place from the beginning.”