The Department of Energy’s National Nuclear Security Administration (NNSA), Lawrence Livermore National Laboratory (LLNL) and its industry partners today officially unveiled Sierra, one of the world’s fastest supercomputers, at a dedication ceremony to celebrate the system’s completion.
Sierra will serve the NNSA’s three nuclear security laboratories, LLNL, Sandia National Laboratories and Los Alamos National Laboratory, providing high-fidelity simulations in support of NNSA’s core mission of ensuring the safety, security and effectiveness of the nation’s nuclear stockpile. Its arrival represents years of procurement, design, code development and installation, requiring the efforts of hundreds of computer scientists, developers and operations personnel working in close partnership with IBM, NVIDIA and Mellanox.
“Today we mark our latest milestone toward computing on a truly exascale level,” Department of Energy Secretary Rick Perry said in a video message prepared for the dedication. “With its dramatic unveiling of Sierra, Lawrence Livermore National Laboratory has taken a pivotal step forward on behalf of America’s national security.”
“With the advent of Sierra, Livermore has delivered a powerful new tool for NNSA and stockpile stewardship. This machine represents a new approach to high performance computing that will enable us to address and answer scientific questions previously beyond our reach,” said LLNL Director Bill Goldstein. “I thank everyone involved in getting us to this point: our sponsors at NNSA, our industry and national lab partners and our own dedicated staff. This is a signal moment in Livermore’s history, and a new milestone in our leadership in high performance computing and simulation.”
Sierra, ranked as the third-fastest supercomputer in the world on the latest TOP500 list, is NNSA’s first large-scale production heterogeneous system, meaning each node incorporates both IBM central processing units (CPUs) and NVIDIA graphics processing units (GPUs). It is specifically designed for modeling and simulations essential for NNSA’s Stockpile Stewardship Program, ongoing life extension programs, weapons science and nuclear deterrence. It is expected to go into use for classified production in early 2019.
“NNSA and its predecessors have been at the forefront of scientific computing since World War II,” said Mark Anderson, director for the Office of Advanced Simulation and Computing and Institutional Research & Development at NNSA. “The supercomputers provided by NNSA are an essential element of stockpile stewardship without nuclear testing. Sierra is the most capable computer we have ever fielded. It also is a harbinger of future computing technology and a critical step along the path to exascale.”
Sierra boasts a peak performance of 125 petaFLOPS, or 125 quadrillion floating-point operations per second. Early indications using existing codes and benchmark tests are promising, demonstrating as predicted that Sierra can perform most required calculations far more efficiently in terms of cost and power consumption than systems consisting of CPUs alone. Depending on the application, Sierra is expected to be six to 10 times more capable than LLNL’s 20-petaFLOP Sequoia, currently the world’s eighth-fastest supercomputer.
“The continued aging of the stockpile requires much more capable computing systems,” said Mike Dunning, acting principal associate director for LLNL’s weapons program. “Sierra represents a continuation of NNSA’s leadership in high performance computing. It’s even more important today as we face increased global complexities, so it is essential that our tools are able to operate at the leading edge.”
With a footprint of 7,000 square feet, Sierra comprises 240 computing racks and 4,320 nodes, with each node consisting of two IBM POWER9 CPUs, four NVIDIA V100 GPUs and a Mellanox EDR InfiniBand interconnect. To prepare for this architecture, LLNL has partnered with IBM and NVIDIA to rapidly develop codes and optimize applications for the CPU/GPU nodes.
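The hardware counts above can be cross-checked with simple arithmetic. The short Python sketch below is illustrative only: the function name `system_totals` is hypothetical, and the nodes-per-rack figure is an average that assumes nodes are spread evenly across the 240 racks, which the text does not state.

```python
# Figures quoted in the paragraph above.
RACKS = 240
NODES = 4_320
CPUS_PER_NODE = 2  # IBM POWER9
GPUS_PER_NODE = 4  # NVIDIA V100


def system_totals(nodes=NODES, racks=RACKS,
                  cpus_per_node=CPUS_PER_NODE, gpus_per_node=GPUS_PER_NODE):
    """Return system-wide totals derived from the per-node figures.

    nodes_per_rack is an average; an even spread across racks is an assumption.
    """
    return {
        "nodes_per_rack": nodes / racks,
        "total_cpus": nodes * cpus_per_node,
        "total_gpus": nodes * gpus_per_node,
    }


totals = system_totals()
print(totals)
```

Under these figures the system holds 17,280 GPUs, which agrees with the "more than 17,000" GPUs cited by NVIDIA.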
IBM and NVIDIA personnel worked closely with LLNL, both on-site and remotely, on code development and restructuring to achieve maximum performance, while LLNL personnel provided feedback on system design and the software stack to the vendors. This “center of excellence” co-design strategy is necessary to ensure that codes and platforms are well-matched and that applications are optimized for GPU-accelerated architecture. LLNL’s partnership with Oak Ridge National Laboratory, which is siting the Summit system from IBM, also has been extremely helpful throughout the project, from procurement to operation.
LLNL selected the IBM/NVIDIA system due to its energy and cost efficiency, as well as its potential to effectively run NNSA applications. Sierra’s IBM POWER9 processors connect to the GPUs via the NVIDIA NVLink interconnect, enabling greater memory bandwidth between the CPUs and GPUs in each node so Sierra can move data throughout the system for maximum performance and efficiency. Backing Sierra is 154 petabytes of IBM Spectrum Scale, a software-defined parallel file system, deployed across 24 racks of Elastic Storage Servers (ESS). To meet the scaling demands of the heterogeneous systems, the solution delivers 1.54 terabytes per second in both read and write bandwidth and can manage 100 billion files per file system.
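To put the storage figures in perspective, the following Python sketch estimates how long it would take to stream the full 154 petabytes once at the quoted 1.54 terabytes per second. It is a back-of-the-envelope illustration, not a measured result: it assumes decimal (SI) units and sustained peak bandwidth, and the function name `full_sweep_seconds` is hypothetical.

```python
# Figures quoted in the paragraph above; decimal (SI) units are an assumption.
CAPACITY_PB = 154
BANDWIDTH_TB_PER_S = 1.54
TB_PER_PB = 1_000


def full_sweep_seconds(capacity_pb=CAPACITY_PB,
                       bandwidth_tb_per_s=BANDWIDTH_TB_PER_S):
    """Idealized time to read or write the whole capacity once at peak bandwidth."""
    return capacity_pb * TB_PER_PB / bandwidth_tb_per_s


seconds = full_sweep_seconds()
hours = seconds / 3600
print(f"{seconds:,.0f} s, about {hours:.1f} hours")
```

Under these assumptions a full pass over the file system takes roughly 100,000 seconds, or a little under 28 hours.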
“The next frontier of supercomputing lies in artificial intelligence,” said John Kelly, senior vice president, Cognitive Solutions and IBM Research. “IBM's decades-long partnership with LLNL has allowed us to build Sierra from the ground up with the unique design and architecture needed for applying AI to massive data sets. The tremendous insights researchers are seeing will only accelerate high performance computing for research and business.”
Because Sierra is the first NNSA production supercomputer backed by GPU-accelerated architecture, its acquisition required a fundamental shift in how scientists at the three NNSA laboratories program their codes to take advantage of the GPUs. The system’s NVIDIA GPUs also present scientists with an opportunity to investigate the use of machine learning and deep learning to accelerate time-to-solution of physics codes. It is expected that simulation, leveraged by acceleration coming from the use of artificial intelligence technology, will be increasingly employed over the coming decade.
“Sierra is a world-class, pre-exascale supercomputer that allows researchers to run large complex scientific simulations at scale, at speeds never before thought possible,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Equipped with more than 17,000 of our Tesla Tensor Core V100 GPUs, Sierra is a powerful, universal platform for compute-intensive scientific simulations, machine learning, deep learning and visualization applications all in one — paving the path forward for the future of high performance computing.”
Sierra also leverages Mellanox EDR 100 Gigabit InfiniBand In-Network Computing acceleration engines to achieve higher application performance and scalability.
“We are very proud to provide essential technology for one of the fastest supercomputers in the world at Lawrence Livermore National Laboratory,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “Our InfiniBand smart interconnect delivers the necessary performance, efficiency and scalability to support the needs of the Laboratory’s next-generation high performance and artificial intelligence applications, and the path to exascale computing.”
In addition to Sierra, which will run critical national security applications, a companion unclassified system called Lassen has been installed in the Livermore Computing Center. This institutionally focused system will play a role in projects aimed at speeding cancer drug discovery, precision medicine, research on traumatic brain injury, seismology, climate, astrophysics, materials science and other basic science benefiting society.
Sierra continues the long lineage of world-class LLNL supercomputers and represents the penultimate step on NNSA’s road to exascale computing, which is expected to be achieved by 2023 with an LLNL system called “El Capitan.” Funded by the NNSA’s Advanced Simulation and Computing (ASC) program, El Capitan will be NNSA’s first exascale supercomputer, capable of more than a quintillion calculations per second, about 10 times greater performance than Sierra. Such computing power will be easily absorbed by NNSA for its mission, which requires the most advanced computing capabilities and deep partnerships with American industry.
“In just a few short years, we expect to see exascale systems deployed at Lawrence Livermore, Argonne and Oak Ridge (national laboratories), ensuring our global superiority in this arena for years and decades to come,” Perry said. “Starting with Sierra, this new generation of supercomputers will be an absolute game-changer for the world.”