Powering up: LLNL prepares for exascale with massive energy and water upgrade
A supercomputer doesn’t just magically appear, especially one as large and as fast as Lawrence Livermore National Laboratory’s upcoming exascale-class behemoth El Capitan. At peak usage, El Capitan — projected to be the world’s most powerful computer by 2023 — will require about as much energy as a small city, so it takes years of planning, infrastructure upgrades and an entirely new way of thinking about the mechanical and electrical capabilities of existing supercomputing facilities to make it all happen.
To prepare for El Capitan and the next generation of power-hungry supercomputers, construction crews and maintenance workers at LLNL have been hard at work since late 2019 and throughout the COVID-19 pandemic on a massive, $100 million Exascale Computing Facility Modernization (ECFM) project. The project will nearly double the energy capacity of the Lab’s Computing Center, eventually supplying the facility with enough wattage for about 75,000 modest-sized homes.
But instead of powering houses, the upgrade will enable LLNL and the other National Nuclear Security Administration (NNSA) national laboratories (Los Alamos and Sandia) to run El Capitan and other next-generation supercomputers. These machines will be capable of regularly performing the high-fidelity modeling and simulation needed to support NNSA’s Stockpile Stewardship Program, which ensures the safety, security and reliability of the nation’s nuclear deterrent.
Though El Capitan won’t arrive until spring 2023, preparations at LLNL for the supercomputer, which is expected to exceed 2 exaflops (2 quintillion floating-point operations per second), began in 2008 with a detailed roadmap to exascale. Studies showed that to successfully site two exascale machines at the same time, the Lab would need to run 85 megawatts to the computing floor, up from the existing 45-megawatt capacity.
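As a quick check on those figures, the short Python sketch below works out the capacity ratio and the implied power per home. It assumes the "75,000 modest-sized homes" comparison refers to the full 85-megawatt capacity, which the article does not state outright. The function names are illustrative only.

```python
# Figures quoted in the article.
EXISTING_MW = 45    # current capacity to the computing floor
UPGRADED_MW = 85    # capacity after the ECFM upgrade
HOMES = 75_000      # "modest-sized homes" comparison

def capacity_ratio(old_mw, new_mw):
    """How many times larger the new capacity is than the old."""
    return new_mw / old_mw

def kw_per_home(total_mw, homes):
    """Average kilowatts per home if total_mw were spread over `homes`.

    Assumption: the homes comparison applies to the full upgraded capacity.
    """
    return total_mw * 1000 / homes

print(f"{capacity_ratio(EXISTING_MW, UPGRADED_MW):.2f}x the existing capacity")
print(f"{kw_per_home(UPGRADED_MW, HOMES):.2f} kW per home")
```

The ratio comes out just under two, consistent with the "nearly double" description, and the per-home figure is a little over 1 kilowatt.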
“The ECFM project will nearly double the amount of electricity into our classified computing center and nearly triple the amount of cooling into the building,” said LLNL Weapon Simulation & Computing Program Director Chris Clouse. “These upgrades were essential in allowing us to site two exascale class computers simultaneously, avoiding any potential downtime in computing cycles as we fully stand up our second exascale system before decommissioning El Capitan in the 2029 timeframe.”
Operating El Capitan alone, which is set to reside in the 6,800-square-foot footprint of the retired Sequoia and ASC Purple supercomputers, will require between 30 and 35 megawatts at peak, meaning LLNL had to completely rethink how it supplied power and water to the machine room floor.
“The thing that’s been challenging is that when we were doing these smaller projects (Purple and Sequoia) we were looking originally at commercial solutions — then we started looking at industrial solutions,” said Anna Maria Bailey, LLNL High Performance Computing chief engineer and ECFM project manager. “Exascale is a game changer; we’re actually doing utility (scale) solutions, and that’s not something you can just snap your fingers and have done. So, when we say exascale, we’re saying ‘it’s a lot of infrastructure.’ It’s no longer something you can do locally, and it doesn’t just happen overnight.”
The project’s scale dwarfs that of any previous infrastructure effort at LLNL. Working through the pandemic, construction firm Nova Probst installed utility-grade infrastructure, running a 115-kilovolt (kV) transmission line into a new switchyard containing two 40-megawatt substation transformers, 13.8 kV switchgear, relays and feeders, along with secondary substations located inside the computing facility.
Crews also added six 3,000-ton cooling towers and all the requisite pipes, heat exchangers and pumps to expand the computing facility’s water-cooling system from 10,000 tons to 28,000 tons. The system will provide the water needed to cool El Capitan and other future machines, and it will use warmer water to cut reliance on chillers and pumps, improving overall efficiency.
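The cooling numbers can be checked the same way. The sketch below adds the six new towers to the existing capacity and converts the total to megawatts of heat rejection. The conversion factor is an assumption not taken from the article: it treats each "ton" as a standard ton of refrigeration (12,000 BTU per hour, about 3.517 kW), whereas cooling towers are sometimes rated with a different nominal ton.

```python
# Figures quoted in the article.
EXISTING_TONS = 10_000
NEW_TOWERS = 6
TONS_PER_TOWER = 3_000

# Assumption (not from the article): 1 ton = 1 ton of refrigeration
# = 12,000 BTU/h, roughly 3.517 kW of heat removal.
KW_PER_TON = 3.517

def expanded_capacity_tons(existing, towers, tons_per_tower):
    """Total cooling capacity after adding the new towers."""
    return existing + towers * tons_per_tower

def tons_to_megawatts(tons, kw_per_ton=KW_PER_TON):
    """Convert cooling tons to megawatts of heat rejection."""
    return tons * kw_per_ton / 1000

total_tons = expanded_capacity_tons(EXISTING_TONS, NEW_TOWERS, TONS_PER_TOWER)
print(total_tons)                                  # 28000
print(f"{total_tons / EXISTING_TONS:.1f}x the existing cooling capacity")
print(f"about {tons_to_megawatts(total_tons):.0f} MW of heat rejection")
```

The total matches the 28,000 tons cited, a 2.8-fold increase that lines up with Clouse's "nearly triple" description.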
With a project of this size and scope, Bailey said the biggest challenge was navigating through the permitting and approval processes. As a Congressional budget line item managed by NNSA, the project took seven years of planning and underwent numerous review cycles.
It required multiple studies, a sitewide environmental impact review, approval from the California Public Utilities Commission and close collaboration with utility providers Pacific Gas and Electric and the Western Area Power Administration (WAPA), one of four power marketing administrations within the U.S. Department of Energy. Through an interagency agreement, WAPA will operate and maintain the new switchyard.
Despite delays due to the COVID-19 pandemic, the project is more than 93 percent complete and is expected to be fully operational by May 2022, eight months ahead of schedule. While the power and water are up to the requisite capacity, they still need to reach the computing floor, where they will supply up to 35 megawatts to the same 6,800-square-foot footprint as Purple, a 100-teraflop machine that needed less than five megawatts to run.
“This project is going great,” Bailey said. “It’s crazy. The machine footprints have not grown much over the past 20 years, but the electrical and mechanical requirements have grown by a factor of seven. We used to provide commercial facility infrastructure solutions and now we need to provide utility solutions to support the load. We have to figure out now how we’re impacting the grid. It’s become more of a marriage with the utility company as opposed to just a handshake.”
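The "factor of seven" follows from the figures quoted above. The sketch below compares power density across the same 6,800-square-foot footprint, taking 5 megawatts as an upper bound for Purple (the article says only "less than five") and 35 megawatts as El Capitan's peak. The function name is illustrative.

```python
# Figures quoted in the article.
FOOTPRINT_SQFT = 6_800
EL_CAPITAN_PEAK_MW = 35
PURPLE_MW = 5  # upper bound: the article says "less than five megawatts"

def watts_per_sqft(megawatts, sqft):
    """Power density in watts per square foot."""
    return megawatts * 1_000_000 / sqft

el_capitan_density = watts_per_sqft(EL_CAPITAN_PEAK_MW, FOOTPRINT_SQFT)
purple_density = watts_per_sqft(PURPLE_MW, FOOTPRINT_SQFT)

print(f"El Capitan: {el_capitan_density:,.0f} W per square foot")
print(f"Purple:     {purple_density:,.0f} W per square foot")
print(f"Ratio:      {el_capitan_density / purple_density:.1f}x")
```

Because Purple drew somewhat less than 5 megawatts, the true ratio would be at least seven.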
Other studies still need to be performed, such as how best to mitigate the energy imbalance that occurs when El Capitan goes from peak performance to idle, where a sudden drop of up to 30 megawatts can be a cause for concern for the utilities.
Upon energization of the new substation, Bailey and her team will test the system with load banks. They have started the design process on the next step, the El Capitan Site Infrastructure project, which will facilitate sending the power and cooling from the facility point of distribution to El Capitan’s future location. Construction will begin this winter and is scheduled for completion in January 2023, just months before El Capitan’s racks start arriving.
“That’s what keeps me up at night,” Bailey said. “ECFM is almost done, and it’s been a great project, but now the rubber really hits the road.”
Contact
Jeremy Thomas
(925) 422-5539