Meet the machines that matter: How El Capitan keeps its cool
A complex system of cooling towers, chillers, pumps, heat exchangers, sensors and more than 2,000 feet of pipes all work together to remove heat from the liquid-cooled exascale supercomputer El Capitan. (Graphic: Dan Herchek/LLNL)
LLNL tackles the nation’s toughest security challenges through bold, multidisciplinary science powered by advanced facilities and instruments. In this four-part series, meet the machines that work behind the scenes at the Laboratory to drive discovery, push boundaries and enable excellence. From inspecting optics and trapping ions to cooling supercomputers and detecting radiation, these are the machines that matter.
When most people picture a supercomputer, they imagine endless rows of compute and storage racks, blinking lights and machines thinking faster than humans can conceive. What they don’t visualize is the system quietly working underneath it all — a complicated, choreographed dance that keeps all that processing power from turning into heat, and from heat into failure.
At Lawrence Livermore National Laboratory (LLNL), the exascale El Capitan supercomputer — capable of over 1.8 quintillion calculations per second on real-world benchmarks — depends on a cooling system so essential that without it, the machine would shut down fast.
If El Capitan is the brain, according to LLNL systems engineer Chris DePrater, the cooling system is the cardiovascular system — moving heat the way blood moves oxygen. And just as importantly, DePrater adds, the controls keeping El Capitan from overheating act like a nervous system, sensing changes and responding instantly to keep everything in balance.
Heat builds fast at exascale. Processors and accelerators packed at extraordinary densities generate heat faster than air can reasonably carry it away. Once racks exceed a certain power threshold, air cooling becomes impractical.
To keep El Capitan from overheating like a hiker in the desert summer, its cooling system uses two loops: one with treated water and the other with a glycol-based liquid (like antifreeze). Together they combine efficient heat transfer with protection against bacterial growth. At El Capitan’s scale, liquid cooling isn’t just a preference; it’s the only way to operate.
“We typically stop using air cooling if the rack goes over about 25 kilowatts (25,000 watts),” DePrater explains. “With the densities on El Capitan being 400 kilowatts a rack, there's no other option. You cannot cool that dense of a rack in that small of space.” Without liquid cooling, El Capitan simply “wouldn't function. It would completely stop working within minutes,” he says.
DePrater uses a campfire analogy: Toss an empty plastic bottle into the flames and it melts almost immediately. Fill that same bottle with water and it takes far longer to fail, because water absorbs heat much more effectively than air. El Capitan’s cooling system uses that same principle, circulating water directly to where heat is generated and carrying it away, back to the neighboring Exascale Computing Facility Modernization (ECFM) site, before temperatures can spike.
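The gap between water and air can be put in rough numbers. The sketch below is a back-of-the-envelope estimate, not LLNL's engineering calculation: it takes the 400-kilowatt rack and the 25-kilowatt air-cooling cutoff from DePrater's figures, while the 10-kelvin coolant temperature rise and the textbook fluid properties are assumptions made only for illustration.

```python
# Rough comparison of the coolant flow needed to carry away one rack's heat.
# From the article: 400 kW per rack, ~25 kW air-cooling cutoff.
# ASSUMED (not from the article): the 10 K coolant temperature rise and
# the approximate room-temperature fluid properties below.

RACK_POWER_W = 400_000.0   # 400 kW per rack (article)
AIR_LIMIT_W = 25_000.0     # ~25 kW air-cooling cutoff (article)
DELTA_T_K = 10.0           # assumed coolant temperature rise

# (density in kg/m^3, specific heat in J/(kg*K)), approximate textbook values
WATER = (997.0, 4186.0)
AIR = (1.2, 1005.0)


def volume_flow_m3_per_s(power_w, fluid, delta_t_k):
    """Volumetric flow V such that power = density * V * specific_heat * delta_T."""
    density, specific_heat = fluid
    return power_w / (density * specific_heat * delta_t_k)


water_flow = volume_flow_m3_per_s(RACK_POWER_W, WATER, DELTA_T_K)
air_flow = volume_flow_m3_per_s(RACK_POWER_W, AIR, DELTA_T_K)

print(f"Rack power vs. air-cooling cutoff: {RACK_POWER_W / AIR_LIMIT_W:.0f}x")
print(f"Water needed: {water_flow * 1000:.1f} liters per second")
print(f"Air needed:   {air_flow:.1f} cubic meters per second")
print(f"Air volume is about {air_flow / water_flow:.0f} times the water volume")
```

Under these assumptions, a single rack needs a little under 10 liters of water per second but more than 30 cubic meters of air per second, a volume thousands of times larger. A different assumed temperature rise changes both flows but not the ratio between them.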
The system itself isn’t a single machine but an ensemble: cooling towers, chillers, pumps, heat exchangers, sensors and more than 2,000 feet of pipes all working together to move heat out of El Capitan, through the facility and into the atmosphere. DePrater compares the coordinated effort to a symphony, with the control system playing the part of the conductor.
“The controls need to conduct the systems to play or sing in harmony with each other, so that way nothing can be out of sync,” he explains.
Those controls matter as much as the plumbing. Sensors constantly monitor temperature, flow rate and pressure, allowing the system to respond to rapid power swings as workloads change. If cooling isn’t perfectly balanced, temperatures rise quickly. Built-in safeguards allow the system to shed load or shut down safely before damage occurs.
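The shed-load-or-shut-down idea can be illustrated with a toy decision rule. This is a minimal sketch and not El Capitan's actual control logic: the `Reading` fields, the `decide_action` function and every threshold value are hypothetical, chosen only to show how temperature, flow and pressure readings might map to a response.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One snapshot of the three quantities the article says are monitored."""
    temp_f: float        # coolant temperature, degrees Fahrenheit
    flow_gpm: float      # coolant flow rate, gallons per minute
    pressure_psi: float  # loop pressure, pounds per square inch


# Hypothetical thresholds for illustration only (not LLNL values).
WARN_TEMP_F = 95.0
TRIP_TEMP_F = 110.0
MIN_FLOW_GPM = 50.0
MIN_PRESSURE_PSI = 20.0


def decide_action(reading: Reading) -> str:
    """Return 'shutdown', 'shed_load' or 'normal' for one sensor snapshot."""
    # Worst case first: coolant far too hot, or no flow at all.
    if reading.temp_f >= TRIP_TEMP_F or reading.flow_gpm <= 0.0:
        return "shutdown"
    # Degraded but recoverable: reduce compute load before damage occurs.
    if (reading.temp_f >= WARN_TEMP_F
            or reading.flow_gpm < MIN_FLOW_GPM
            or reading.pressure_psi < MIN_PRESSURE_PSI):
        return "shed_load"
    return "normal"


for sample in (Reading(85.0, 100.0, 40.0),
               Reading(100.0, 100.0, 40.0),
               Reading(120.0, 100.0, 40.0)):
    print(sample, "->", decide_action(sample))
```

A real facility controller would also adjust pumps and valves continuously and react to trends rather than single readings; the sketch covers only the safeguard step.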
Despite the scale, with tens of thousands of gallons of water recirculating constantly through pipes large enough for a human to crawl through, the goal is efficiency, not brute force. The water used in El Capitan runs warm, at about 85 degrees Fahrenheit, rather than chilled, which avoids energy-intensive refrigeration. Heat is removed largely through evaporation at the ECFM cooling tower, one of the most efficient cooling methods available. The result is a system designed not just to support today’s world-class computing machines, but to scale for what comes next. For DePrater, success is defined by invisibility; no news is good news.
“When nobody knows who I am, that’s the best thing, because it means everything has been working fine,” he says with a laugh. To use another analogy, the cooling system is like the drummer in a band: rarely in the spotlight but impossible to replace.
By enabling El Capitan to run reliably at scale, the cooling system supports the Laboratory’s national security mission and stockpile modernization efforts, including high-resolution, 3D simulations that underpin the safety, security and reliability of the nation’s nuclear deterrent.
While the world’s most powerful supercomputer gets the headlines, beneath the floor and behind the scenes, its cooling machines do the quiet work that makes everything else possible. Without them, there is no computation and no mission-critical science.
Contact
(925) 422-5539