Innovative computing technique facilitates unprecedented simulation and earns Livermore team the Gordon Bell Prize

Nov. 15, 2007

Using groundbreaking computational techniques, a team of scientists from Lawrence Livermore National Laboratory (LLNL) and IBM earned the 2007 Gordon Bell Prize for a first-of-a-kind simulation of Kelvin-Helmholtz instability in molten metals on BlueGene/L (BG/L), the world's fastest supercomputer.

By performing extremely large-scale molecular dynamics simulations, the team was able to study, for the first time, how a Kelvin-Helmholtz instability develops from atomic scale fluctuations into micron-scale vortices.

"This has never been done before. We were able to observe this atom by atom. There was no time scale or length scale we couldn't see," said Jim Glosli, lead author on the winning entry titled "Extending Stability Beyond CPU Millennium: A Micron-Scale Simulation of Kelvin-Helmholtz Instability."

Other team members were Kyle Caspersen, David Richards, Robert Rudd and project leader Fred Streitz of LLNL, and John Gunnels of IBM.

The Kelvin-Helmholtz instability arises at the interface of fluids in shear flow and results in the formation of waves and vortices. Waves formed by Kelvin-Helmholtz (KH) instability are found in all manner of natural phenomena, such as waves on a windblown ocean, sand dunes and swirling cloud billows.

While Kelvin-Helmholtz instability has been thoroughly studied for years and its behavior is well understood at the macro-scale, scientists did not clearly understand how it evolves at the atomic scale until now.

The insights gained through simulation of this phenomenon are of interest to the National Nuclear Security Administration's (NNSA) Stockpile Stewardship Program, the effort to ensure the safety, security and reliability of the nation's nuclear deterrent without nuclear testing.

Understanding how matter transitions from a continuous medium at macroscopic length scales to a discrete atomistic medium at the nanoscale has important implications for such Laboratory research efforts as National Ignition Facility (NIF) laser fusion experiments and developing applications for nanotube technology.

"This was an important simulation for exploring the atomic origins of hydrodynamic phenomena, and hydrodynamics is at the heart of what we do at the Laboratory," Glosli said. "We were trying to answer the question: how does the atomic scale feed into the hydrodynamic scale."

"This remarkable Kelvin-Helmholtz simulation breaks new ground in physics and in high-performance scientific computing," said Dona Crawford, associate director for Computation at Lawrence Livermore National Laboratory. "A hallmark of the Advanced Simulation and Computing program is delivering cutting edge science for national security and the computing that makes it possible."

This simulation of unprecedented resolution was made possible by the innovative computational technique used -- a technique that could change the way high-performance scientific computing is conducted.

Traditionally, the hardware errors or failures that are an inevitable part of high-performance computing (HPC) have been handled by the hardware itself or the operating system. This strategy was perfectly adequate for supercomputing systems with 1,000 to 10,000 processors.

However, these traditional approaches don't work as well on a massively parallel machine the size of BG/L with more than 200,000 CPUs (central processing units) -- almost 10 times more than on any other system. With such a large number of processors and components, hardware failures are almost certain during long production runs.
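The scaling argument above can be illustrated with a short calculation. This is a minimal sketch, assuming components fail independently at a constant rate; the mean time between failures (MTBF) of one million hours per processor and the 24-hour run length are hypothetical figures chosen for illustration and do not come from the BG/L team.

```python
import math

def prob_any_failure(n_components, run_hours, mtbf_hours):
    """Probability that at least one of n independent components fails
    during a run, assuming exponentially distributed failure times."""
    expected_failures = n_components * run_hours / mtbf_hours
    # 1 - exp(-x), computed with expm1 for accuracy when x is small
    return -math.expm1(-expected_failures)

# Hypothetical per-processor MTBF of 1,000,000 hours and a 24-hour run.
ASSUMED_MTBF_HOURS = 1_000_000
small_system = prob_any_failure(1_000, 24, ASSUMED_MTBF_HOURS)
large_system = prob_any_failure(200_000, 24, ASSUMED_MTBF_HOURS)

print(f"1,000 processors:   {small_system:.1%} chance of a failure")
print(f"200,000 processors: {large_system:.1%} chance of a failure")
```

Under these assumed numbers, a failure during the run is unlikely on the smaller system and close to certain on a machine the size of BG/L, which is the point the paragraph above makes.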

Hardware failures impact system performance and consume valuable time on the machine.

In partnership with IBM, the Livermore team pioneered a new strategy for recovering from hardware failure.

They developed a way to use the application itself to help correct errors and failures. Their reasoning was that the application, which has a complete understanding of the calculation being run, can evaluate the errors and decide the most efficient strategy for recovery.

For example, by implementing a strategy to mitigate cache memory faults (which are the primary cause of failure in BG/L), the team was able to run without error for CPU-millennia.
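The general idea of application-assisted recovery can be sketched in a few lines. This is an illustration only, not the team's implementation, which this article does not describe: the `HardwareFault` exception, the `run_with_recovery` function and the in-memory checkpoint scheme are all hypothetical names and choices. The sketch assumes the application is notified of a fault, knows which of its own data can be trusted, and redoes only the work since its last known-good state.

```python
import copy

class HardwareFault(Exception):
    """Hypothetical signal that the hardware reported a recoverable fault."""

def run_with_recovery(state, step, n_steps, checkpoint_every=10, max_retries=3):
    """Advance `state` through n_steps, rolling back to an in-memory
    checkpoint whenever a step raises HardwareFault."""
    checkpoint = (0, copy.deepcopy(state))  # (step index, known-good state)
    i = 0
    retries = 0
    recoveries = 0
    while i < n_steps:
        try:
            state = step(state, i)
        except HardwareFault:
            retries += 1
            if retries > max_retries:
                raise  # repeated faults: give up and let the system handle it
            # Discard possibly corrupted data and resume from the checkpoint.
            i, saved = checkpoint
            state = copy.deepcopy(saved)
            recoveries += 1
            continue
        i += 1
        if i % checkpoint_every == 0:
            checkpoint = (i, copy.deepcopy(state))
            retries = 0
    return state, recoveries

def make_flaky_step(fault_at):
    """Build a toy step function that faults once at each listed step index."""
    pending = set(fault_at)
    def step(state, i):
        if i in pending:
            pending.discard(i)
            raise HardwareFault(f"simulated fault at step {i}")
        return state + i
    return step

final, recoveries = run_with_recovery(0, make_flaky_step({17, 42}), n_steps=50)
print(final, recoveries)
```

In this toy run the two simulated faults each cost a few repeated steps rather than the whole job, and the final result matches a fault-free run.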

"Applications with this capability could potentially lead to a new paradigm in supercomputer design," said Streitz, noting that application-assisted failure recovery reduces hardware reliability constraints, opening the way for supercomputer designs using less stable but higher performing -- and perhaps less expensive -- components. "That concept may allow the building of a faster machine."

Named for one of the founders of supercomputing, the prestigious Gordon Bell Prize is awarded to innovators who advance high-performance computing. The award is widely regarded as the Oscars of supercomputing.

A Livermore team led by Streitz won the 2005 Gordon Bell Prize for a simulation investigating solidification in tantalum and uranium at extreme temperatures and pressures, with simulations ranging in size from 64,000 atoms to 524 million atoms. This year, on the expanded machine, the Livermore team was able to conduct simulations of up to 62.5 billion atoms.

"The scale of this Kelvin-Helmholtz simulation was enormous compared to the previous simulations," Streitz said. "We were really pushing the limits of what is currently possible on this machine."

Founded in 1952, Lawrence Livermore National Laboratory has a mission to ensure national security and to apply science and technology to the important issues of our time. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration.