LLNL staff returns to Texas-sized Supercomputing Conference

SC22

The 2022 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) returned to Dallas as a large contingent of LLNL staff participated in sessions, panels, paper presentations and workshops centered around HPC. Photo by SC Photography.

The 2022 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) returned to Dallas as a large contingent of Lawrence Livermore National Laboratory (LLNL) staff participated in sessions, panels, paper presentations and workshops centered around high performance computing (HPC).

The world’s largest conference of its kind celebrated its highest in-person attendance since the start of the COVID-19 pandemic, with about 11,000 of the nearly 12,000 registrants attending at the Kay Bailey Hutchison Convention Center in Dallas.

The conference kicked off Sunday with a slate of workshops and tutorials, several of them led by LLNL scientists. In what has become a tradition on the conference’s first day, LLNL computer scientists Todd Gamblin and Greg Becker taught attendees about the HPC package manager Spack, while Bronis de Supinski, chief technology officer for Livermore Computing, ran tutorials on OpenMP. LLNL computational scientist Judy Hill participated in the 2022 International Workshop on Performance Portability and Productivity, and LLNL’s Ivy Peng and Maya Gokhale served as co-organizers of MCHPC’22: Workshop on Memory Centric High Performance Computing.

Monday began with LLNL computer scientist Peter Lindstrom leading a tutorial on lossy compression (covering compressors such as zfp and TTHRESH) and Elsa Gonsiorowski of LLNL’s HPC Support Center speaking at the 9th International Workshop on HPC User Support Tools.

A midday press conference saw the eagerly anticipated reveal of the TOP500 list of the world’s most powerful computers, with Oak Ridge National Laboratory’s Frontier again claiming the top spot with the same 1.102 exaFLOPS on the High Performance Linpack benchmark that it posted on June’s list. One new machine broke into the Top 10: the No. 4 Leonardo system at EuroHPC/CINECA in Italy, which bumped LLNL’s flagship system Sierra down a spot to No. 6 in the world.

Overall, LLNL continued to outpace all other individual HPC centers with nine entries on the TOP500, including the third generation of El Capitan Early Access Systems (EAS3): rzVernal, Tioga and Tenaya, which all moved up from June’s list to 107th, 121st and 165th, respectively. TOP500 mainstays Lassen, Ruby, Magma, Jade and Quartz once again made the list.

Erich Strohmaier, a TOP500 co-organizer formerly of Lawrence Berkeley National Laboratory, said the list reflected an overall slowdown in the market and was “top-heavy,” with eight systems accounting for half of the list’s entire performance. HPC machines appear “to be growing even slower in performance than before,” Strohmaier said, pointing to declines in the number of cores per system and to chips reaching their physical limits, both signs of the impending end of Moore’s Law, the observation that the number of transistors in a dense integrated circuit doubles about every two years.

John Gyllenhaal
A mainstay at SC, LLNL’s colorful computer scientist John Gyllenhaal, clad in a unicorn helmet, passes out free light-up headgear to attendees. Photo by SC Photography.

The panel for the evening’s SC22 HPC Accelerates Plenary, titled “The Many Dimensions of HPC Acceleration,” included Gina Tourassi, director of the National Center for Computational Sciences and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory (ORNL). Tourassi spoke of the need to “accelerate innovation even faster” in HPC to address existential threats such as climate change and pandemics.

“We’re running out of time with many of these challenges,” Tourassi said.

The traditional highlight of Monday at SC is the Grand Opening Gala for the Exhibit Hall floor, where organizers hold a ribbon-cutting ceremony and crowds stream in to visit more than 350 exhibitor booths representing industry vendors, academia and government, including the Department of Energy (DOE).

LLNL computer scientist Peer-Timo Bremer, along with students from California State University Channel Islands, debuted a hands-on demonstration of autonomous laser experiments at the DOE booth, where attendees tried their hand at beating an artificial intelligence (AI) “opponent” at color matching. The demonstration used the same software that large high-repetition-rate laser facilities use for “self-driving” laser experiments, and it ran periodically at the booth over the next several days.

The publication HPCwire also announced its 2022 Readers’ and Editors’ Choice Awards on Monday, giving its Editors’ Choice Award for Best Use of HPC in Energy to LLNL for applying AI and machine learning-based “cognitive simulation” (CogSim) techniques to inertial confinement fusion (ICF) research, work that has led to more predictive models of implosions for ICF experiments at the National Ignition Facility and other laser facilities. The CogSim team includes Bremer, as well as physicist Brian Spears, computer scientists Brian Van Essen, Rushil Anirudh, Kelli Humbird, Luc Peterson, Jay Thiagarajan and others. Bremer, Spears and Van Essen accepted the award at the DOE booth the following day.

“What’s most exciting to me and the rest of the team is that we’ve worked very hard to build AI into a tool that can bridge high performance computing and experimental work and put that together into something that’s actually functional for science,” Spears said. “To have the HPC world recognize that these aren’t just tools with prospects but they’re actually in the world and doing work now — being recognized feels like some external validation that, while it wasn’t necessary, is great to have.”

The need for speed

Tuesday, Nov. 15, began with SC22 Chair Candy Culhane, a program/project director in Los Alamos National Laboratory’s Directorate for Simulation and Computation, leading the conference’s opening session, welcoming attendees to Dallas and introducing the conference theme, “HPC Accelerates.”

“HPC is accelerating our understanding of the world, it’s accelerating the exploration of solutions and shortening the time it takes to diagnose problems,” Culhane said. “We’re also accelerating the time-to-solution through innovations in hardware, whether it’s for AI or machine-learning systems, or making advancements in the exciting new world of quantum computing, we’re developing the specialized hardware that is solving challenging problems with unprecedented speed.”

Culhane’s welcome was followed by the ACM A.M. Turing Award Lecture by Jack Dongarra, a University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee. Revered for his influence on HPC through his contributions to efficient algorithm development, parallel computing programming mechanisms and performance evaluation tools, Dongarra spoke of his family life and education, as well as his long and storied history in supercomputing. Highlights included the development of the Linpack benchmark, the “incredible changes” in performance he's seen in the TOP500 since 1993, and the promise and potential of exascale computing. Dongarra won the Turing Award, often referred to as the “Nobel Prize of Computing,” in 2021.

At the DOE booth, LLNL computer scientist Johannes Doerfert gave a talk on the Exascale Computing Project’s (ECP) SOLLVE project, which is working to evolve OpenMP for exascale computers, such as LLNL’s upcoming El Capitan. Doerfert talked about engaging with the ECP application teams to determine requirements and discussed tips for using and optimizing the LLVM set of compiler and toolchain technologies and OpenMP for exascale computing. Other LLNL members of the SOLLVE team include de Supinski and computer scientist Tom Scogland.

Johannes Doerfert
LLNL computer scientist Johannes Doerfert participates in a panel to discuss the latest runtime evolution and the impact on applications. Photo by SC Photography.

“Compilers are not magic, but they can still give you magic results if you give them the right information,” Doerfert said.

Rounding out the schedule, computer scientist Ignacio Laguna presented a paper on finding inputs that trigger floating-point exceptions in GPUs, and LLNL-led Birds of a Feather (BoF) sessions were held on Benchmarking Across HPC Architectures (co-led by Olga Pearce and Gamblin), Community-Driven Efforts for Energy Efficiency in HPC Software Stack (co-led by Tapasya Patki) and HPC and cloud computing (led by Dan Milroy).

LLNL takes SC by the horns

Wednesday, Nov. 16, perhaps the busiest day for LLNL participation, was loaded with paper presentations and talks. LLNL’s Laguna moderated a paper session on approximate computing and feasibility in HPC that included LLNL computer scientist Harshitha Menon as a panel member. Gamblin also presented a paper on applying Answer Set Programming (ASP), a declarative approach to solving combinatorial search problems, to dependency solving in Spack.

“Since the beginning of Spack, we designed a tool that could install lots of different versions of libraries and a language to describe that, but with the first version of Spack we didn’t really have a solver that could handle all the complexities that we introduced,” Gamblin said. “So, this is the culmination of a bunch of work that’s gone into simplifying that.”  

LLNL systems engineer Chris DePrater gave an afternoon talk at the DOE booth on the Exascale Computing Facility Modernization (ECFM) project, a major upgrade of the Livermore Computing Center’s power and water infrastructure that will enable the Lab to site El Capitan and future HPC machines.

“It was a huge challenge,” DePrater said, describing the lessons learned from the project and the unforeseen hurdles the team experienced. “We had to figure out a way for [the crews] to work safely and keep the project going.”

Despite bad air quality from California wildfires, COVID-19 and supply chain issues, ECFM was delivered nine months early and $9 million under budget. DePrater said the Lab began readying the machine room floor for El Capitan in January and that the work is about 75 percent complete.

The day also featured a BoF session on the International Association of Supercomputing Centers (IASC), an effort co-founded by LLNL, the Science and Technology Facilities Council Hartree Centre, the National Center for Supercomputing Applications and the Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities.

Speakers, including LLNL’s Judy Hill, said they envisioned the IASC as the start of a global effort to bring together public computing centers to share lessons learned and best practices for advancing science and research, and to serve as a conduit for fostering communication and collaboration among the centers, their partners and users.

“We’re in an exciting time but also a terrifying time,” Hill said about El Capitan. “We’re fielding a machine we hope will be one of the largest in the world, and there’s a lot that goes with that. Over the next two years we’ll be working to find how to best use it, and I’m sure we’ll have a lot of lessons learned to share with the community.”

Back at the DOE booth, LLNL Principal Deputy Associate Director for Computing Lori Diachin and computational scientist Erik Draeger spoke about DOE’s Exascale Computing Project mission objectives and progress. Diachin, ECP’s deputy director, provided an overview of performance and portability and the ECP’s “holistic approach to co-design and integration” that is getting its exascale applications “across the finish line.”

“We’re entering a really exciting time for our project right now,” Diachin said. “Now in our final year, we are getting access to the exascale computing systems, and so all the teams have demonstrated progress on these systems, and most are ready to run.”

Draeger, ECP’s deputy director of application development, discussed how the ECP portfolio of 24 applications is seeing “a massive performance increase over time,” particularly ExaSky, an application that simulates the physics of the universe’s dark sector, such as dark energy, dark matter and neutrinos.

“We’ve seen some great progress in our portfolio and that is continuing with Frontier,” Draeger said.

Throughout the morning, LLNL computer scientist Olga Pearce chaired a paper session on task scheduling. Doerfert, a co-author of a conference Best Paper finalist, also took part in a panel on the latest runtime evolution and its impact on applications.

The day ended with several BoFs led by LLNL staff, including Liquid Cooling Adoption: Roadblocks and Key Learnings (co-led by DePrater), the Green500 (co-led by Scogland) and the Spack Community BoF, where Gamblin, Becker and others discussed recent improvements in the latest version of Spack.

The sun sets on Big D

As the technical program began winding down on Nov. 17, LLNL computer scientist Konstantinos Parasyris presented a paper with multiple LLNL co-authors on approximate computing, which reduces computation time but also introduces error. The team proposed the Puppeteer technique, which applies uncertainty quantification methods to reduce errors.

LLNL’s Director of Diversity, Equity and Inclusion Tony Baylis, who also served as SC22’s Inclusivity Committee Deputy Chair, was a panelist at a BoF on racial inclusion and HPC.

“We’ve been culturally brought up to see gender and race,” Baylis said. “We have to work toward feeling comfortable to having a dialogue about this. We need to focus on how we can have a dialogue that is respectful. My dream is that we have that understanding and we’re all blind (to color) and we just see the humanity in each other.”

Tony Baylis
LLNL’s Director of Diversity, Equity and Inclusion Tony Baylis (left) participates as a panelist at a Birds of a Feather (BoF) on Racial Inclusion and HPC at SC22 on Nov. 17. Photo by Jeremy Thomas/LLNL.

Baylis will serve as Inclusivity Chair at SC23, which will carry the tagline of “I Am HPC.”

“We really want to infuse the conference as a celebratory place where we can showcase inclusivity and take it home and live it — to really integrate it into the work that we do every day,” Baylis added. “We have to learn to give people grace and lean into curiosity.”

The day continued with the SC22 Awards, highlighted by the ACM Gordon Bell Prize going to a multinational team that presented a first-of-its-kind code for kinetic plasma simulations optimized for ORNL’s Frontier and Summit, Lawrence Berkeley National Laboratory’s Perlmutter and Japan’s Fugaku supercomputers.

The Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research went to a 34-member team for training one of the largest foundation models on whole genome sequences and applying it to characterize SARS-CoV-2 variants of concern. The work aims to improve how new and emergent variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified. The team included co-authors from Cerebras Systems, whose second-generation CS-2 AI accelerator was integrated into LLNL’s Lassen supercomputer for AI-related research.

In addition, ACM announced the establishment of a new award, the ACM Gordon Bell Prize for Climate Modelling, which will highlight research focused on modelling the impact of climate change. The first award will be given out at SC23.

The afternoon continued as LLNL physicist Draeger joined a panel called “Pain of Porting,” a discussion of lessons learned from porting codes and of lowering the barriers to adopting emerging technologies. Draeger mentioned EQSIM, an earthquake modeling code that LLNL optimized using the RAJA performance portability library and that now runs on ORNL’s Summit with 1,000 times more throughput than before.

“Once you fully abstract away the work and recompose it for whatever architecture you’re on, you can get massive performance gains,” Draeger said. “It’s very exciting to see how much science you can pull out of it if you really invest in the computer scientist, even as a physicist.”

SC22 wrapped up on Friday, Nov. 18, with a workshop on correctness co-organized by Laguna, a workshop on hierarchical parallelism for exascale computing co-organized by LLNL’s Ulrike Yang, and a panel on disaster management capabilities at computing facilities featuring LLNL HPC Chief Engineer and ECFM Project Manager Anna Maria Bailey.

Bailey, who joined virtually, discussed the myriad challenges faced during the ECFM’s construction.

“In this decade — not just 2020 — we don’t have unforeseen conditions, we have things that are unrealistically unbelievable,” Bailey said. “It’s been quite strange for us, and I would say risk planning is definitely changing. It’s not just one unforeseen condition, these are phenomenal things that you have to plan for and trying to set contingencies before you plan them is not something we’re used to doing.”

In addition to the many LLNL presenters, panelists and speakers, more than 30 LLNL employees took on volunteer roles on the conference’s many committees.

SC23 will be held Nov. 12-17, 2023, in Denver and will be chaired by Dorian Arnold, an associate professor in the Department of Math and Computer Science at Emory University.