
Record-setting SC23 builds mile-high momentum for exascale computing, AI, and the future of HPC


The Simple Cloud Resolving E3SM Atmosphere Model (SCREAM) team won the inaugural Gordon Bell Prize for Climate Modelling, recognizing the team for “innovative parallel computing contributions toward solving the global climate crisis.” LLNL staff scientist Peter Caldwell (third from left) led the SCREAM team. Photo courtesy of SC Photography.


The crowds returned to SC, and with them came a renewed excitement for the future of high performance computing (HPC).

A record crowd of more than 14,000 experts, researchers, vendors and enthusiasts in the field of HPC descended on the Mile High City for the 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, colloquially known as SC23.

Spanning multiple days and featuring presentations, tutorials, networking sessions and discussions on various topics, the 35th SC conference came at an inflection point for HPC, as recent advancements in generative artificial intelligence (AI), hardware and HPC/cloud convergence permeated seemingly every talk, birds-of-a-feather forum and Q&A session.

As Lawrence Livermore National Laboratory (LLNL)’s Lori Diachin proclaimed at the Department of Energy booth on Nov. 16, “Exascale is here.” With improvements to scientific applications already visible on Oak Ridge National Laboratory (ORNL)’s Frontier, the long-awaited debut of Argonne National Laboratory (ANL)’s Aurora on the Top500 List of the world’s fastest supercomputers and enthusiasm over the impending arrival of LLNL’s El Capitan, there was much to say about the current state and evolving trends of HPC.

While AI was the acronym on everybody’s lips, the SC23 tagline, “I am HPC,” sought to humanize supercomputing, and the conference was as much about the people pushing the field forward as it was about the technologies underpinning it.

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC) returned to Denver, where a record number of attendees gathered to discuss the latest achievements and trends in HPC, artificial intelligence, big data and more. Photo courtesy of SC Photography.

Kicking off a bigger-than-ever SC

As attendees began streaming into the Colorado Convention Center in downtown Denver on Nov. 12, eager for the week to come, SC23 opened with its traditional spate of user workshops and tutorials, many of them led or organized by Lab employees.

At the 10th International Workshop on HPC User Support Tools, organized by LLNL HPC I/O Support Specialist Elsa Gonsiorowski, Lab intern Kyle Fan presented work on MSR-genie, a query tool for Model Specific Registers. Another workshop featured Lab computational engineer Robert Carson presenting his team’s work on quantifying uncertainty in metal additive manufacturing. LLNL computer scientists Todd Gamblin and Greg Becker also conducted their annual tutorial on Spack, a flexible package manager used for building and installing HPC software.

At a morning workshop, LLNL intern Anton Rydahl spoke about a C standard math library for graphics processors (GPUs), and at the 7th International Workshop on Software Correctness for HPC applications, organized by LLNL computer scientist Ignacio Laguna, researchers highlighted advancements in verification, debugging and testing for HPC. One presentation by Lab computer scientist Chunhua Liao explored the use of large language models (LLMs) to improve and enhance data race detection in HPC applications. Later, in a workshop on irregular applications, Center for Applied Scientific Computing (CASC) data scientist Keita Iwabuchi presented research delving into the challenges and solutions associated with graph-based approaches on large-scale systems.

LLNL software developer Kristi Belcher discusses a poster with attendees at the SC23 Poster Session on Nov. 14. Photo courtesy of SC Photography.

Diverse sessions mark the start of SC23 technical program

The first day of the technical program saw more workshops, including the 2023 International Workshop on Performance, Portability and Productivity in HPC (organized by LLNL’s Judy Hill) and the AI-Assisted Software Development for HPC workshop (organized by LLNL’s Giorgis Georgakoudis, Ignacio Laguna and Konstantinos Parasyris). Research software developer Kristi Belcher spoke about Umpire, a data and memory management Application Programming Interface (API) created at LLNL, and computer scientist Tom Scogland presented work on the scalable graph-based resource model Fluxion, as well as advanced scheduling use cases enabled by the model.
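For readers unfamiliar with the library, Umpire hides where memory physically lives behind named allocators. The minimal C++ sketch below follows Umpire’s documented interface; the allocator name and buffer size are illustrative choices, and a “DEVICE” allocator would be available only in GPU-enabled builds:

```cpp
#include "umpire/ResourceManager.hpp"
#include "umpire/Allocator.hpp"

int main()
{
  // Umpire's singleton entry point tracks every allocation it hands out.
  auto& rm = umpire::ResourceManager::getInstance();

  // Request memory by resource name; "HOST" is always present, while
  // names like "DEVICE" exist only in GPU-enabled builds.
  umpire::Allocator alloc = rm.getAllocator("HOST");

  // Application code allocates and frees without knowing which
  // resource, pool or backend sits underneath.
  auto* data = static_cast<double*>(alloc.allocate(1024 * sizeof(double)));
  data[0] = 42.0;
  alloc.deallocate(data);

  return 0;
}
```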

LLNL Organizational Development Consultant A.J. Lanier talks to participants in the Students@SC program at the SC23 conference in Denver. Photo courtesy of SC Photography.

Throughout the day, LLNL Organizational Development Consultant A.J. Lanier gave presentations for the conference’s early career program (chaired by Lab software test engineer Stephanie Dempsey), which included “The Art of the Pitch,” a session on crafting effective elevator pitches for young researchers. Lanier also conducted a workshop on psychological safety for the Students@SC program, and followed up later in the afternoon with a session on microaggressions, addressing the importance of fostering inclusive environments in the HPC community.

LLNL’s Lori Diachin, director of the Exascale Computing Project, briefs the media on the state of exascale computing at the SC23 press conference on Nov. 13. Photo courtesy of SC Photography.

The day was highlighted by the annual SC press briefing revealing the Top500 List of the world’s most powerful supercomputers and discussions around the state of HPC. The briefing began with a talk by LLNL’s Diachin, director of the Exascale Computing Project (ECP), who provided an overview of the progress made in exascale computing over the past year.

Diachin highlighted the successful deployment of Frontier, which maintained its top spot on the Top500, and Aurora at ANL, which debuted at No. 2 on the list with a benchmark run of 585 petaFLOPs on only about half of the system. Diachin discussed the exascale software ecosystem, including the open source Extreme-scale Scientific Software Stack (E4S), and emphasized significant improvements in applications. She concluded by discussing the broad reach of the project, covering areas like cancer research and wind energy, its expected impact on industry and the potential of accelerator-based computing to reduce energy consumption.

"We have delivered exascale systems that are on the floor, and they are performing exceptionally well,” Diachin said. “We have improved applications by factors of tens to hundreds to thousands, through not only the hardware improvements that we've seen in the last seven years, but also through algorithmic improvements."

Following Diachin and talks by IDC’s Heather West and NVIDIA’s Jack Wells, on quantum computing and the impact of generative AI on scientific discovery, respectively, Lawrence Berkeley National Laboratory (LBNL) senior scientist Erich Strohmaier presented an overview of the newest Top500 and recent changes and trends in HPC. Strohmaier noted the unusual but encouraging “churn” in the list, with several new additions to the Top 10 since June, including Aurora and a Microsoft Azure Cloud system named Eagle at No. 3, the highest ranking a cloud system had ever achieved.

Two new National Nuclear Security Administration commodity supercomputing clusters deployed at LLNL under the Commodity Technology Systems (CTS-2) contract — Dane and Bengal — also debuted on the Top500, ranking 108th and 129th, respectively. The addition of the machines brought the total of LLNL-sited systems on the list to 11, more than any other supercomputing center in the world. LLNL’s current flagship petascale system, Sierra, dropped to No. 10, remaining in the Top 10 despite being more than five years old, Strohmaier noted.

The SC tutorials and workshops continued into the afternoon, with Livermore Computing’s Chief Technology Officer Bronis de Supinski leading a tutorial on OpenMP, and Lab computer scientist Cyrus Harrison heading a tutorial on the in-situ visualization tool Ascent and a framework called ParaView Catalyst.
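OpenMP’s directive-based approach is compact enough to show in a few lines. The C++ sketch below is not drawn from the tutorial materials; it is a standard example of an OpenMP target region that offloads a loop to a GPU when the compiler and runtime support one, and otherwise runs on the host:

```cpp
#include <cstdio>
#include <vector>

int main()
{
  const int n = 1 << 20;
  std::vector<double> a(n, 1.0), b(n, 2.0);
  double* pa = a.data();
  const double* pb = b.data();

  // With a GPU-capable compiler this loop is offloaded to the device;
  // without one, the OpenMP runtime simply executes it on the host.
  #pragma omp target teams distribute parallel for \
      map(tofrom: pa[0:n]) map(to: pb[0:n])
  for (int i = 0; i < n; ++i)
    pa[i] += pb[i];

  std::printf("a[0] = %.1f\n", pa[0]);  // prints 3.0
  return 0;
}
```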

As day turned to evening, the “I Am HPC” plenary featured a panel of speakers from the University of California, Berkeley; Rutgers University; the National Oceanic and Atmospheric Administration and the Barcelona Supercomputing Center. The panelists discussed the impacts of HPC across society, the challenges and opportunities in the future, the people who make HPC run and efforts to ensure the field isn’t limited to a handful of groups.

The evening concluded with a ribbon cutting by SC23 Conference Chair Dorian Arnold, an associate professor at Emory University and former student scholar at LLNL. The ceremony marked the official start of the SC23 opening night gala, as the exhibit hall doors opened and thousands of attendees made a mad dash for the 438 exhibitor booths, another SC record.

Amidst the hubbub and before a large crowd at the DOE booth, LLNL computer scientist Todd Gamblin, creator of Spack, announced the intended formation of the High Performance Software Foundation, a nonprofit he is co-founding with Kokkos lead developer Christian Trott of Sandia National Laboratories (Sandia) under the Linux Foundation banner. Through the foundation, Gamblin said he hopes DOE labs can work together with academia and industry to advance a common, affordable software stack for HPC.

“We want to bring the HPC software stack to the mainstream,” Gamblin said. “The impact doesn't have to be limited to DOE—we want to make DOE software much more widely available. We think industry has a real reason to invest in this stack and to contribute to these packages, so they live for a really long time, and many more use cases than maybe we envisioned for them. We're trying to lower all these barriers to use for the software and encourage more development.”

Inclusivity co-chair and LLNL’s Office of Inclusion, Diversity, Equity and Accountability Director Tony Baylis talks to a group on the conference exhibit hall floor. Photo courtesy of SC Photography.

Evolving supercomputing landscape takes center stage

Nov. 14 began with the conference keynote, an inspiring talk by Hakeem Oluseyi, author of “A Quantum Life: My Unlikely Journey from the Streets to the Stars.” Oluseyi related his transformational journey from a rough childhood growing up in poverty and crime-ridden neighborhoods to earning a Stanford University doctorate in physics and becoming an astrophysicist, an inventor with multiple patents, a cosmologist, a research fellow at LBNL and a science communicator.

“We are wasting a lot of our human capital and potential because we judge people based on not what they're capable of, not what they have inside, but how well they have been prepared up to today; and some people have been prepared the way I was prepared, which is not very well prepared at all,” Oluseyi said. “I encourage you to be open-minded and see deeper than the GPA, but also, if you help people, there's no telling where it might end up.”

Midday at the DOE booth, Diachin and LLNL High Performance Computing Innovation Center Director Erik Draeger provided an update on exascale supercomputing at DOE, focusing on the success of ECP in building the exascale software ecosystem and applications and integrating them into facilities worldwide. The seven-year ECP initiative, which involved 15 DOE labs along with universities and industrial partners, is scheduled to end in December.

Diachin highlighted the deployment of Frontier and Aurora as well as the delivery of El Capitan at LLNL, scheduled for mid- to late 2024. She emphasized the importance of not “treating software as an afterthought” and pointed to key performance parameters for measuring success, including significant performance improvements ranging from 70x to over 1,000x on various exascale applications, which she attributed to collaborations and investments in software technologies.

“We are so proud that we have been involved with ECP,” Diachin said. “It’s been one of the most amazing projects to be a part of, and we really do believe that the legacy of the Exascale Computing Project will be far reaching, in terms of migrating from CPUs to GPUs and what that looks like. Those lessons learned are the fact that you can get much better performance at a much lower energy cost. We think it's going to be important at all scales, not just at exascale, and we hope to enable that transition.”

Draeger addressed some of the challenges in achieving exascale, stressing the need for a diverse ECP application portfolio with “real science goals.” Using ExaWind as an example, he emphasized the importance of specific, achievable objectives to advance simulation accuracy and algorithmic innovation. He showcased ECP performance achievements, and concluded with an earthquake modeling case study, illustrating the positive impact on seismic hazard risk assessment.

“This is us distilling down our lessons from ECP and how to write an application,” Draeger said. “Don't do it the way you might have thought to do it on the left; you really have to take the time and think deeply about re-architecting and reimagining these things, looking at the hardware we have and what's going to happen.”

In an afternoon session, LLNL’s Parasyris presented one of the finalists for the conference’s Best Paper Award, describing a method for optimizing application performance on large-scale GPU systems. The paper, which features four LLNL co-authors, introduces a mechanism called Record-Replay (RR) to enhance the performance of computing applications employing GPUs. This mechanism involves recording the program's execution on a GPU and replaying the recording to test different settings, enabling the "autotuning of applications faster than ever," Parasyris said.

“We developed a tool that automatically picks up part of the application, moves it as an independent piece so you can start independently and then optimize it,” Parasyris said. “Once it is optimized, you can plug it into the original application. By doing so you can reduce the execution time of the entire application and do science faster.”
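The paper should be consulted for the actual design, but the general shape of a record-replay autotuning loop can be sketched in a few lines of C++. Every name below (Recording, KernelConfig, load_recording, replay) is a hypothetical stand-in invented for this illustration, not the tool’s real API:

```cpp
#include <cstdio>
#include <limits>
#include <vector>

// Hypothetical stand-ins: a Recording captures a kernel's inputs and
// launch context so it can be re-executed in isolation.
struct Recording { /* captured inputs and device state */ };
struct KernelConfig { int threads_per_block; };

// Stubs so the sketch compiles; a real tool would deserialize a stored
// recording and relaunch the kernel on the GPU with the given settings.
Recording load_recording(const char*) { return {}; }
double replay(const Recording&, KernelConfig cfg)
{
  // Placeholder cost model standing in for a measured GPU runtime (ms).
  return 1000.0 / cfg.threads_per_block + 0.01 * cfg.threads_per_block;
}

int main()
{
  Recording rec = load_recording("kernel.record");

  // Replay the captured kernel under each candidate setting, keep the
  // fastest, then plug the winning configuration back into the app.
  const std::vector<KernelConfig> candidates = {{64}, {128}, {256}, {512}};
  KernelConfig best{};
  double best_ms = std::numeric_limits<double>::max();

  for (const KernelConfig& cfg : candidates) {
    const double ms = replay(rec, cfg);
    if (ms < best_ms) { best_ms = ms; best = cfg; }
  }

  std::printf("best: %d threads/block (%.2f ms)\n",
              best.threads_per_block, best_ms);
  return 0;
}
```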

LLNL intern Zane Fink presents a paper on HPAC-Offload, a programming model allowing approximate computing to accelerate HPC applications on graphics processing units (GPUs). Photo courtesy of SC Photography.

LLNL research intern Zane Fink also presented a paper on HPAC-Offload, a programming model allowing approximate computing to accelerate HPC applications on GPUs, which included Parasyris, Georgakoudis and Harshitha Menon as co-authors.

Later, at the Top500 birds of a feather (BOF) session, LBNL’s Strohmaier provided insights into the evolving landscape of supercomputing, including new systems, industry changes and the growing significance of mixed-precision algorithms. The Top500 Q&A session delved into topics such as El Capitan's potential, the intersection of AI with applications and the latest developments in HPC.

"We have a very strong concentration of compute power at the very top, and you see that now also in the commercial market, not just the research market,” Strohmaier said. “Clearly the old Moore's law is over. Getting something like a 10 exaFLOP system in a decade might be a stretch if we do business as usual. Needless to say, the technology [needed is] going to be more and more on the software side."

Exploring cutting-edge scientific research and initiatives

On Nov. 15, Sandia’s Luca Bertagna presented a paper describing a record-setting run of the Simple Cloud Resolving E3SM Atmosphere Model (SCREAM) on ORNL’s exascale Frontier system. A finalist for the first-ever Association for Computing Machinery (ACM) Gordon Bell Prize for Climate Modelling — an award instituted by the prize’s namesake to help address climate change through advanced computing — the paper marked the first time a global cloud-resolving model had run on an exascale supercomputer, breaking the one-simulated-year-per-day barrier at less than five kilometers of horizontal resolution.

On Frontier, the SCREAM model demonstrated notable speed-ups, underscoring the benefit of exascale systems for climate modeling. The SCREAM team is led by LLNL staff scientist Peter Caldwell, and includes LLNL co-authors Aaron Donahue, Chris Terai and Renata McCoy.

In the talk, Bertagna highlighted the use of Kokkos, a C++ library for on-node parallelism, which enabled SCREAM to run efficiently across various computing architectures, including both conventional CPU systems and GPU-accelerated ones. SCREAM aims to improve scientific understanding of climate change and has already completed a set of 40-day simulations for all four seasons, Bertagna said.
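Kokkos makes that portability concrete by compiling the same loop body for whichever backend the library was built against. The short sketch below uses Kokkos’s public API; the choice of backend (CUDA, HIP or host threads) is a build-time assumption, not shown here:

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[])
{
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;

    // A View is an array allocated in the default memory space:
    // GPU memory for CUDA/HIP builds, ordinary host memory otherwise.
    Kokkos::View<double*> x("x", n);

    // The same lambda becomes a GPU kernel or a threaded host loop
    // depending on how Kokkos was configured when it was built.
    Kokkos::parallel_for("fill", n, KOKKOS_LAMBDA(const int i) {
      x(i) = 0.5 * i;
    });

    double sum = 0.0;
    Kokkos::parallel_reduce(
        "sum", n,
        KOKKOS_LAMBDA(const int i, double& partial) { partial += x(i); },
        sum);

    std::printf("sum = %f\n", sum);
  }
  Kokkos::finalize();
  return 0;
}
```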

An afternoon panel on democratizing AI, chaired by LLNL Chief Computational Scientist Fred Streitz, drew representatives from DOE, the Department of Defense, the National Science Foundation, NASA and the National Institutes of Health to discuss the National AI Research Resource (NAIRR) initiative, which aims to develop trustworthy AI and a widely accessible national cyberinfrastructure to advance the U.S. AI R&D environment. The initiative launches in January 2024.

Also at the DOE booth, engineering software lead Jamie Bramwell and computational physicist Aaron Skinner presented on efforts at LLNL to develop portable HPC multiphysics codes, focusing on MARBL and the Livermore Design Optimization (LiDO) codes. Lab investments in performance-portable applications, particularly those utilizing GPUs, have led to notable speed-ups in codes like MARBL, ARES and ALE3D, Skinner said. Skinner also outlined goals for the exascale GPU era, redefining what constitutes a “heroic” 3D calculation and aiming to complete such calculations in a single workday.

"Our goal is to maintain portability across a wide range of architectures and still maintain the types of performance that we've had on prior [HPC] architectures and all the way down to laptops,” Skinner said.

Bramwell said LiDO showcases a cutting-edge approach to modeling and talked about the application of the code to battery electrode design. She said she was particularly enthusiastic about its utilization in 3D-printed liquid crystal elastomers and its ability to perform simultaneous shape and parameter optimization.

In addition to the talks, SC23 Inclusivity co-chair and LLNL’s Office of Inclusion, Diversity, Equity and Accountability Program Director Tony Baylis chaired a session called “What’s Inclusivity Got to Do with HPC at SC conferences?” The afternoon also saw a panel on HPC and cloud-converged computing, moderated by LLNL computer scientist Dan Milroy, and featuring Gamblin, who discussed LLNL's collaboration with cloud vendors.

LLNL computer scientist Dan Milroy moderates a panel on HPC+cloud computing at the SC23 conference in Denver. Photo courtesy of SC Photography.

We all scream for SCREAM

Nov. 16 brought the moment many had waited for: the SC23 awards presentation. The multi-institutional, LLNL-led SCREAM team won the inaugural Gordon Bell Prize for Climate Modelling, recognizing the team for “innovative parallel computing contributions toward solving the global climate crisis.” The award will be given out annually for the next 10 years to recognize the contributions of climate scientists and software engineers in addressing climate change.

SCREAM project lead and LLNL staff scientist Caldwell acknowledged the team’s collective effort, which included contributions from seven DOE national laboratories. The submission was led by Energy Exascale Earth System Model (E3SM) chief computational scientist Mark Taylor of Sandia. Caldwell said he hoped the award is seen as a positive force, not only for the team but for the broader community engaged in climate modeling and computational research.

“This climate modeling prize feels like an affirmation that the work that we're doing is important,” Caldwell said. “We are at a time in climate modeling and in computer science where we can take this real quantum leap into higher resolution and more accuracy. This award motivated us and served as a North Star in terms of getting faster and doing our simulations the best that we can. I can't say how appreciative I am to Gordon Bell for creating this prize. I think it's a great thing for the community, and hopefully, for humanity.”

LLNL’s Johannes Doerfert (center) received the IEEE Computer Society Technical Community for High Performance Computing’s Early Career Researchers Award for Excellence in High Performance Computing at the SC23 conference. Also pictured are Wenqian Dong (left), an assistant professor at Florida International University, and Prashant Pandey (right), an assistant professor of computer science at the University of Utah. Photo courtesy of SC Photography.


During the ceremony, LLNL’s Johannes Doerfert also received the IEEE Computer Society Technical Community for High Performance Computing (TCHPC) Early Career Researchers Award for Excellence in High Performance Computing. The TCHPC award recognizes individuals who “have made outstanding, influential and potentially long-lasting contributions in the field of high-performance computing” within five years of obtaining their Ph.D. degrees, as of Jan. 1 of the award year, according to IEEE-CS.

“At the end of the day, what I feel is most important for me is that I work in a very downstream world that isn’t very user-facing,” Doerfert said. “Being recognized as having an impact in HPC is really nice, because it's hard for me to see my impact. I never consider myself an HPC person, so this kind of recognition in the field — as someone that is production-oriented — is fantastic.”

The day also featured BOFs bringing together the Spack community (led by Gamblin, Becker and Tammy Dahlgren) and operational data analytics (led by Kathleen Shoga).

LLNL computer scientists Greg Becker (left) and Todd Gamblin (center) were presented with the HPCwire Editor's Choice Award for the Best High Performance Computing Programming Tool or Technology at SC23 for the Spack package manager. They are pictured with Tabor Communications CEO Tom Tabor (right). Photo courtesy of HPCwire.

On to Atlanta in ‘24

With many attendees already heading home to locales across the world, the conference concluded on Nov. 17 with the First International Workshop on HPC Testing and Evaluation of Systems, Tools, and Software (HPCTESTS 2023).

During the session, LLNL computer scientist Olga Pearce delved into the complexities of benchmarking, noting the need for machine-agnostic benchmarks and introducing a roadmap for future benchmark development.

Simultaneously, LLNL’s Doerfert participated in a panel exploring the evolving role of compilers and identifying crosscutting issues between HPC and AI workloads. Doerfert, who has been involved in the open source LLVM compiler framework since 2014, said he is trying to attract more people from outside the Lab to LLVM to benefit and grow the community.

Throughout the hectic week, SC23 showcased LLNL's active participation and considerable presence in the field of HPC, with more than 100 Computing employees attending in person and many others participating virtually. Over a dozen LLNL employees also contributed their time by serving on the SC23 conference committee. SC24 will be held in Atlanta.

For a full list of LLNL events and contributors, visit https://computing.llnl.gov/about/newsroom/sc23-event-calendar.