May 26, 2022

LLNL and Amazon Web Services to cooperate on standardized software stack for HPC

Jeremy Thomas, thomas244 [at] llnl.gov, 925-422-5539

Lawrence Livermore National Laboratory (LLNL) and Amazon Web Services (AWS) have signed a memorandum of understanding (MOU) to define the role of leadership-class high-performance computing (HPC) in a future where cloud HPC is ubiquitous.

Under the MOU, LLNL and AWS will explore software and hardware solutions spanning cloud and on-premises HPC environments, with the goal of establishing a common stack of open-source software components that can run equally well at both large HPC centers and on cloud resources.

“The cloud HPC market is growing, and clouds are becoming a viable way to run HPC jobs,” said computer scientist Todd Gamblin, who is leading the effort for LLNL. “More software is being developed for the cloud environment than for traditional HPC centers, and larger portions of our workflows are going to start looking like cloud software. So, we want to be in tune with mainstream software development, take advantage of this ecosystem and be able to deploy it easily, the way that clouds do.”

LLNL and AWS have an existing open-source collaboration involving Spack, a software package manager Gamblin and his team developed for HPC machines. Building on that collaboration, LLNL and AWS will look to better understand how HPC centers can best use cloud resources to support HPC and will explore models for cloud bursting, data staging and data migration for deployment both on-site and in the cloud.

“In working with LLNL, AWS addresses a growing request from our customers to provide cloud capabilities that can support workloads running at the world’s largest HPC centers, including those that integrate AI and complex simulation capabilities,” said Ian Colle, general manager HPC at AWS. “Furthermore, we look forward to working together to improve the user experience at the intersection of the AWS cloud and HPC centers.”

In addition to increased flexibility for users, Gamblin said a common HPC software stack could increase competitiveness, stimulate tech transfer and present a “more natural direction” for working with industry partners, who can more easily run most of their HPC jobs in the cloud.

“We’re trying to reduce friction and make it possible to have both a nice software stack that’s easy to use but also is portable and gets performance out of hardware — the holy grail for developers,” Gamblin said. “Traditionally, we have a long multi-month period of getting acquainted with each other’s machines; if we meet industry where they are and teach them how to use our stack, it’s much easier to engage and quicker to get to the interesting work.”