Oct. 5, 2022

Earth System Grid Federation launches effort to upgrade global climate data system

Anne M Stark, stark8 [at] llnl.gov, (925) 422-9799

The Earth System Grid Federation (ESGF), an international multi-institutional initiative that gathers and distributes data for top-tier computer simulations of the Earth’s climate, is preparing a series of upgrades that will make using the data easier and faster while improving how the information is curated.

A new project called ESGF2 encapsulates the primary U.S. contribution to ESGF. Led by the Department of Energy’s Oak Ridge National Laboratory (ORNL) in collaboration with Argonne (ANL) and Lawrence Livermore (LLNL) national laboratories, it is now part of those international efforts to upgrade and modernize the global data system that is integral to some of the most important, impactful and widely respected projections of the Earth’s future climate: those made by scientists working with the Coupled Model Intercomparison Projects for the World Climate Research Programme.

“ESGF data are about the future of life on Earth,” said Forrest Hoffman, lead for the U.S. ESGF2 project and the Computational Earth Sciences group at ORNL. “By providing scientists easy access to the full collection of international models, across multiple experimental phases, ESGF enables them to build the very best understanding about the potential future trajectories of our climate.”

A key ESGF mission is to support the data needs of scientists undertaking global climate change research, notably those who prepare the United Nations Intergovernmental Panel on Climate Change’s comprehensive climate assessments, released every six to seven years. ESGF climate data underpin IPCC landmark reports such as the recent Sixth Assessment Report, AR6, and its working group findings. The data also inform IPCC special reports focused on climate vulnerabilities, adaptation scenarios and mitigation strategies.

Another important aspect of the ESGF’s mission is to ensure that scientific investigation is transparent, collaborative and reproducible, given its direct impact on worldwide climate research and potential use in government and commercial decision making.

“All of the Earth system model data that go into the IPCC reports and all of the most important simulations of the climate from around the world are stored in the ESGF,” Hoffman said. “The federation is a unique consortium of community-minded institutions that aims to get data into the hands of the tens of thousands of researchers and stakeholders who analyze it and compare it with observational data to constantly update our best projections of the future.”

A simulation of the planet from the DOE Energy Exascale Earth System Model, one of the large-scale models with output available through the Earth System Grid Federation. Image courtesy of the Department of Energy.

In the new ESGF project, computational scientists are working to improve data discovery, access and storage in direct collaboration with international federation partners. Their work will leverage the latest software tools, cloud computing resources, the world’s most powerful supercomputers and DOE’s Energy Sciences Network, or ESnet. ESnet currently enables 100 gigabit-per-second transfer rates among national laboratories and connections to other national and international universities and research centers. An upgrade expected by the end of the year will boost ESnet transfer rates to as much as 400 Gbps.

“Working with federation partners in Germany, the United Kingdom, European Union nations, Australia and other countries, we will develop and deploy a modernized and cyber-secure system for distributing model output data to the global scientific community,” Hoffman said.

The federation operates as a network of large computer nodes hosted in the United States and 17 other countries, functioning in tandem as one large data archive. ORNL, ANL and LLNL are collaborating with these partners to improve the reliability and scalability of the system, providing a smooth data replication process that ensures the broader scientific community has access to data from all ESGF sites. ORNL and ANL are currently hosting a dual backup of the more than 8 petabytes (and counting) of ESGF data originally replicated to LLNL, taking advantage of the world-class computing systems operated at the labs.

Developing robust user interfaces and secure, reliable archives

The multiyear U.S. DOE-funded ESGF2 project has already replicated existing data and is providing the storage and computational services needed to deliver data for the user community while it builds out new infrastructure and services to augment ongoing international federation efforts. ESGF partners created a roadmap in 2020 to guide the development work of all federation partner activities.

ORNL brings substantial experience with big data centers and large-scale modeling and simulation to its leadership role in the ESGF project. The lab is home to the Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility whose Frontier exascale computing system was recently ranked as the world’s fastest, as well as the Climate Change Science Institute, which brings together data experts, modelers and experimentalists to accelerate understanding of climate change and its impacts.

“ORNL is in the unique position of knowing about big data and also knowing about climate and serving as host to very large data centers and the interfaces that make that information easily accessible by scientists around the world, and we are excited to be adding new resources to ESGF,” Hoffman said.

The Argonne Leadership Computing Facility lends its unique capabilities, as well as the Globus research data management system, operated for the research community by the University of Chicago, to the federation. Globus services will be used in the ESGF deployment for authentication and for data indexing, access and replication.

“The terabytes and petabytes generated by the climate models of today require new approaches to data management and analysis,” said Ian Foster, ANL lead for the project. “We will enable not only faster download of data subsets, but also previously infeasible data analyses on ANL and ORNL supercomputers.” ALCF is a DOE Office of Science user facility.

Lawrence Livermore also brings a wealth of high-performance computing and data center expertise and capabilities, creative technologies and software solutions to ESGF2, plus its experience as the initial lead of the ESGF.

“The upgrades will make it easier and faster for users to access the data that can help us better understand what climate will look like in the future,” said Sasha Ames, LLNL lead for ESGF2.

The ESGF project is sponsored by the Biological and Environmental Research program within DOE’s Office of Science. ESGF is co-funded by the National Aeronautics and Space Administration, the National Oceanic and Atmospheric Administration and the National Science Foundation in the U.S., by the Infrastructure for the European Network for Earth System Modelling in Europe and by numerous international research institutions and academic partners.

UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit the web.

 —Stephanie Seay/ORNL