A Lawrence Livermore National Laboratory (LLNL) team claimed a top prize at an inaugural international symbolic regression competition for an artificial intelligence (AI) framework it developed that can explain and interpret real-life COVID-19 data.
Hosted by the open source SRBench project at the 2022 Genetic and Evolutionary Computation Conference (GECCO), the competition had two tracks — synthetic and real-world — and invited teams to submit their best symbolic regression algorithms. Organizers trained the models on datasets, assigned “trust ratings” and evaluated them for accuracy and simplicity.
The “Unified Deep Symbolic Regression” (uDSR) algorithm developed by LLNL computer scientist Brenden Petersen and his team beat out 12 other teams on the real-world track, a task to build an interpretable predictive model for 14-day forecast counts of COVID-19 cases, hospitalizations and deaths in New York state.
“We have seen in recent years the ability of AI to accelerate and unlock new paths for science,” said team member Jiachen Yang. “Our team’s achievement in this competition shows we are working at the leading edge of this rapidly growing field.”
Team member Mikel Landajuela recalled that six months before their Laboratory Directed Research and Development (LDRD) project ended, the team “had a grand idea on how to combine several modern deep-learning algorithms with classical search methods into a unified framework.
“We knew the combination would be extremely powerful, and we made a final sprint to get it done,” Landajuela explained. “After all the hard work, the results of the competition felt especially rewarding.”
The team’s uDSR method is an updated version of their earlier deep symbolic regression algorithm — a reinforcement learning approach using deep neural networks — that finds short mathematical expressions to best fit experimental data and uncovers underlying equations or dynamics of physical processes. The initial framework outperformed previous state-of-the-art baseline methods on benchmark problems. Researchers said the new version unifies deep symbolic regression with four other classes of state-of-the-art symbolic regression algorithms, hybridizing their key capabilities into a single, modular framework.
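For readers unfamiliar with symbolic regression, the core idea of searching for a short expression that fits the data can be illustrated with a toy example. The sketch below is not uDSR or DSO: it swaps the team's reinforcement learning and neural network search for a plain brute-force enumeration, and the building blocks, function names and `size_penalty` value are all hypothetical choices made for illustration. It scores each candidate on fit error plus a small penalty on expression size, loosely mirroring the competition's emphasis on both accuracy and simplicity.

```python
import itertools
import math

# Hypothetical toy search space (an assumption for illustration only).
UNARY = {"sin": math.sin, "cos": math.cos, "square": lambda v: v * v}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def candidates():
    """Yield (text, function, size) for every expression in the toy space."""
    # Size counts the tokens in the expression: variables, functions, operators.
    yield "x", (lambda x: x), 1
    for name, f in UNARY.items():
        yield f"{name}(x)", (lambda x, f=f: f(x)), 2
    for (n1, f1), (n2, f2) in itertools.product(UNARY.items(), repeat=2):
        for op, g in BINARY.items():
            text = f"{n1}(x) {op} {n2}(x)"
            yield text, (lambda x, f1=f1, f2=f2, g=g: g(f1(x), f2(x))), 5


def mse(fn, xs, ys):
    """Mean squared error of fn over the data points."""
    return sum((fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, size_penalty=0.01):
    """Return the text of the candidate with the best error-plus-size score."""
    best = min(candidates(), key=lambda c: mse(c[1], xs, ys) + size_penalty * c[2])
    return best[0]


# Data generated from a known formula, which the search should recover.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + math.sin(x) for x in xs]
print(fit(xs, ys))
```

Exhaustive enumeration only works because this search space holds a few dozen expressions. Realistic problems have far too many candidates to list, which is why methods such as deep symbolic regression learn where to search instead.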
“We look forward to seeing more interpretable models of real-world data be uncovered by uDSR for other natural and social science problems,” said team member Chak Lee.
SRBench seeks to create a living benchmark of modern symbolic regression algorithms and problems. According to its website, the project held the competition to help distill algorithmic design choices and improve the practice of symbolic regression by evaluating submitted methods sourced mainly from the domains of physics, epidemiology and bioinformatics. The uDSR algorithm also placed third on the competition’s synthetic track.
uDSR is part of a larger framework called Deep Symbolic Optimization (DSO), developed by the team as part of the LDRD project. DSO is now being used in several other programs at LLNL, for example to optimize antibody sequences to bind emerging pathogens.
“This competition is a big win for the Laboratory because our underlying DSO framework is being used for several Lab missions, not just symbolic regression,” said Petersen, who serves as the principal investigator on the project. “This victory establishes that we are supporting several Laboratory programs with bleeding-edge technologies of a highly competitive field.”
The LDRD Disruptive Research Program, a portfolio of projects considered to be high-risk and high-reward, funded the work. The DSO software framework is open source and available on GitHub.
In addition to Petersen, Landajuela, Lee and Yang, the uDSR team includes LLNL researchers Ruben Glatt, Ignacio Aravena Solis, Claudio Santiago and Nathan Mundhenk.