Two-week workshop lets UC Merced students step into shoes of Lab computer scientists

UC Merced students visited Lawrence Livermore National Laboratory for a two-week crash course in what it’s like to work on real-world data science problems at LLNL. Photo by Julie Russell/LLNL

University of California Merced student Asmaa Mohamed, who immigrated to the U.S. from Egypt in 2013, recently graduated with a bachelor’s degree in developmental biology. She plans on pursuing her Ph.D. through Dartmouth College and is studying immunology in hopes of someday creating novel cancer immunotherapies.

When Mohamed found out Lawrence Livermore National Laboratory (LLNL) would be holding its first-ever Data Science Challenge Workshop, a two-week crash course in what it’s like to work in data science at LLNL, she jumped at the opportunity to learn more about how biology and computation can intersect, and find out about the Lab, of which she was only vaguely aware.

"What I’d heard was that it was in the middle of nowhere, that it was an isolated place, so I didn’t know what to expect," Mohamed said. "They say a lot of secret stuff happens here, but we get here and everyone’s so nice, and there’s so much research going on. It’s like academia, but much more focused. You have all the resources here that anyone would need, so that’s very fascinating."

From May 20-31, Mohamed joined 20 of her undergraduate and graduate classmates, many of them first-generation college students, at the Lab to take part in the workshop. While they were on site, the students, along with their Lab mentors, were tasked with using machine learning and other computational methods to tackle real-world problems in computational immunology. Over the course of the workshop, the students — majors in a broad range of subjects including computer science, applied math, cognitive and information science, biology and even psychology — attended lectures and seminars, worked on exercises and toured the Livermore Computing facility, the Additive Manufacturing Lab and the National Ignition Facility. In their daily working groups, the students explored machine learning-based methods for precision medicine based on existing datasets.

One challenge focused on identifying novel therapeutic strategies for cancer treatment and another on automated hypothesis testing; both problems used different methods of reinforcement learning. A third challenge involved predicting the interactions between antibodies and antigens, using mathematical descriptions of real antibodies to train and measure the performance of machine learning models, work that could in turn inform vaccine design and help develop countermeasures. Two groups worked on this third project, each taking its own approach to designing and selecting "features" (numerical descriptions of the antibody/antigen interactions) that could be useful in predicting changes resulting from mutations. The results were then tested on actual data from the scientific literature.

Event organizer and LLNL computer scientist Marisol Gamboa said the challenges were preliminary research problems and that students weren’t expected to solve them completely. More importantly, she said, the workshop was intended to provide students with exposure to work that could be done with higher-level degrees, encourage students to pursue master’s and Ph.D. programs and create a workforce pipeline between the Laboratory and UC Merced.

"The goal was to create multidisciplinary teams to give the students a real-world experience of working here at the Lab and put them to work on actual problems," Gamboa said. "I was amazed at the amount of knowledge gained, the progress that was made and the solutions that were presented in just two weeks. We know that diverse teams with different backgrounds and disciplines build better solutions, but we witnessed that at its best with this group. They came together from different degree programs to prove yet again that diversity in skills and backgrounds does really build better teams and better solutions."

LLNL Computing Scholar Program Administrator Jamie Lewis, who co-organized the workshop, said: "The exposure to this level of science is not something [the students] can get anywhere else. We wanted them to get the feel for what it’s really like to work here, to have this goal and work toward it every single day. When we hear the students say, ‘Oh I would love to come work here,’ then we know this is working."

By the end of the workshop, many of the students did express a desire to work at LLNL. UC Merced graduate student Christine Hoffman, who is nearing the end of her Ph.D. in applied mathematics, said she is starting to look at jobs for after graduation, but isn’t sold on either academia or the private sector. After her experience, a career at a government lab is now on her radar.

"I thought getting a sense of what it’s like at a national lab, and also to hear from other people about how they got here, would be very valuable. I also wanted to get a taste of machine learning, because I know that’s a hot new field," Hoffman said. "What attracted me about LLNL is it’s an in-between [of academia and industry]. I found out you can come here and start working on a different project initially and grow into a different role as you go. That’s been really meaningful and interesting because that’s what I want — a job where I can grow and learn new things."

The workshop was sponsored by LLNL’s University Relations Program and co-organized by the Computing Directorate, the Center for Applied Scientific Computing and the Data Science Institute (DSI). DSI Director Mike Goldman said the event was a "novel activity that we’re trying to champion," and if successful, could be used as a prototype for similar programs involving other UC campuses.

Open to all UC Merced students, the workshop drew intense interest when it was announced, attracting nearly 60 applicants, according to UC Merced applied mathematics associate professor Suzanne Sindi. Established in 2005, UC Merced has a high population of first-generation and minority students and a state-of-the-art campus, and bills itself as "the first American research university built in the 21st century." As such, the university takes interdisciplinary data science and research projects seriously, Sindi explained. The workshop, she said, helped lift the veil for students on the work being done at national laboratories and allowed them to imagine themselves on a career path in government-funded science and computation.

"It’s especially meaningful for the students because these are areas that Lawrence Livermore is actually working on — these aren’t toy problems they made up just for this workshop," Sindi explained. "It’s a learning opportunity for the students, but it also gives them the chance to really see what kinds of things people do at Livermore. It’s a totally different experience to come and see what the Lab is about, and that’s a really important part of the program. In Silicon Valley, you hear all these big, splashy things about what happens in the startup world, so I think it’s important to get that same kind of exposure to the kinds of things that happen in the Lab."

Jonathan Anzules, a third-year grad student at UC Merced, said he was in the process of finishing up a project involving understanding the dynamics of autoimmune disease and was in search of a new challenge. He found it during his visit.

"When I heard that one of the projects here would be something about the immune system, I got really excited," Anzules said. "When I came here, I got exposed to the whole field, the combination of biology, the immune system and computation, all at the same time. Because of that, I already have an idea of what I should be doing for my second project. To find out that there are datasets out there that I could use for something I’m interested in is really exciting."

On May 31, the students presented their results and discussed the skills they gained over the course of the workshop. An "Immuno-A" group led by Anzules, who is studying the immune system through computational methods, was able to predict how mutations would affect antigen-antibody interactions with about 79 percent accuracy using machine learning. Working on the same problem, the "Immuno-B" group led by UC Merced postgraduate student Sabah Ul-Hasan, who recently finished her Ph.D. in quantitative systems biology, identified optimal antigen-antibody binding features using a different machine learning strategy, improving on the accuracy rates they were provided with at the start of the challenge.

Hoffman’s group successfully used reinforcement learning to learn physically interpretable mathematical expressions that best describe a dataset, while "Team Botakz," led by third-year cognitive science Ph.D. student Ayme Tomson, applied reinforcement learning to a mobile app developed by Moffitt Cancer Center that uses a predictive mathematical model allowing players to adjust treatment strategies and see the impact on tumor growth. Tomson’s group found that a simple decision tree could replace a complex neural network policy and still maintain efficacy, making it easier for clinicians to interpret.
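The idea behind that last finding, fitting a small interpretable tree to imitate a more complex policy, can be sketched in a few lines. This is only an illustration under stated assumptions, not the students' method or the Moffitt model: `complex_policy` is a made-up stand-in for an opaque learned policy, the two state variables (`burden`, `resistance`) are hypothetical, and the "tree" is a single-split stump.

```python
import math
import random

def complex_policy(burden, resistance):
    """Hypothetical stand-in for an opaque learned policy: 1 = treat, 0 = pause."""
    score = math.tanh(3.0 * (burden - 0.5)) - 0.1 * math.tanh(resistance - 0.5)
    return 1 if score > 0 else 0

def fit_stump(samples, labels, feature_index):
    """Fit a depth-1 decision tree: choose the threshold on one feature
    whose rule 'predict 1 when feature >= threshold' best matches the labels."""
    best_threshold, best_agreement = None, -1.0
    for threshold in sorted({s[feature_index] for s in samples}):
        predictions = [1 if s[feature_index] >= threshold else 0 for s in samples]
        agreement = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if agreement > best_agreement:
            best_threshold, best_agreement = threshold, agreement
    return best_threshold, best_agreement

# Sample illustrative states (values in [0, 1]) and record what the complex policy does.
rng = random.Random(0)
states = [(rng.random(), rng.random()) for _ in range(500)]
actions = [complex_policy(b, r) for b, r in states]

# Distill the policy into a single readable rule on tumor burden.
threshold, agreement = fit_stump(states, actions, feature_index=0)
print(f"treat when burden >= {threshold:.2f}; matches the policy on {agreement:.0%} of states")
```

In this toy setup the complex policy depends almost entirely on one variable, so a one-line rule reproduces nearly all of its decisions. Whether a real policy can be simplified this far would have to be checked against its actual treatment outcomes, not just its agreement rate.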

The challenge problems were led by LLNL computer scientists Tom Desautels and Brenden Petersen, who served as mentors for the students, presenting daily lectures, interacting with students and helping them overcome any obstacles they encountered. Desautels said the antibody/antigen interaction problem was particularly challenging because data are limited, and the interactions aren’t easy to describe numerically.

"I’ve been very impressed by the students and what they’ve accomplished," Desautels said. "They’ve come together and used their diverse skill sets to do some really impressive work in a quite limited time. My hat is off to them."

Petersen, who mentored the precision medicine and hypothesis testing teams, said: "These are not simple problems — just understanding the problem itself requires a good deal of background information to get up to speed. But even by the end of the first week, the students were making real progress, and by the end of two weeks, demonstrated some impressive results."

UC Merced graduate Mohamed, who worked on the antibody/antigen interaction problem, said she learned a lot about herself from the workshop and gained a sense of the power of an interdisciplinary team. Her career options also have expanded.

"I don’t know what the future holds but I’m definitely excited that I had this opportunity to see I can do other things with a Ph.D. I wanted to become a professor, but I didn’t know that this other field existed. Now, my eyes are open to consider other things," Mohamed said. "I’m really appreciative and hopefully many more UC Merced students get to experience such an amazing opportunity like this."