For the first time, students from the University of California (UC) Merced and UC Riverside joined forces for the two-week Data Science Challenge (DSC) at Lawrence Livermore National Laboratory, tackling a real-world problem in machine learning (ML)-assisted heart modeling.
Held in the Livermore Valley Open Campus’s newly remodeled University of California Livermore Collaboration Center from July 10 to 21, the event brought together 35 UC students, ranging from undergraduates to graduate students across a variety of majors, to work in groups on four key tasks, using actual electrocardiogram (ECG) data to predict heart health.
According to organizers, the purpose of the challenge was to give students a taste of the broad scope of work that goes on at a national laboratory and provide them with experience working in interdisciplinary teams.
“One of my main goals is developing the students’ technical skills, but I also want to get people excited about going to grad school who hadn't thought about it yet,” said LLNL computer scientist Brian Gallagher, who co-directed the DSC program. “I want to get people thinking about a science career, not just a web development career or software engineering career. The students have been super engaged and interested in career paths and job progressions. Many have asked how to get an internship, which I take as a good sign.”
The tasks were conceived by LLNL ML researcher Mikel Landajuela, who served as DSC lead mentor and introduced the problem to the cohort on the first day of the challenge. The tasks ranged from a simple classification problem (using ML to distinguish a healthy heart from an abnormal one and diagnose the condition) to the most complicated task: reconstructing a full heart activation map from 12-lead ECG data taken from 75 areas of the heart. The models can be used for heartbeat simulations and more advanced diagnostics of heart conditions.
Organizers said they chose an open dataset so students could continue to practice after their DSC experience. DSC co-director Cindy Gonzales, group leader for the Intelligent Detection, Exploitation and Analysis team at LLNL, said she felt the challenge problem “ran the gamut” on difficulty and required working with “messy” data: the kind that students would encounter in the real world. Gonzales said she hoped the event gave UC Merced and UC Riverside students more visibility at the Lab and externally, and that the students learned the difficulty of working on a real-world problem.
“They’re learning from one another, which I think is less daunting than having to come to us as the subject matter experts,” Gonzales said. “Some students have put together full-on tutorials for their teams, which has been really cool to see. Other students are much stronger on the data-science side but don't understand the biology side, so they're having to learn how to normalize heart-rate data and to reach out and ask questions to the challenge problem lead.”
Each of the eight teams was led by a Ph.D. student. Gonzales said she selected the teams so that each had a data scientist with ML experience, pairing undergraduate students with strong team leads who had backgrounds in data-science concepts.
At the end of the challenge, the students presented their work in a poster session, another first for the DSC, which was instituted in 2019 with UC Merced. The poster session added a “race to the finish” element to the challenge, according to Suzanne Sindi, a mathematical biologist and UC Merced Applied Math department chair, who has been involved with the program from the beginning.
Sindi, the UC Merced faculty lead, said she was pleased by the “fantastic” camaraderie between the two campuses. She was most excited about teaching students how to use skills they’d learned in school in ways they hadn’t thought of before, exposing students to the national labs and working outside of the traditional classroom environment.
“A lot of our students are the first ones in their family that have ever gone to college, and maybe they have a much more limited view about what they actually could do with their careers,” Sindi said. “I think it's really meaningful for them to be here. It’s one thing for a professor to say, ‘well you could get a job at a national lab,’ but it's totally different to actually see what it's like and appreciate that it is a place where you could actually see yourself someday.”
For the grad students, Sindi said, the DSC provides research experience and a unique setting that inspires them to share the experience with other grad students. And for faculty members like herself, it has been a good way to connect with and learn more about individual researchers at the labs with whom they might want to collaborate someday.
Daniel O’Connor, a master’s student in engineering at UC Riverside, said he was drawn to the challenge out of his interest in applying data science to heart health. He didn’t know what to expect at first, but enjoyed the team working environment and hearing from LLNL staff about the autonomy of working at the Lab and the freedom to pursue various research interests. O’Connor said the experience exceeded his expectations and “put Lawrence Livermore on the map” for a possible future career.
“It's been awesome,” O’Connor said. “I feel like it is the perfect level of difficulty of a challenge where I'm learning a ton, but not overwhelmed. I'm surrounded by a bunch of great resources, like the other teams and a lot of the Ph.D. students. It's amazing to be able to work on a task and have all your questions answered, or have someone who can at least assist you with the theoretical understanding of how and why it works. It's been really enjoyable.”
Shamima Hossain, a UC Riverside graduate student in data science and data analysis, was one of the team leads. She said her motivation in attending was seeing how the time-series heartbeat data could be used differently from the time-series data in her Ph.D. work, where she uses sensor data collected from beehives to model bees’ behavior. By tracking temperature changes in the environment, she can see how temperature affects bee populations, with the goal of giving beekeepers early warning so they can take measures to save their hives.
Hossain said through the course of the challenge, she discovered a link between that work and work at the national laboratories, and that she was strongly considering applying for an internship after hearing from LLNL staff.
“It's a very good opportunity to interact with the staff, because every day I'm interacting with new people and making new connections,” Hossain said. “I didn't know how to apply for jobs for national labs or how to apply for an internship, or when I apply what will they look for, so that's helpful and beneficial for when I apply for internships or jobs.”
Javier Miranda is a fifth-year student at UC Merced, where he is studying computer science and engineering. Miranda applied for the DSC because he thought the challenge problem was impactful, and despite his past experience with hackathons and coding in Python, found the challenge “difficult but rewarding.”
“The program was very intensive, but I learned so much,” Miranda said. “I learned about different types of models and the different types of parameters that are involved in making your model more accurate. I have learned a lot about complications with models like overfitting, and the learning rates of models. I've also learned about different libraries in Python, and about different types of tasks, like classification, that models can do and their capabilities.”
UC Merced Ph.D. student Asees Kaur said that, as a mathematical biology major, she found the challenge problem right up her alley and was thrilled to be able to teach her group some of the fundamentals of data science.
“We got to network with a lot of people who work here and understand how the professional life goes at the Livermore Lab, and we got a hands-on experience with a lot of guidance from professionals here,” Kaur said. “The challenge was not only the actual mathematical problem, but also in how I managed the team, because everyone in my team was at a different level in machine learning and data science terms. That was also quite a challenge, but I'm loving it.”
Besides their group working sessions, the students attended seminars, learned new concepts in ML and neural networks through talks by team leads, engaged with Lab mentors and toured the National Ignition Facility. UC Merced DSC alums and past team leads who are now Lab Data Science Summer Institute (DSSI) and Computing interns also visited to help the students with any problems.
Jocelyn Ornelas-Munoz, a former UC Merced DSC team lead and current DSSI intern, said she got her first real exposure to ML and deep learning by attending a previous DSC and wanted to “pay it forward” by working with this year’s cohort.
“The Data Science Challenge was my jumping board,” Ornelas-Munoz said. “Now my research is fully focused on machine learning and deep learning, so it’s nice to see students being interested in deep learning and their different approaches to understanding and juggling it. There was one student who was trying to recreate a neural network from scratch, so he was really trying to get deep into understanding it. I think that's really cool.”
Over the course of the challenge, LLNL staff members Brian Weston, Amar Saini, Jim Gaffney, Indra Chakraborty, Omar DeGuchy and Nipun Gunawardena gave informal talks on their work at the Lab. LLNL’s Director of the Academic Engagement Office & Science Education Eric Schwegler provided students with a Lab overview, while Senior Science Adviser Dave Rakestraw gave a talk on “Physics with Phones.” DSSI co-director Nisha Mulakken also stopped by to give the students advice on applying for DSSI internships.
The students also attended a joint seminar with the Livermore Lab Foundation on how to communicate scientific and technical ideas to nonscientists. The seminar featured Amy Aines, a communications expert who co-wrote the book "Championing Science" with LLNL Energy Program Chief Scientist Roger Aines.
The challenge was also facilitated by UC Riverside associate professor Vagelis Papalexakis, and LLNL’s Sira Neily provided administrative support. The DSC was sponsored by LLNL’s Data Science Institute, the Center for Applied Scientific Computing and the Academic Engagement Office, and received additional support from the University of California Office of the President’s Office of the National Laboratories.