Virtual LLNL-UC Merced Data Science Challenge tackles asteroid detection through machine learning


The recent three-week Data Science Challenge was the third such annual event for Lawrence Livermore National Laboratory and the University of California, Merced. Meeting virtually three times per week, 22 UC Merced students engaged with LLNL mentors and peers to address a real-world challenge problem, using machine learning to identify potentially hazardous asteroids that could pose an existential threat to humanity.

Over three weeks, students from the University of California, Merced collaborated online with mentors at Lawrence Livermore National Laboratory (LLNL) to tackle a real-world challenge problem: using machine learning to identify potentially hazardous asteroids that could pose an existential threat to humanity.

The Data Science Challenge was the third such annual event for LLNL and UC Merced and the second held in a virtual format. Meeting three times per week over Webex, 22 UC Merced students engaged with LLNL scientists on exercises and assignments, attended seminars, took virtual tours and worked on deep learning models with their peers. UC Merced graduate students served as leads for four teams of students and provided skill development on “off-days,” when they discussed data science fundamentals and introduced the students to data visualization, neural networks and projects outside the Lab.

Throughout the event, the teams tackled problems around the theme of “Astronomy for Planetary Defense.” For the main challenge, students were tasked with applying deep learning models to optical astronomy data to detect and identify Near Earth Objects (such as asteroids).

The students began with an image classification tutorial before moving on to building classifier models to identify stars and galaxies, and finally asteroids, using image data from the Zwicky Transient Facility (ZTF), an astronomical survey at the Palomar Observatory near San Diego. LLNL is an institutional member of ZTF under a partnership between LLNL’s Data Science Institute (DSI) and the Space Science and Security Program within the Lab’s Global Security Directorate.

LLNL computer scientist Brian Gallagher, who took over as Data Science Challenge director after serving as a mentor last year, said leading the program during COVID-19 was a “trial by fire,” but the event achieved its goals thanks to the combined effort of numerous co-organizers including LLNL administrative specialist Jennifer Bellig and UC Merced applied math professor Suzanne Sindi, as well as Lab mentors and student team leads.

“My overarching goal for this program is to provide an environment that is conducive to growth for everyone involved,” Gallagher said. “In addition to the students, we had a number of early career LLNL mentors involved and all of them stepped up in a big way and really helped to shape the program. We all have a lot to learn from each other, and I certainly include myself in that. There are times during these programs when the growth that everyone is undergoing is palpable. You can see and feel the changes in people from day to day. That’s the best part for me.”

Virtual collaboration a ‘fulfilling’ experience

Each day of the Data Science Challenge began with check-ins and lectures, where students worked through pitfalls with their mentors and listened to tutorials from Lab scientists. At the end of the day, mentors and team leads met to review their teams’ progress in creating their models and to discuss the problems they encountered and how they overcame them.

Lab scientists from the Astronomy and Astrophysics Analytics group in the Physics Division and the Center for Applied Scientific Computing (CASC) — James Buchanan, Kerianne Pruett, Ryan Dana, Imene Goumiri, Ben Priest and Nathan Golovich — served as mentors and gave presentations. Students also listened to astronomy-themed talks by LLNL scientists in lecture sessions; Buchanan spoke on using machine learning to identify galaxy blending, Amanda Muyskens on classifying stars and galaxies through Gaussian processes, Pruett on astronomy background and fundamentals, Will Dawson on finding microlensing black holes and Travis Yeager on inner solar system asteroid stability.

Dana, a former Data Science Summer Institute (DSSI) student intern who became a full-time staff data scientist last January, led efforts to select a challenge problem. He said the team picked Near Earth Objects because it connected the LLNL astronomy group with institutional collaborators at ZTF, UC Merced, DSI, CASC and key Global Security missions.

“The experience has been incredibly fulfilling to mentor such incredible graduates and undergraduates — the graduate students also really stepped up in making the experience fulfilling to the wide range of undergrads in both domain expertise and experience,” Dana said. “My hope is that all of the students have greatly accelerated their data science careers forward, whether they are just beginning or about to defend their dissertation. I hope they accomplished more than they ever thought possible in the short three weeks and that it strengthened their ability to collaborate on short deadlines.”

Searching for needles in a haystack

On the first day of the challenge, CASC Director Jeff Hittinger and DSI Director Mike Goldman welcomed the students, who worked in their individual teams throughout the week and attended a seminar by LLNL bioinformatics software developer Marisa Torres. The second week featured more working and mentoring sessions, a virtual tour of the National Ignition Facility (NIF), a seminar by NIF physicist Laura Kegelmeyer on using machine learning to manage damage on NIF optics and a panel of Data Science summer internship and UC Merced alumni, including LLNL data scientists Amar Saini and Omar Deguchy, and UC Santa Cruz alumna Mary Silva.

Week three brought more collaborative sessions, a two-hour resume writing workshop and a seminar by Brenden Petersen. The challenge concluded on June 4 with student briefings, where the teams gave hourlong presentations explaining their approaches to building their classifier models, their results, potential future improvements and what they learned over the course of the challenge.

Amandeep Kaur, a UC Merced Ph.D. student in applied mathematics, said her team approached the challenge problem by learning and teaching each other the basics. Some team members developed their own models and helped the others achieve their results. While she would have preferred to work in person and visit LLNL, she liked the virtual environment for the ease of communicating with her team.

“We were able to connect with each other at any time we needed to,” Kaur said. “It was a wonderful experience. Every single member in the Data Science Challenge was compassionate, respectful and generous. Overall, I learned a lot.”

UC Merced applied mathematics Ph.D. student Jocelyn Ornelas Muñoz, who led a team of undergrads mentored by Priest, said the challenge gave her the opportunity to expand her leadership skills, learn more about applying machine learning to real-world data and assist students in an area she was passionate about. Additionally, the mentor lectures and seminars provided her with insight about the kinds of problems LLNL scientists work on every day.

“My biggest takeaway was the connections I made and everything that I learned, technical and not. The diversity of people at LLNL and areas that are being explored is astonishing,” she said. “The LLNL scientists and administrative staff helped me get more exposure on the topics that they are working on, which is helpful as I continue to think about my own research interests and career goals.”

Ornelas’ team explored convolutional neural networks in PyTorch and Keras for the star-galaxy classification problem, achieving an accuracy as high as 97 percent. For the asteroid problem, her team tried different approaches, such as using OpenCV (Open Source Computer Vision Library) to detect light sources in the difference images that could be asteroids, with a goal of identifying 20 synthetic asteroids.
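The article does not describe how the team's OpenCV pipeline worked. As a rough illustration of the general idea of finding bright spots in a difference image, the sketch below uses plain thresholding plus connected-component labeling in standard-library Python in place of OpenCV. The function names `robust_sigma` and `find_candidates`, the 5-sigma threshold, the two-pixel minimum blob size and the tiny synthetic image are all hypothetical choices made for this example, not details from the challenge.

```python
from collections import deque
from statistics import median


def robust_sigma(values):
    """Estimate the background level and noise using the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 scales the MAD to a standard deviation for Gaussian noise.
    return med, 1.4826 * mad


def find_candidates(diff, k=5.0, min_pixels=2):
    """Return (row, col) centroids of bright blobs in a 2D difference image.

    Hypothetical input format: a list of equal-length lists of floats.
    Pixels more than k robust sigmas above the median are flagged, and
    4-connected groups of at least min_pixels flagged pixels become candidates.
    """
    rows, cols = len(diff), len(diff[0])
    med, sigma = robust_sigma([v for row in diff for v in row])
    threshold = med + k * sigma
    mask = [[diff[r][c] > threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    candidates = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected blob of flagged pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Single hot pixels are more likely detector artifacts than moving objects.
            if len(pixels) >= min_pixels:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                candidates.append((cy, cx))
    return candidates


# Tiny synthetic example: faint patterned background, one 2x2 "asteroid" and one hot pixel.
image = [[((7 * r + 3 * c) % 5) * 0.1 for c in range(10)] for r in range(10)]
for r, c in ((4, 6), (4, 7), (5, 6), (5, 7)):
    image[r][c] = 10.0
image[1][1] = 10.0  # isolated hot pixel, rejected by min_pixels=2
print(find_candidates(image))  # [(4.5, 6.5)]
```

A median-based noise estimate is used here because a handful of very bright pixels barely shifts the median, whereas they would inflate a plain mean and standard deviation. A real survey pipeline would also have to handle image alignment, variable noise across the frame and artifacts from imperfect subtraction, none of which this sketch attempts.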

“It was a pleasure to see the amount of personal and professional growth that the students experienced, their problem-solving sessions and how they worked together despite the virtual environment,” Ornelas said.

To help foster camaraderie in the online space, Data Science Challenge co-organizer Bellig said students were encouraged to turn on their cameras, ask questions and engage with other students through work sessions and in social activities like Kahoot trivia and a scavenger hunt.

“Planning and coordinating this challenge during COVID was a team effort,” Bellig said. “Brian and Suzanne (Sindi) were integral in ensuring we provided opportunities for students to learn, get to know each other and have fun. The virtual environment did pose some challenges to accomplish connection, but I feel we did a fantastic job to facilitate it.”

“It’s been really rewarding to work with everyone,” Sindi added. “Every year I’m very impressed with the team leads and this year, given that I expected them to exceed our expectations, they really did.”

A second Data Science Challenge will be held this year with UC Riverside from Aug. 30-Sept. 17.