Laboratory looks to expand data science pipeline through internship program

July 3, 2017

Through the Data Science Summer Institute (DSSI) program, formerly known as Data Heroes, graduate and undergraduate students from all over the country are being mentored by Lab employees in machine learning, deep learning and big data analytics, among other disciplines. Photo by George Kitrinos/LLNL

Jeremy Thomas, thomas244@llnl.gov, 925-422-5539

This summer, college students from all over the country are at Lawrence Livermore National Laboratory to be mentored by experts in machine learning, deep learning and big data analytics under the newly rechristened Data Science Summer Institute (DSSI) program.

Previously known as Data Heroes, the program has expanded to host 24 paid student interns this year and, for the first time, will welcome a guest lecturer, a statistics professor from Virginia Tech. The Computation and Engineering directorates partnered to provide the program.

"The DSSI program is structured to give students a better understanding of the problem space the Lab occupies," said LLNL computer scientist Goran Konjevod of Engineering. "They get to work with data they might not get to see at universities and interact with experts at the Lab. We see this as a way to make it better understood in academia what the Lab does, and it gives us a lead to great hires in the future."

Now in its third year, the program lasts 12 weeks, with internships overlapping over the course of the summer. During their stay, the students split their time between tackling Lab-centric projects and challenge problems and taking courses taught by Lab employees on subjects such as deep learning, Bayesian basics, predictive modeling and statistical computing.

"We're really trying to turn this into a summer university for data science-minded students," said LLNL computer scientist Mike Goldman, who heads the Computation side of the program. "The ultimate goal is to establish a pipeline and hopefully treat it as an extended job interview. We're trying to give students the experience of working on different unique problems that expand out of their comfort zone, to let them know that being flexible is a huge benefit to have in their careers. Data science covers a lot of different disciplines and we want them to learn as much as possible."

Each student spends half the time working on a main project with a mentor and the other half reading scientific papers, attending seminars or working on challenges with other students. Jose Cadena, a data science intern from Virginia Tech, has taken part in the program all three years and is finishing a project that analyzes traffic data to study changes or anomalies in patterns. He said he's looking forward to expanding his knowledge base in machine learning and neural networks and would like to get hired at the Lab after he finishes his Ph.D. later this year.

"This is a different experience than you get in grad school," Cadena said. "The classes are valuable and the Lab is doing a good job of providing courses in data mining and data science. Coming to an internship here lets you expand into projects you wouldn't otherwise have access to."

University of California, Davis grad student Irene Kim, who is working on her Ph.D. in statistics, just began her second summer internship at LLNL. She returned to get more experience in computing and perform data analysis using the Lab's unique tools, such as computer clusters.

"I can get more experience in applications that I can't get in school," Kim said. "Because of the computation resources that aren't always available outside the Lab, everything is a lot faster and you can try out a lot of different things at once. Here, you're dealing with real data, so in that sense it's more fun."

The program continues until Aug. 28. Jim Gansemer, who helped start the institute as Data Heroes and now plays a supporting role, said its success is "ultimately going to be measured in terms of hiring, but also serves to increase awareness of the diversity of applied research at the Lab in areas like climate, energy, and computational biology."

Gansemer added that those goals are achievable only through the mentorship the DSSI scholars receive. The DSSI program is always looking for more Lab mentors with real-world problems in need of solutions to help expand future internship opportunities, he said.

For more information and to learn about becoming a Data Science Summer Institute mentor, visit the web.