Lab's Data Science Institute brings best minds in AI, machine learning under one umbrella

April 3, 2018

Lab scientists, from left, Timo Bremer, Ana Kupresanin and Michael Schneider discuss the Data Science Institute's varied disciplines. Photo by Julie Russell/LLNL

Jeremy Thomas, thomas244@llnl.gov, 925-422-5539

Machine learning. Deep learning. Artificial intelligence. Computer vision. Big data analytics.

These aren’t just techie buzzwords — they’re all areas of research that fall under the sweeping term “data science.” So how does a national laboratory, with researchers exploring all of these areas and more, coalesce these disciplines into a unified group?

Launched earlier this year, the Data Science Institute (DSI) is a Lawrence Livermore National Laboratory (LLNL) initiative designed to bring together myriad topics considered “data science” under one umbrella. It establishes a centralized hub for education, discussion and collaboration, and for building a workforce pipeline targeting soon-to-be college graduates.

The institute, which began in January after about a year of planning, casts a wide net over the 150-plus data scientists at the Lab. It oversees the Data Science Summer Institute (DSSI) internship program for Ph.D. students, postdoctoral programs, a seminar series, reading groups, weekly working groups, internal talks on papers and invited lectures on machine learning, computer vision and other emerging technologies.

“It’s been a rush of activity so far,” said LLNL computer scientist Mike Goldman, DSI’s director. “We’re learning what we do and getting our feet on the ground. We’re really trying to foster a community at the Lab. We tend to get isolated in our own areas. It’s hard to get out sometimes and see what everybody else is doing. We want to be that resource.”

The areas that fall under data science are as varied as its scientists. LLNL physicist Michael Schneider is performing astronomy surveys using data gathered from the Large Synoptic Survey Telescope (LSST) to search for asteroids and clues to the origin and evolution of the universe.

The DSI, Schneider said, is not only a place to share ideas and brainstorm projects but also a way to educate LLNL’s current workforce, providing tools for data scientists who want to expand their knowledge base and explore other applications.

“It’s a value to the workforce of adding job skills,” Schneider said. “We’re hopeful it becomes a job perk that you can join and pass for career development. It makes it more interesting to work here.”

Goldman and Schneider are joined on the Data Science Council by Peer-Timo Bremer, Dan Faissol, Barry Chen and Ana Kupresanin. The Institute already has 300 people on its mailing list, an active Confluence page, a blog and a Slack channel, and it is exploring other ways to engage Ph.D. students and draw them to the Lab’s unique work and broad range of computational approaches not available anywhere else.

“I’m especially excited to kick off our broader seminar series, reading groups and informal tech exchanges,” Chen said. “I’m always energized by the enthusiasm our data scientists have for their work. Through DSI, they will have many forums in which to share their work as well as to learn about the exciting ways others are applying and extending data science to support the Lab’s mission.”

Bremer said that because data science is so “cross-cutting” and difficult to define, the DSI was a necessary step, not just for the benefit of Lab scientists but for those outside the gates as well. He pointed to large-scale collaborative projects such as CANDLE (CANcer Distributed Learning Environment) and ATOM (Accelerating Therapeutic Opportunities in Medicine), which seek to use machine learning and data analytics to better understand cancer, enhance precision medicine and speed up drug development. He also pointed to the ADAPD (Advanced Data Analytics for Proliferation Detection) projects, aimed at developing advanced machine learning algorithms to help detect nuclear proliferation indicators earlier, more thoroughly and more robustly than ever before. The algorithms will combine cues from multiple sensor measurements to detect patterns too subtle or too noisy to be detected from any one sensor alone.

“It’s important for people (outside the Lab) to know what we do,” said Bremer. “Very few people realize how much impactful data science is going on here. We need to get out there and say, we’re not just doing simulation — here you can predict how cancer works, and for a lot of people that’s more interesting. Yes, we’re doing closed-site work, but there’s a wealth of problems in bio, in health and space that are rare outside the lab complex.”

Goldman said other ongoing projects apply data science to nuclear nonproliferation, materials science research that uses machine learning to predict performance, cognitive simulation for stockpile stewardship, basic science, advanced manufacturing, and energy and climate security.

“The problems we work on here are unique and challenging and difficult, and we use data you’re not going to see anywhere else,” Goldman added. “Also, you’ve got the computing power here to utilize that data.”