Data Heroes — able to leap stacks of data with a few lines of code — were in residence at Livermore this summer. With access to the Computation Directorate’s faster-than-a-speeding-bullet, more-powerful-than-a-locomotive high-performance computing systems, 20 interns came to Livermore to learn about and work on significant data science problems.
Students in the inaugural class of Livermore’s Data Heroes intern program spent their summer taking classes in data science and high-performance computing, working with advisers on projects tailored to them, and participating in a big-data programming competition on kaggle.com. The participants, who ranged from undergraduates to doctoral students in such fields as applied math, computer science, computer vision, machine learning and statistics, came from all over the United States, and some had international backgrounds.
"The question is, how can we do better data-driven simulations of science?" says Ghaleb Abdulla of the Center for Applied Scientific Computing. Data science, colloquially known as big data, is one of the hottest areas of computer science and applied mathematics right now because such large amounts of data are becoming available.
"There are lots of new classes of data-centric applications," he notes, "including biomedicine and energy, and even the applications that we use at the National Ignition Facility [NIF]…We would like to be able to extract, move, manage and analyze data much better than we do now. In the future, we’ll see much more integration of the science with data — more large scientific applications with large datasets."
After witnessing the success of the Cyber Defenders cybersecurity internship program, Abdulla proposed the Data Heroes program because he felt that a group focused on data science would provide special benefits to the Laboratory. "We could train students in data science so that they could use their skills either here or at other jobs, and we could provide a hiring track; some students might decide that they wanted to come back to Livermore and be employees," says Abdulla. He and Jim Gansemer of Engineering are managing the program, with support from Engineering’s Eric Xun Wang.
For several of this summer’s participants, such as Jose Cardenas, the work they did at Livermore is directly related to their Ph.D. thesis work, so they’ll be applying their new knowledge immediately. Cardenas expects to stay in touch with his mentor as he continues his degree program.
For others, the program opened up new possibilities. Undergraduate Zachary Canann says that what he learned this summer will help him with classes he’ll take in the coming year and will inform his choice of what to pursue when he applies to graduate school. First-year graduate student Deepali Aneja says, "I’m working on a problem here that’s very different from what I’m doing in my university program. This internship gave me a great opportunity to learn about deep learning." She says her experience in the Data Heroes program will help her choose a thesis topic.
One of the program’s activities, mentored by Engineering’s Corey Lanker, involved solving a data science problem posted to Kaggle.com. Companies and institutions that need big data problems solved set up competitions there and offer a reward for the best solution. Data scientists around the world compete as teams or individuals to solve these problems.
The project chosen for the Data Heroes was to classify handwritten digits using data from the Modified National Institute of Standards and Technology (MNIST) database. Three teams and one individual from the Data Heroes program worked on solutions, and all did very well in the scoring. As of mid-August, two of the three teams and the individual were in the top 50 out of more than 650 teams, and Data Hero Jordan King’s team was in 12th place. These teams were using a "deep learning" approach. The third team was using learning trees and was scoring well among entrants adopting that approach.
"Accurately recognizing handwritten digits is a longstanding problem," says Wang, "and MNIST is considered one of the primary benchmarks of machine learning and vision. The top of the leaderboard is hotly contested by teams from all over the world."
The ultimate value to the program’s participants may be in getting the chance to see the world differently. "I learned new techniques and new ways of approaching problems," says Mallory Walsh. Kaushik Suresh Kasi was inspired to look further afield: "Before this program, I didn’t realize how much mathematics and statistics was behind data science," he says, adding that he is thinking of doing additional studies in statistics.
"It gives you a real feeling for what it’s like to work in this field," says Aneja. "It’s nice to see researchers here who like what they’re doing and are happy in their lives."
For more about Data Heroes, visit the Computation website.
By Allan Chen/LLNL
johnston19 [at] llnl.gov