Building scientific insight through machine learning

An illustration of the targeted information extracted from manuscripts, including chemicals, experimental sentences, and nanomaterial compositions and morphologies. Statistics on the collected information are displayed using a computer visualization tool. Image by Veronica Chen/LLNL.

A team of Lawrence Livermore National Laboratory (LLNL) materials and computer scientists developed machine learning tools that extract and structure information from the text and figures of nanomaterials articles using state-of-the-art natural language processing, image analysis, computer vision and visualization techniques.

They are now applying the same approach to COVID-19 literature to determine whether information extracted from that large body of work can help accelerate COVID-19 research.

Nanomaterials are widely used at LLNL and in industry for many applications, from catalysis to optics to additive manufacturing. The combination of a nanomaterial's shape, size and composition can impart the unique optical, electrical, mechanical or catalytic properties needed for a specific application. However, synthesizing a specific nanomaterial and scaling up its production is often challenging because a small change in the process or the addition of a specific chemical can have a dramatic effect on what is made. These effects are typically discovered only through time-consuming trial-and-error experimentation and by reading reported experiments in the scientific literature, which together build experts' intuition.

The new machine learning tools have enabled the creation of a personalized knowledge base for nanomaterials synthesis that can be mined to help inform further development. Starting with approximately 35,000 nanomaterials-related articles, the team developed models to classify articles according to nanomaterial composition and morphology, extract synthesis protocols from the article text, and extract, normalize and categorize chemical terms within those protocols.

“These tools and information can be used to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design,” said Anna Hiszpanski, the lead author of the paper appearing in the Journal of Chemical Information and Modeling.
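As a rough illustration of the kind of trend analysis Hiszpanski describes, the Python sketch below normalizes reagent names and counts how often each reagent appears alongside each morphology. It is not the team's code: the synonym table, the function names and the toy records are all hypothetical, invented here for illustration, and the real tools rely on trained language models rather than a hand-written lookup.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table (an assumption for this sketch). A real system
# would need a far larger mapping, curated or learned from the literature.
SYNONYMS = {
    "ctab": "cetyltrimethylammonium bromide",
    "pvp": "polyvinylpyrrolidone",
    "nabh4": "sodium borohydride",
}


def normalize(term):
    """Map a raw chemical mention to a single canonical lowercase name."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def reagent_morphology_counts(records):
    """Count, per reagent, the number of protocols reporting each morphology.

    `records` is an iterable of (morphology, reagent_mentions) pairs, one
    pair per extracted synthesis protocol.
    """
    counts = defaultdict(Counter)
    for morphology, mentions in records:
        # Use a set so a reagent named twice in one protocol counts once.
        for reagent in {normalize(m) for m in mentions}:
            counts[reagent][morphology] += 1
    return counts


# Made-up records for illustration only; not data from the paper.
TOY_RECORDS = [
    ("rod", ["CTAB", "NaBH4"]),
    ("rod", ["cetyltrimethylammonium bromide"]),
    ("cube", ["PVP"]),
    ("sphere", ["NaBH4", "sodium borohydride"]),
]

counts = reagent_morphology_counts(TOY_RECORDS)
for reagent in sorted(counts):
    print(reagent, dict(counts[reagent]))
```

Normalizing before counting matters because the same chemical can appear under an abbreviation in one article and its full name in another. Without that step, the counts for a single reagent would be split across several spellings.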

In addition to processing the articles' text, the tools automatically identify microscopy images of nanomaterials within the articles and analyze them to determine the nanomaterials' morphologies and size distributions. To let users easily explore the database, the team also developed a complementary browser-based visualization tool that provides flexibility in comparing subsets of articles of interest.

“While the focus of our work was on the synthesis of nanomaterials, the framework in terms of model selection and training methodology can be used for extracting other targeted data of interest from scientific articles in other materials and chemistry subfields and beyond,” said principal investigator of the project Yong Han.

Other LLNL contributors include Brian Gallagher, Karthik Chellappan, Peggy Li, Shusen Liu, Hyojin Kim, Jinkyu Han, Bhavya Kailkhura and David Buttler. This work was funded by LLNL’s Laboratory Directed Research and Development program.