A group of researchers from Lawrence Livermore National Laboratory recently visited the Cancer Registry of Norway (CRN) in Oslo to discuss the progress of ongoing collaborations between the two institutions aimed at applying big data analytics to predicting cancer risk and mortality.
The U.S.-Norwegian partnership began with a project headed by LLNL researcher Ghaleb Abdulla to use cervical cancer screening data obtained from Norway to develop personalized cancer screening. A second project called Multitask Learning for Cancer seeks to develop algorithms able to predict cancer occurrence and five-year survival rates by combining data from a broad range of cancer types.
“Cancer research and practice are very siloed, but even seemingly disparate cancer types may share underlying similarities or causes,” explained project principal investigator Ana Paula Sales. “For example, cervical cancer is caused by the Human Papilloma Virus (HPV), but HPV also is responsible for about 30 percent of head and neck cancers. Our goal is to develop algorithms that combine data from various cancer types in intelligent ways to leverage such commonalities, thus improving prediction performances.”
During the workshop, the LLNL researchers presented their work to U.S. Ambassador to Norway Kenneth Braithwaite, Oslo Cancer Cluster General Manager Ketil Widerberg, Director-General of the Research Council of Norway John-Arne Røttingen and Elizabeth Weiderpass of the International Agency for Research on Cancer, among others. Sales presented results from the multitask learning project’s first year, during which she and her team of researchers, including Braden Soper, Dave Widemann, Priyadip Ray and Andre Goncalves, examined models that predict cancer development and outcomes.
“This type of problem [cancer] will expand our data science toolset in detecting rare events in noisy and incomplete data sets at scale,” said LLNL’s Director of Innovation Jason Paragas. “While pushing the frontiers of data science, this approach may provide a novel toolset to reduce overtreatment and bring earlier detection of the too many cancers that touch us too frequently.”
Initial results using data from the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) program showed that multitask learning improved survival prediction for HPV-related cancers by 8 percent to 37 percent over single-task learning methods, which look at one type of cancer at a time. Preliminary results using data from CRN also are promising, Sales said.
“We’re seeing a reduction in errors when you combine this data across the board. At different levels we see an improvement in (multitask learning) above the others,” Sales said. “It’s encouraging. Over time, we hope to continue to improve risk prediction and get a sense of what features are important.”
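To make the contrast between single-task and multitask learning concrete, here is a minimal sketch. It is not the LLNL team's algorithm, which the article does not describe. It assumes a toy setup in which each "task" stands in for one cancer type and has a single slope to learn, and it uses mean-regularized regression as a stand-in technique. The function name `fit_tasks`, the `coupling` parameter and the data are all hypothetical.

```python
def fit_tasks(tasks, coupling, steps=2000, lr=0.05):
    """Fit one slope per task by gradient descent.

    tasks: list of (xs, ys) pairs, one pair per task (toy data).
    coupling: strength of the penalty pulling slopes toward their mean.
              A value of 0.0 reduces to independent single-task fits.
    """
    weights = [0.0] * len(tasks)
    for _ in range(steps):
        mean_w = sum(weights) / len(weights)
        new_weights = []
        for w, (xs, ys) in zip(weights, tasks):
            # Gradient of this task's mean squared error.
            grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            # Gradient of coupling * sum over tasks of (w_t - mean_w)^2.
            grad += 2 * coupling * (w - mean_w)
            new_weights.append(w - lr * grad)
        weights = new_weights
    return weights


# Hypothetical data: task A has several clean points (slope 2),
# task B has one point that suggests a slope of 3 on its own.
tasks = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0], [3.0]),
]

single = fit_tasks(tasks, coupling=0.0)  # each task fitted in isolation
multi = fit_tasks(tasks, coupling=1.0)   # tasks share information

print("single-task slopes:", single)
print("multitask slopes:  ", multi)
```

In this toy case the data-poor task's slope is pulled toward the slope of the data-rich task. That is the general intuition behind borrowing strength across related cancer types, though whether it helps in practice depends on how similar the tasks really are.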
Sales and her team are using comprehensive cancer diagnosis and lifestyle data from CRN on 30,000 Norwegian women, including information on sexually transmitted diseases, smoking, sexual activity and other risk factors over an 11-year span. About 1,000 of the women developed cancer during that time, providing valuable insight into factors that increase cancer risk, Sales said.
The collaboration between LLNL and CRN dates to initial discussions in 2015 and has continued to grow stronger. Abdulla’s project began the following year, and Sales’ project began in 2017. That same year, CRN researchers Mari and Jan Nygard spent a six-month sabbatical at LLNL, a stay that included a workshop held at the Laboratory on big data challenges for precision medicine, with an emphasis on cancer.
“Given that we come from completely different domains (cancer vs. data science), our close interactions, including weekly video conferences and frequent in-person meetings, have been vital in allowing us to build a common language, and in essence, build the success of the collaboration,” Sales said.
The CRN team visited LLNL three times in 2018. Last month’s workshop was the first time the LLNL team had visited Norway. During the visit, CRN researcher Mari Nygard presented results from the collaboration with Abdulla’s cervical cancer project. Abdulla said that work is progressing and that the team hired three interns over the summer to help explore different approaches to modeling the data. The students gained experience with real-life medical data and machine learning algorithms, Abdulla said. In addition to making technical contributions to the project, they received valuable exposure to the work done at LLNL and are potential strategic future hires.
Abdulla said one of his students investigated recurrent neural networks to model time-series data and predict a woman’s maximum risk of developing cervical cancer. The experiments were performed on a data set compiled and cleaned from two Norwegian data sets.
“The model showed good results compared to other modeling approaches,” Abdulla said. “In addition, the implementation was very simple given the available technical libraries that allowed us to test different approaches to predict the risk.”
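For readers unfamiliar with recurrent networks, the sketch below shows the basic idea of carrying information forward across a sequence of screening visits. It is an illustration only, not the student's model: the single-unit cell, the numeric encoding of visits, the weights and the choice to report the highest per-visit score as the "maximum risk" are all assumptions, and the weights are untrained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_risk_trajectory(visits, w_in, w_rec, w_out, b_out):
    """Run a one-unit Elman-style recurrent cell over a sequence.

    visits: numeric screening results, oldest first (hypothetical encoding).
    Returns one risk score in (0, 1) per visit.
    """
    hidden = 0.0
    risks = []
    for x in visits:
        # The hidden state carries information from earlier visits forward.
        hidden = math.tanh(w_in * x + w_rec * hidden)
        risks.append(sigmoid(w_out * hidden + b_out))
    return risks


# Hypothetical history: normal, mildly abnormal, more abnormal, normal.
history = [0.0, 1.0, 2.0, 0.0]

# Untrained weights chosen only so the example runs.
risks = rnn_risk_trajectory(history, w_in=0.8, w_rec=0.5, w_out=2.0, b_out=-1.0)
max_risk = max(risks)

print("per-visit risk:", risks)
print("maximum risk:  ", max_risk)
```

The first and last visits have the same input value, yet the last one receives a higher score because the hidden state still reflects the earlier abnormal results. A real model would learn its weights from data, typically with an established machine learning library rather than hand-written code.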
Another student, Abdulla said, looked at a new set of high-level statistical programming languages and explored the possibility of implementing one of the advanced modeling approaches used for the project. The third student attempted to use advanced Bayesian modeling techniques to cluster the patients into categories of normal, low-risk, high-risk and cancer.
Working with institutions in Norway is particularly exciting, Sales said, because the country collects patient data in a centralized way, as opposed to the U.S., which has many different health care providers, making it difficult to obtain a complete picture of patients for entire populations. Sales said she left Oslo with “a sense of excitement” about the future of the work.
“This was a great opportunity for us to bring it all together and discuss our results with subject-matter experts as well as stakeholders,” Sales said. “We have a strong sense of support from both institutions and governments and it seems like we’re at a point where everything is coming together nicely. It was a good way to conclude the first year of our project.”
Sales’ three-year project is funded by the Laboratory Directed Research and Development program. Sales said the team’s future goal is to incorporate more data into the algorithms, such as data on prescription drugs and on HPV vaccination and testing. Abdulla is working with other team members to help them write proposals to continue and expand the current work.