'Modelfest' recruits data experts to Dana-Farber cancer fight
Cancer scientists have a Big Data problem: there's too much of it.
Genome sequencing and other cutting-edge research tools generate a vast amount of data to be crunched and interpreted for precision cancer care – such as matching individual patients with the best drugs for their particular tumors.
However, says Dana-Farber Cancer Institute investigator Eliezer Van Allen, MD, PhD, the torrent of data on a single patient "can be so overwhelming that there's no way any clinician could imagine keeping up with it. And none of what we do matters if you can't communicate it to the clinician effectively."
This dilemma was the focus of the inaugural Big Data Modelfest 2015 conference held last month at the Yawkey Center. Big Data refers to sets of data that are too large or complex to be processed by traditional methods.
Modelfest was organized by Dana-Farber Trustee Sean Dobson together with his wife, Joslyn.
Dobson is CEO and chairman of the board of Amherst Holdings, an investment services company involved in real estate finance and widely known for its data and analytics. The meeting was designed to explore how powerful data analysis tools used in private industry can be applied to cancer research.
"We're here to discuss how private industry can share their knowledge and make sure our cancer doctors benefit from the work we do," said Dobson. "Our interest is in helping advance the war against cancer."
In addition to Van Allen, the private sector representatives heard from Giovanni Parmigiani, PhD, chair of Biostatistics and Computational Biology at Dana-Farber. Parmigiani described several research projects he and his colleagues are working on that generate large data sets.
In one project, individuals' family medical histories are being integrated with results of genome testing to calculate cancer risks. Another project uses "adaptive clinical trials" to learn more quickly which drugs are effective in which groups of patients.
Next, the data experts took their turn.
Peter Wang, cofounder and president of Continuum Analytics of Austin, Texas, described how his company uses tools based on the Python programming language to analyze large amounts of data and create models. "Our goal is to empower scientists to explore their data, analyze their problems, and share their results," he said.
Wei Li, chief scientist at NetBase Solutions of Mountain View, Calif., said his company mines huge quantities of data from social media and other Web content for market analysis. Using Natural Language Processing, he said, "We can extract emotions and behaviors from text" to discover, for example, what percentage of people on Twitter and Facebook like or dislike the iPhone 6.
One of the leading providers of analytics and predictive models is 1010data Inc., of New York, whose chief analyst, Afshin Goodarzi, spoke at Modelfest. He said the company's services focus on building models to predict business and finance behavior related to real estate and retail sales. 1010data's analytical platform can quickly process tables with as many as 20 trillion rows of data, he said, and added, "companies like us may be able to help" cancer researchers.
Following the presentations, Dobson asked the experts to "bring your organizations' abilities to problems faced by Dana-Farber." As motivation, he said his company will sponsor a prize for the best specific collaboration between a data company and Dana-Farber in 2015. The winner's name will be posted in a Computational Biology office in the Longwood Center.
"This is a call to arms," Dobson declared. "We need to solve a big problem in the fight against cancer."