Broad Institute Sees Key Role for AI and Machine Learning in Medicine
The Broad Institute was launched in 2004 to seize the opportunity offered by the Human Genome Project, the international research effort to identify and map all of the genes of the human genome. Its founding scientists hoped a research institution built to focus on the genetic basis for disease could drive great progress in medical understanding, diagnosis, and treatment.
Two Broad employees recently joined me on my podcast: Puneet, who leads Broad’s Machine Learning for Health group, and David, the institute’s CISO. We talked about how AI can help advance studies of the human genome and drive advances in technologies like CRISPR.
We touched on the tremendous progress that’s been made in genomic studies in the past two decades. Consider that the Human Genome Project cost $3 billion, but today you can have your genetic map decoded for about $1,000. “Everybody’s moving crazy fast,” Puneet said to me. “I feel like a lot of what we’re trying to do is tread water and pull the threads together. On the machine learning side, we’re seeing developments and architectures move really fast. In the disease space, we’re seeing rapid evolution in fields like cardiovascular illness, all driven by genomics.”
Teasing Out Risk with Machine Learning
The Broad Institute is working to develop polygenic risk scores, which combine the effects of many variants across a person’s decoded genome to predict the risk of future disease. “There are signatures on somebody’s genome that might tell you whether somebody is predisposed to a large left ventricle, which might predispose them to sudden cardiac death,” Puneet said. “That’s the name of the game right now, using clinical data or various single cell assays and then connecting them back to the genome to understand what the links might be.”
That leads to what Puneet described as the best and worst worlds of machine learning: massive amounts of data, but much of it misleading. He cited a study that correlated visits from a chaplain with hospital patients’ rates of death. “That is clearly not causal, but if you throw an unsupervised learning algorithm at it, that’s what you get,” he said.
AI can sift through false leads and clean up the data to find meaningful correlations.
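To make the idea of a polygenic risk score concrete, here is a minimal sketch of the usual form of such a score: a weighted sum of how many copies of each risk variant a person carries. The variant names, weights, and the function `polygenic_risk_score` are hypothetical and invented for illustration. They are not Broad’s data or method.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, int], weights: Mapping[str, float]
) -> float:
    """Return the weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Variants with no recorded dosage are treated as 0 copies.
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# Hypothetical per-variant effect sizes (assumed values, not real estimates).
weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.125}

# Hypothetical person: two copies of variant_a, one of variant_b, none of variant_c.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(person, weights)
print(score)
```

In practice the weights come from large association studies, and a raw score only becomes meaningful when compared against the distribution of scores in a reference population.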
Keeping Data Secure
David describes some of the Broad Institute’s collaborators as “honest…but curious.” These are researchers with legitimate access to data, not malicious nation states, but people who nonetheless might use data in ways not approved by Broad. “People might do things we didn’t expect,” he says. “Or export it to another country.”
That’s why the security of personal medical data is a big concern. “It’s a delicate balance,” David said. “We can’t build a wall around our stuff and say, ‘No one can access this.’ The challenge is, how do we give people access in the right way, in the right context, without really restricting the science that can happen from a collaboration using this kind of data? We’re very explicit about the security you must have to host and manage this data.”
The Promise of CRISPR
CRISPR, the groundbreaking gene-editing technique, is central to the Broad Institute’s work. It also raises ethical concerns about safety, consent, and fears of eugenics or “designer babies.”
David believes its real promise is in curing or preventing single-gene illnesses. “I’m an Ashkenazi Jew, and Tay-Sachs (a rare inherited disorder that progressively destroys nerve cells) is a known disease within this population,” David said. “I can see a time when it will be possible to almost vaccinate a child against potential diseases.”
Both Puneet and David see some exciting breakthroughs on the horizon. For example, Puneet anticipates algorithms that can be deployed to a smartwatch to proactively predict a heart attack.
“We’ve got the data we need,” he said. “I think we’re close to crossing a threshold for prediction in clinical tasks. That will really make people sit up and take notice.”
For David, it’s the promise of highly sophisticated data security methods that will allow scientists to collaborate more effectively. “The idea of that is that if I have a bunch of genomic information and you have a bunch of genomic information, we can compute against each other’s genomic information, find variants, find anomalies, find things like that, without revealing each other’s data.
“So, it becomes less about, ‘Who do I trust? How do I share this data?’ and more about, ‘Hey, let’s just do science. Don’t worry about it,’ because we’ve taken care of all that magical data-sharing stuff without sharing the data.”
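David did not name a specific technique, but one family of methods that fits his description is secure multi-party computation. The sketch below uses additive secret sharing, chosen here purely as an illustration, to let several institutions learn a combined count without any of them revealing its own number. The modulus, the function names, and the carrier counts are all assumptions made up for this example.

```python
import secrets
from typing import List

# Assumed modulus; it only needs to be larger than any total we expect to compute.
MODULUS = 2**61 - 1


def split_into_shares(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties shares that sum to value modulo MODULUS.

    Any subset of fewer than n_parties shares looks uniformly random.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate all parties in one process and return the sum of their values."""
    n = len(private_values)
    # Party i splits its private value and would send share j to party j.
    all_shares = [split_into_shares(value, n) for value in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the published partial sums reveals the total and nothing else.
    return sum(partial_sums) % MODULUS


# Hypothetical number of carriers of one variant at three institutions.
carrier_counts = [12, 7, 30]
total = secure_sum(carrier_counts)
print(total)
```

This toy version runs every party inside one process. A real deployment would send shares over authenticated channels and would need protections against parties that deviate from the protocol.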