Why the Pharmaceutical Industry Needs Automated Machine Learning

November 30, 2018

Although pharmaceutical drugs are available to treat a wide range of ailments, the long drug development process creates massive obstacles for the industry, not to mention for those in need of treatment. Bringing a drug to market can take up to 15 years, at a cost that can reach billions of dollars. Medical advances are continually changing the industry, and by the time a drug reaches the market, it may no longer be the best solution.

From a biological perspective, while many genes have been identified as disease-causing, only a few of them have become targets for approved therapeutics. Once a disease-causing gene has been identified, it is assessed to see whether it has a shape suitable for drug binding and whether any potential drug will have a selective (it won’t bind to another gene) and enduring therapeutic effect. It takes time to get this process right and to ensure that the correct gene-to-drug combination is made.

From a chemical perspective, a potential drug compound may work well in the first few stages of the drug development cycle, such as in laboratory and mouse models. However, further down the line, when the dosage is increased for human consumption, the compound may prove to be toxic or have other adverse effects. Seeing a compound produce such promising results in testing, only for it to fail later, is both frustrating and costly, to say the least.

Yes, it’s important to make sure that drugs are properly funded and tested before being made available to the public, but this process is costly and inefficient. Because of this, it is important that any potential drug “fails fast” in the drug development cycle or preferably is not selected for further experimentation in the first place.


The value of automated machine learning for drug development

There has been a massive increase in the amount of data generated by high-throughput compound screening and DNA sequencing technologies as they become widely available to the pharmaceutical community. For this data to be useful, it needs to be filtered down in a systematic way so that chemical compounds and genes can be prioritized and selected for further investigation in a wet laboratory. By using historical compound and genetic experimental data together with clinical patient data, automated machine learning may be used to predict chemical activity, genetic druggability, the best treatment options for a particular patient, and more.
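To make the idea of prioritizing compounds from historical data concrete, here is a minimal sketch in Python. It is not how DataRobot or any particular tool works. It assumes each compound is represented as a set of "on" fingerprint bits and scores new candidates by their Tanimoto similarity to historically tested compounds. All names (`HISTORY`, `CANDIDATES`, `predict_activity`, `prioritize`) and all data values are hypothetical and exist only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a compound fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_activity(candidate: Fingerprint,
                     history: List[Tuple[Fingerprint, int]],
                     k: int = 3) -> float:
    """Similarity-weighted mean activity of the k most similar historical compounds."""
    neighbours = sorted(
        ((tanimoto(candidate, fp), active) for fp, active in history),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return 0.0  # nothing similar in the historical data
    return sum(sim * active for sim, active in neighbours) / total


def prioritize(candidates: Dict[str, Fingerprint],
               history: List[Tuple[Fingerprint, int]],
               top_n: int = 2,
               k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidates by predicted activity and keep the top_n for the wet lab."""
    ranked = sorted(
        ((name, predict_activity(fp, history, k)) for name, fp in candidates.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical historical screening results: (fingerprint bits, 1 = active, 0 = inactive)
HISTORY = [
    (frozenset({1, 2, 3, 4}), 1),
    (frozenset({1, 2, 3, 5}), 1),
    (frozenset({2, 3, 4, 6}), 1),
    (frozenset({7, 8, 9, 10}), 0),
    (frozenset({7, 8, 11, 12}), 0),
    (frozenset({8, 9, 10, 13}), 0),
]

# Hypothetical untested candidates
CANDIDATES = {
    "cand_A": frozenset({1, 2, 3, 6}),
    "cand_B": frozenset({7, 8, 9, 13}),
    "cand_C": frozenset({1, 2, 8, 9}),
}

for name, score in prioritize(CANDIDATES, HISTORY):
    print(f"{name}: predicted activity {score:.2f}")
```

In practice, an automated machine learning tool would try many model types and feature representations rather than a single similarity rule, but the output has the same shape: a short, ranked list of candidates to take into the laboratory.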

How automated machine learning helps manage these challenges

Automated machine learning has been used at all stages of the drug development cycle, including:

  1. Prediction and prioritization of genes that may be causative of a specific disease using gene expression and gene copy number data.

  2. Prediction of genetic druggability.

  3. Identification of biomarkers for exacerbation events in asthma.

  4. Prediction of drug combinations for the treatment of cancer using the protein-protein interaction network.

  5. Selection of the most appropriate clinical trial for a patient. 

In general, automated machine learning has reduced the search space of compounds and genes, providing the user with genes and compounds to prioritize for wet laboratory experiments. This speed and productivity align with the philosophy of “fail fast” and help to limit the chance of failure later in the drug development cycle. Automating the modeling process also helps ensure that the machine learning models are built correctly, so that they avoid overfitting the data.
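One common way to check for overfitting is cross-validation: a model is scored on data it did not see during training, and that score is compared with its score on the training data. The sketch below is an illustrative assumption, not a description of any specific product. It uses a deliberately simple 1-nearest-neighbour classifier, which memorizes its training data, on a small hypothetical dataset with two noisy labels. The names `DATA`, `predict_1nn`, and `cross_validated_accuracy` are made up for this example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature vector, class label)


def predict_1nn(train: Sequence[Sample], features: Tuple[float, ...]) -> int:
    """Return the label of the closest training sample (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], features)),
    )
    return nearest[1]


def accuracy(train: Sequence[Sample], test: Sequence[Sample]) -> float:
    """Fraction of test samples whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


def cross_validated_accuracy(data: Sequence[Sample], k: int = 4) -> float:
    """Mean accuracy over k folds; each fold is held out once for testing."""
    fold_scores: List[float] = []
    for fold in range(k):
        test = [s for i, s in enumerate(data) if i % k == fold]
        train = [s for i, s in enumerate(data) if i % k != fold]
        fold_scores.append(accuracy(train, test))
    return sum(fold_scores) / k


# Hypothetical data: the label is mostly 1 for larger feature values,
# with two deliberately noisy labels (at feature values 2 and 9).
LABELS = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1]
DATA: List[Sample] = [((float(x),), label) for x, label in enumerate(LABELS)]

train_acc = accuracy(DATA, DATA)          # scored on the data it memorized
cv_acc = cross_validated_accuracy(DATA)   # scored on held-out folds

print(f"training accuracy:        {train_acc:.2f}")
print(f"cross-validated accuracy: {cv_acc:.2f}")
```

A large gap between the two numbers is the warning sign: the model looks perfect on data it has already seen but does noticeably worse on held-out data.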

The future of work for machine learning in pharmaceuticals

Unlike in the financial industry, where the machine learning models themselves are closely regulated, in pharmaceuticals it’s the biological and chemical processes that are subject to strict protocols. How a model chose the gene or compound is not as important.

However, this doesn’t mean that jobs will be at risk. Automated machine learning tools, like DataRobot, give biologists, chemists, and even medical professionals the ability to build solid and trustworthy models. There are also now specific higher education degrees in Computational Biology, Computational Chemistry, and Bioinformatics, as well as dedicated computational departments in pharmaceutical companies. A cross-disciplinary field will provide new and specialized jobs!



About the Author:
Amanda Schierz is a London-based Data Scientist at DataRobot researching solutions for various customer problems. Prior to joining DataRobot, Amanda worked as a Sr. Computational Biologist at The Institute for Cancer Research and Sr. Lecturer at Bournemouth University. Amanda has a PhD in Knowledge Management from the University of Surrey in text mining for innovation management.
