Reduce Hospital Readmission Risk

Reduce the 30-day readmission rate by predicting in advance which patients are likely to readmit and understanding the top reasons why. Identifying likely readmissions early means increasing quality of care, decreasing costs, and improving the lives of patients.

Business Problem

A readmission occurs when a patient is admitted to the hospital again within 30 days of a previous discharge. Readmissions are not only a reflection of uncoordinated healthcare systems that fail to sufficiently understand patients and their conditions, but they are also a tremendous financial strain on both healthcare providers and payers. In 2011, the United States Government estimated there were approximately 3.3 million cases of 30-day all-cause hospital readmissions, costing healthcare organizations a total of $41.3 billion.

The foremost challenge in mitigating readmissions is accurately anticipating patient risk from the point of initial admission to discharge. Although a readmission is caused by a multitude of factors, including a patient’s medical history, admission diagnosis, and social determinants, the existing methods (e.g., LACE and HOSPITAL scores) used to assess a patient’s likelihood of readmission cannot account for the full variety of factors involved. Because they consider only a limited number of factors, these methods result in suboptimal health evaluations and outcomes.

Intelligent Solution

AI provides clinicians and care managers with the information they need to nurture strong, lasting connections with the patients they care about. AI helps reduce readmission rates by predicting which patients are at risk and allowing clinicians to prescribe intervention strategies before and after the patient is discharged. Unlike existing methods, AI models can ingest significant amounts of data and learn complex patterns behind why certain patients are likely to readmit. With advancements in model interpretability, AI offers personalized explanations for all its predictions, giving clinicians complete transparency of the top risk drivers for every single patient at any given time.

By augmenting the care clinicians provide, alongside the other actions they already take, AI enables them to conduct intelligent interventions to improve patient health. Using the information they learn from AI, clinicians can decrease the likelihood of patient readmission by carefully walking patients through their discharge paperwork in person, scheduling additional outpatient appointments (to give patients more confidence about their health), and providing additional interventions that help reduce readmissions.

Value Estimation

What has ROI looked like for this use case? 

“[DataRobot] easily outperformed the LACE model with a 5% reduction in readmissions in the first quarter of the year.”—KLAS Report

Post-Acute Care Center [Anonymous]—$500k in cost savings by reducing readmissions. 

How would I measure ROI for my use case? 

Current Cost of Readmissions  =  Current readmissions annual rate x Annual hospital inpatient discharge volumes x Average cost of a hospital readmission

New Cost of Readmissions = New readmissions annual rate x Annual hospital inpatient discharge volumes x Average cost of a hospital readmission 

Current Cost of Readmissions – New Cost of Readmissions = ROI (annual savings)
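
As a rough sketch of this bottom-up calculation, the Python snippet below computes both annual costs and reports the difference as current cost minus new cost, so that savings come out positive. The two readmission rates and the discharge volume are hypothetical placeholders. The average cost per readmission is derived from the national figures quoted earlier ($41.3 billion across roughly 3.3 million readmissions) and should be replaced with your own organization's number.

```python
def readmission_cost(readmission_rate: float,
                     annual_discharges: int,
                     avg_cost_per_readmission: float) -> float:
    """Annual cost = readmission rate x discharge volume x average cost."""
    return readmission_rate * annual_discharges * avg_cost_per_readmission


# Derived from the national 2011 figures cited in the text (about $12.5k each).
avg_cost_per_readmission = 41.3e9 / 3.3e6

# Hypothetical inputs for illustration only.
current_rate = 0.15        # assumed current annual readmission rate
new_rate = 0.14            # assumed rate after interventions
annual_discharges = 20_000  # assumed inpatient discharge volume

current_cost = readmission_cost(current_rate, annual_discharges,
                                avg_cost_per_readmission)
new_cost = readmission_cost(new_rate, annual_discharges,
                            avg_cost_per_readmission)
annual_savings = current_cost - new_cost

print(f"Current cost:   ${current_cost:,.0f}")
print(f"New cost:       ${new_cost:,.0f}")
print(f"Annual savings: ${annual_savings:,.0f}")
```

The savings figure is only as good as the assumed rate change, so it is worth running the calculation over a range of plausible new rates rather than a single value.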

Value Estimates (Top-Down Calculation) 

Current costs of readmissions x improvement in readmissions rate = ROI 

The top-down cost of readmissions for each healthcare provider is $41.3 billion / 6,210 US providers = ~$6.7 million
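
The top-down arithmetic can be checked with a few lines of Python. The national cost and provider count come from the text above; the 5% improvement is an assumed value used purely for illustration, treated here as a relative reduction in readmission cost.

```python
TOTAL_US_READMISSION_COST = 41.3e9  # from the text (2011 estimate)
US_PROVIDERS = 6_210                # from the text

# Average annual readmission cost per provider.
cost_per_provider = TOTAL_US_READMISSION_COST / US_PROVIDERS

# Assumption: a 5% relative improvement in the readmission rate.
assumed_improvement = 0.05
estimated_value = cost_per_provider * assumed_improvement

print(f"Cost per provider: ${cost_per_provider:,.0f}")
print(f"Estimated value at {assumed_improvement:.0%} improvement: "
      f"${estimated_value:,.0f}")
```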

Technical Implementation

About the Data

For illustrative purposes, we are going to be using a sample dataset provided by a medical journal that studied readmissions across 70,000 inpatients with diabetes. The researchers of the study collected this data from the Health Facts database provided by Cerner Corporation, which is a collection of clinical records across providers in the United States. Health Facts allows organizations that use Cerner’s electronic health system to voluntarily make their data available for research purposes. All the data was cleansed of PII in compliance with HIPAA. 

Problem Framing

The target variable for this use case is whether or not the patient readmitted to the hospital (Binary: True or False, 1 or 0, etc.). This choice in target makes this a binary classification problem.
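
If your admissions data stores dates rather than a ready-made flag, the target can be derived from them. The sketch below is one possible way to do that; the function name, the argument names, and the choice to treat day 30 as inside the window are all assumptions, not part of the original dataset.

```python
from datetime import date
from typing import Optional

READMISSION_WINDOW_DAYS = 30  # 30-day window, as defined in the Business Problem


def readmitted_within_window(discharge_date: date,
                             next_admission_date: Optional[date]) -> bool:
    """Return True if the next admission falls within the 30-day window.

    Assumption: day 30 itself counts as a readmission.
    """
    if next_admission_date is None:
        return False  # no later admission on record
    days_until_return = (next_admission_date - discharge_date).days
    return 0 <= days_until_return <= READMISSION_WINDOW_DAYS


print(readmitted_within_window(date(2024, 1, 1), date(2024, 1, 20)))
print(readmitted_within_window(date(2024, 1, 1), date(2024, 3, 1)))
print(readmitted_within_window(date(2024, 1, 1), None))
```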

The features below represent key factors that are important in predicting readmissions. They encompass each patient’s background, diagnosis, and medical history, which will help DataRobot find relevant patterns across the patient’s medical profile to assess their re-hospitalization risk.

Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to the use case. As you will see later, DataRobot is able to quickly differentiate important vs unimportant features. 

These features are generally stored across proprietary data sources available in your EMR system: Patient Data, Diagnosis Data, Admissions Data, and Prescription Data. Examples of EMR systems are Epic and Cerner.

Other external data sources that may also be relevant include: Seasonal Data, Demographic Data, and Social Determinants Data.

Sample Feature List   

Feature Name | Data Type | Description | Data Source | Example
Readmitted | Binary (Target) | Whether or not the patient readmitted within 30 days | Admissions Data | False
Age | Numeric | Patient age group | Patient Data | 50-60
Weight | Categorical | Patient weight group | Patient Data | 50-75
Gender | Categorical | Patient gender | Patient Data | Female
Race | Categorical | Patient race | Patient Data | Caucasian
Admissions Type | Categorical | Patient state during admission (Elective, Urgent, Emergency, etc.) | Admissions Data | Elective
Discharge Disposition | Categorical | Patient discharge condition (Home, Home with Health Services, etc.) | Admissions Data | Discharged to home
Admission Source | Categorical | Patient source of admissions (Physician Referral, Emergency Room, Transfer, etc.) | Admissions Data | Physician Referral
#Days in Hospital | Numeric | Length of stay in hospital | Admissions Data | 1
Payer Code | Categorical | Unique code of patient’s payer | Admissions Data | CP
Medical Specialty | Categorical | Medical specialty that patient is being admitted into | Admissions Data | Surgery-Neuro
#Lab Procedures | Numeric | Total lab procedures in the past | Admissions Data | 35
#Procedures | Numeric | Total procedures in the past | Admissions Data | 4
#Outpatient Visits | Numeric | Total outpatient visits in the past | Admissions Data | 0
#ER Visits | Numeric | Total emergency room visits in the past | Admissions Data | 0
#Inpatient Visits | Numeric | Total inpatient visits in the past | Admissions Data | 0
#Diagnosis | Numeric | Total number of diagnoses | Diagnosis Data | 9
ICD10 Diagnosis Code(s) | Categorical | ICD10 diagnosis code for the patient’s condition; could be more than one (additional columns) | Diagnosis Data | M4802
ICD10 Diagnosis Description(s) | Categorical | Description of the patient’s diagnosis; could be more than one (additional columns) | Diagnosis Data | Spinal stenosis, cervical region
#Medications | Numeric | Total number of medications prescribed to the patient | Prescription Data | 21
Prescribed Medication(s) | Binary | Whether or not the patient is prescribed a medication; could be more than one (additional columns) | Prescription Data | Metformin – No

Data Preparation 

The original raw data consisted of ~74 million unique visits that include ~18 million unique patients across ~3 million providers. This data originally contained both inpatient and outpatient visits, as it included medical records from both integrated health systems and standalone providers. 

While the original data schema consisted of 41 tables with 117 features, the final dataset was filtered on relevant patients and features based on the use case. The patients included were limited to those with: 

  • Inpatient encounters 
  • Existing diabetic conditions
  • 1–14 days of inpatient stay
  • Lab tests performed during inpatient stay (or not)
  • Medications prescribed during inpatient stay (or not)

All other features were excluded due to lack of relevance and/or poor data integrity. 
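
The cohort criteria above could be expressed as a simple filter, sketched below in Python. The field names (`encounter_type`, `has_diabetes_diagnosis`, `length_of_stay_days`) are hypothetical and will differ in your own schema. Because the lab test and medication criteria are listed with "(or not)", this sketch does not filter on them.

```python
from typing import Dict, List


def in_cohort(encounter: Dict) -> bool:
    """Keep inpatient encounters for diabetic patients with a 1-14 day stay."""
    return (
        encounter["encounter_type"] == "inpatient"
        and encounter["has_diabetes_diagnosis"]
        and 1 <= encounter["length_of_stay_days"] <= 14
    )


def filter_cohort(encounters: List[Dict]) -> List[Dict]:
    return [e for e in encounters if in_cohort(e)]


# Made-up records for illustration only.
encounters = [
    {"id": 1, "encounter_type": "inpatient", "has_diabetes_diagnosis": True,
     "length_of_stay_days": 3},
    {"id": 2, "encounter_type": "outpatient", "has_diabetes_diagnosis": True,
     "length_of_stay_days": 1},
    {"id": 3, "encounter_type": "inpatient", "has_diabetes_diagnosis": False,
     "length_of_stay_days": 5},
    {"id": 4, "encounter_type": "inpatient", "has_diabetes_diagnosis": True,
     "length_of_stay_days": 21},
]

cohort = filter_cohort(encounters)
print([e["id"] for e in cohort])
```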


Model Training 

DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.

We will jump straight to interpreting the model results.

For this use case we create one unified model that predicts the likelihood of readmission for patients with diabetic conditions. Each record in the data represents a unique patient visit. 

Interpret Results

  • By taking a look at the Feature Impact chart, we learn that a patient’s number of past inpatient visits, discharge disposition, and the medical specialty of their diagnosis are the three most impactful features that contribute to whether a patient will readmit. (See the DataRobot documentation for more information about Feature Impact.)

  • In assessing the partial dependence plots to further evaluate the marginal impact top features have on the predicted outcome, we learn that as a patient’s number of past inpatient visits increases from 0 to 2, their likelihood of readmission rises from 37% to 53%. Once the number of visits exceeds 4, the likelihood increases to about 59%.

  • DataRobot’s Prediction Explanations provide a more granular view to interpret the model results. Here, we see why a given patient was predicted to readmit or not, based on the top predictive features. (See the DataRobot documentation for more information on Prediction Explanations.)
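
To make the idea of partial dependence concrete, here is a minimal, generic sketch in Python: for each candidate value of one feature, that value is substituted into every row and the model's predictions are averaged. This is the textbook technique, not necessarily how DataRobot computes its plots. The `toy_predict` function is a made-up stand-in for a trained model, so its numbers are unrelated to the percentages quoted above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def partial_dependence(predict: Callable[[Row], float],
                       rows: Sequence[Row],
                       feature: str,
                       grid: Sequence[float]) -> List[Tuple[float, float]]:
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_predict(row: Row) -> float:
    """Hypothetical stand-in for a trained readmission model."""
    risk = 0.30 + 0.08 * min(row["n_inpatient_visits"], 3)
    if row["emergency_admission"]:
        risk += 0.05
    return min(risk, 1.0)


sample_rows = [
    {"n_inpatient_visits": 0, "emergency_admission": 1},
    {"n_inpatient_visits": 2, "emergency_admission": 0},
]

curve = partial_dependence(toy_predict, sample_rows,
                           "n_inpatient_visits", [0, 1, 2, 3, 4])
for visits, avg_risk in curve:
    print(f"{visits} past inpatient visits -> average predicted risk {avg_risk:.3f}")
```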


To make the prediction results intuitive for clinicians to consume, they can be post-processed into labels based on where they fall relative to predefined prediction thresholds, instead of being displayed as a raw probability or binary value. For instance, patients can be labeled as high risk, medium risk, or low risk depending on their predicted risk of readmission.
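
A minimal sketch of this post-processing step is shown below. The threshold values (0.3 and 0.6) are hypothetical; in practice they would be chosen with clinicians based on capacity and the cost of missed readmissions.

```python
def risk_label(probability: float,
               medium_threshold: float = 0.3,
               high_threshold: float = 0.6) -> str:
    """Map a predicted readmission probability to a risk label.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_threshold:
        return "High Risk"
    if probability >= medium_threshold:
        return "Medium Risk"
    return "Low Risk"


for p in (0.12, 0.45, 0.81):
    print(f"{p:.2f} -> {risk_label(p)}")
```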

Business Implementation

Decision Environment

After you find the model that best learns patterns in your data to predict readmissions, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how these stakeholders will make decisions using the predictions to impact the overall process.

This is a critical piece of implementing the use case as it ensures that predictions are used in the real world for reducing hospital readmissions and generating clinical improvements. 

Decision Maturity 

Automation | Augmentation | Blend

At its core, DataRobot empowers your clinicians and care managers with the information they need to nurture strong and lasting connections with the people they care about most: their patients. While there are use cases where decisions can be automated in a data pipeline, a readmissions model is geared to augment the decisions of your clinicians. It acts as an intelligent machine that, combined with the expertise of your clinicians, will help improve your patients’ medical outcomes. 

Model Deployment

DataRobot will provide your clinicians with complete transparency on the top risk drivers for every single patient at any given time, enabling them to conduct intelligent interventions both before and after the patient is discharged. For an overview of model deployment, see the DataRobot documentation.

The predictions can be integrated into other systems that are embedded in the provider’s day-to-day business workflow, such as the provider’s EMR system or BI dashboards. In the EMR case, clinicians can easily see predictions as an additional column in the data they already view on a daily basis to monitor their assigned patients. They are also given explanations of the predictions so they can understand why the model predicts that a patient will readmit or not.

Some common integrations: 

  • Display results through an Electronic Medical Record system (e.g., Epic)
  • Display results through a business intelligence tool (e.g., Tableau, Power BI)

For this use case, we show an example of how to integrate predictions with Microsoft Power BI to create a dashboard that can be accessed by clinicians to support decisions on which patients they should address to prevent readmissions. 

The dashboard below displays the probability of readmission for each patient on the floor, along with the top factors behind each prediction. Nurses and physicians can consume a dashboard similar to this one to understand which patients are likely to readmit and why, allowing them to implement a prevention strategy tailored to each patient’s unique needs.

Decision Stakeholders

Decision executors are the clinical stakeholders who will consume decisions on a daily basis to identify patients who are likely to readmit and understand the steps they can take to intervene. 

  • Nurses
  • Physicians
  • Care Managers

Decision managers are the executive stakeholders who will monitor and manage the program to analyze the performance of the provider’s readmission improvement programs. 

  • Chief Medical Officer
  • Chief Nursing Officer
  • Chief Population Health Officer

Decision authors are the technical stakeholders who will set up the decision flow.

  • Clinical Operations Analyst
  • Business Intelligence Analyst
  • Data Scientists

Decision Process

Thresholds can be set to determine whether a prediction constitutes a foreseen readmission or not. Assign clear action items for each level of threshold so that clinicians can prescribe the necessary intervention strategies.  

Low Risk: Send automated email or text that includes discharge paperwork, warning symptoms, and outpatient alternatives

Medium Risk: Send multiple automated emails or texts that include discharge paperwork, warning symptoms, and outpatient alternatives, with multiple reminders. Follow up with the patient 10 days post discharge through email to gauge their condition. 

High Risk: Clinician briefs patient on their discharge paperwork in person. Send automated emails or texts that include discharge paperwork, warning symptoms, and outpatient alternatives, with multiple reminders. Follow up with the patient on a weekly basis post discharge through telephone or email to gauge their condition. 
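
One way to encode these tiers so that a downstream system can look up the right actions is sketched below. The `InterventionPlan` structure and its field names are hypothetical; the actions themselves mirror the three tiers described above, and every tier is assumed to include the automated message with discharge paperwork, warning symptoms, and outpatient alternatives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterventionPlan:
    in_person_briefing: bool   # clinician walks through discharge paperwork
    multiple_reminders: bool   # repeated automated emails or texts
    follow_up: Optional[str]   # post-discharge follow-up, if any


# Every tier also receives the automated discharge message (not modeled here).
PLANS = {
    "Low Risk": InterventionPlan(False, False, None),
    "Medium Risk": InterventionPlan(False, True,
                                    "email 10 days after discharge"),
    "High Risk": InterventionPlan(True, True,
                                  "weekly by telephone or email"),
}


def plan_for(risk_level: str) -> InterventionPlan:
    """Look up the intervention plan for a risk label."""
    try:
        return PLANS[risk_level]
    except KeyError:
        raise ValueError(f"unknown risk level: {risk_level!r}") from None


print(plan_for("High Risk"))
```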

Model Monitoring

Decision Operators: IT/System Operations, Data Scientists 

Prediction Cadence: Batch predictions generated on a daily basis 

Model Retraining Cadence: Models retrained once data drift reaches an assigned threshold; otherwise, retrain the models at the beginning of every new operating quarter.
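
The retraining rule can be written as a small check, sketched below. It assumes that a single drift score is already available from your monitoring tooling and that operating quarters follow the calendar (starting on the first of January, April, July, and October); both are assumptions to adapt to your organization.

```python
from datetime import date


def is_quarter_start(day: date) -> bool:
    """Assumption: operating quarters match calendar quarters."""
    return day.day == 1 and day.month in (1, 4, 7, 10)


def should_retrain(drift_score: float,
                   drift_threshold: float,
                   today: date) -> bool:
    """Retrain when drift reaches the threshold, or at the start of a quarter."""
    return drift_score >= drift_threshold or is_quarter_start(today)


print(should_retrain(0.25, 0.20, date(2024, 5, 15)))  # drift-triggered
print(should_retrain(0.05, 0.20, date(2024, 7, 1)))   # quarter-triggered
print(should_retrain(0.05, 0.20, date(2024, 5, 15)))  # neither condition met
```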

Implementation Risks

  • Failing to make prediction results easy and convenient for clinicians to access (e.g., if they have to open a separate web browser instead of the EHR they already use, or if they face information overload)
  • Failing to make predictions intuitive for clinicians to understand
  • Failing to help clinicians interpret the predictions and why the model made them
  • Failing to provide clinicians with prescriptive strategies to act on high-risk cases

Trusted AI

In addition to traditional risk analysis, the following elements of AI Trust may require attention in this use case. 

Target leakage: Target leakage describes information that should not be available at the time of prediction being used to train the model. That is, particular features may leak information about the eventual outcome, which artificially inflates the performance of the model in training. This use case required the aggregation of data across 41 different tables and a wide timeframe, making it vulnerable to potential target leakage. In the design of this model and the preparation of data, it is pivotal to identify the point of prediction (discharge from the hospital) and ensure that no data recorded after that time is included. DataRobot additionally supports robust target leakage detection in the second round of exploratory data analysis and the selection of the Informative Features feature list during autopilot.
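
As a simple guard during data preparation, records can be filtered by timestamp against the point of prediction before any features are built. The sketch below assumes each event carries a hypothetical `recorded_at` timestamp; it complements, rather than replaces, automated leakage detection.

```python
from datetime import datetime
from typing import Dict, List


def available_at_prediction(events: List[Dict],
                            prediction_time: datetime) -> List[Dict]:
    """Keep only events recorded at or before the point of prediction."""
    return [e for e in events if e["recorded_at"] <= prediction_time]


# Made-up events for one patient; discharge is the point of prediction.
discharge_time = datetime(2024, 3, 10, 14, 0)
events = [
    {"name": "lab_result", "recorded_at": datetime(2024, 3, 9, 8, 30)},
    {"name": "discharge_disposition", "recorded_at": datetime(2024, 3, 10, 14, 0)},
    {"name": "post_discharge_er_visit", "recorded_at": datetime(2024, 3, 20, 22, 15)},
]

safe_events = available_at_prediction(events, discharge_time)
print([e["name"] for e in safe_events])
```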

Bias & Fairness: This use case leverages features that may be categorized as protected or may be sensitive (age, gender, race). It may be advisable to assess the equivalency of the error rates across these protected groups. For example, compare whether patients of different races have equivalent false negative and false positive rates. The risk is that the system predicts with less accuracy for a certain protected group, failing to identify those patients as at risk of readmission. Mitigation techniques may be explored at various stages of the modeling process, if deemed necessary.
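
A basic version of this comparison can be sketched as follows: given (group, actual, predicted) triples, compute the false negative rate and false positive rate for each group. The group labels and records are made up for illustration, and the function is a hypothetical helper rather than a DataRobot API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def error_rates_by_group(
    records: Iterable[Tuple[str, bool, bool]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Return false negative and false positive rates per group.

    Each record is (group, actually_readmitted, predicted_readmitted).
    A rate is None when the group has no actual positives (or negatives).
    """
    counts = defaultdict(lambda: {"fn": 0, "fp": 0, "pos": 0, "neg": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # missed an actual readmission
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # flagged a patient who did not readmit
    return {
        group: {
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
        }
        for group, c in counts.items()
    }


records = [
    ("Group A", True, True), ("Group A", True, False),
    ("Group A", False, False), ("Group A", False, True),
    ("Group B", True, False), ("Group B", True, False),
    ("Group B", False, False),
]

for group, rates in error_rates_by_group(records).items():
    print(group, rates)
```

A large gap in false negative rate between groups is the pattern to watch for here, since it means at-risk patients in one group are missed more often.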


Once patients leave the hospital, it can be much more difficult to impact their health. Many patients are difficult to contact and even more difficult to influence. At the point of readmission, most likely the patient’s health has declined even further. DataRobot models identify those patients that are likely to return to the hospital—whether due to a physical downturn, abusive relationship, or chronic disease—allowing healthcare providers to take action before the patient is discharged. Using patient information like diagnosis, length of stay, previous medical records and admissions, age, and other demographics, DataRobot models help prevent readmission, saving costs and improving quality of treatment. DataRobot makes it easy for hospitals to process extensive patient data and identify at-risk patients before they are discharged.
