Predicting Days-to-Recovery of COVID-19 Patients

May 8, 2020

Our Objective

We want to predict the days-to-recovery (or days-to-discharge) of a COVID-19 patient at the time of diagnosis, especially for mild cases. Although COVID-19 is a global pandemic, thousands have recovered from it and are no longer infectious. For many who do not feel sick despite being COVID-19 positive, one of the key questions on their mind will be, “How long will it take for me to be discharged and get back home?”

It is important to note that days-to-recovery prediction is especially relevant in countries where the policy is virus containment. These predictions can help public health experts to track COVID-19 positive citizens and their contacts, test the community frequently, and put everyone who tested positive into isolation at a hospital or community facility until they fully recover. 

Motivation

We used South Korea as a specific example for two reasons. First, as of April 30, 2020 and to the best of our knowledge, around 85% of its 10,793 confirmed infected patients have recovered [1]. Second, the project Data Science for COVID-19 (DS4C) has the best publicly available COVID-19 patient-level data (hosted on Kaggle), compiled from the Korea Centers for Disease Control and Prevention and local governments [2].

There are 3,388 patients in the DS4C data, of which 1,327 have recovered. Patient data, such as age and location, is readily available, as is patient travel history, the infection cluster or group to which a patient belongs, regions and daily weather in South Korea, and population behavioral data, such as daily search engine trends and Seoul’s floating population numbers. Government policies on COVID-19 have recently been added. Time-series data on daily confirmed infected, recovered, and death cases is also made available.

To give a feel for the data, consider Patient 31, who is well known as the first person from the Shincheonji church in Daegu to test positive for COVID-19; most of South Korea’s COVID-19 patients are from this church and province [3]. From the DS4C patient data, we know that she is 61 years old and came into contact with 1,160 people. She was confirmed to be infected on February 18, is currently isolated, and has not yet recovered. From the DS4C travel history data, Patient 31 went to 15 places (hospital, church, lodging) in Daegu between February 6 and 17. From the DS4C infection cluster data, we observe that the Shincheonji church in Daegu has 4,510 confirmed infected, and that 13 other Shincheonji churches in other provinces have 682 confirmed infected.

Approaches and Results

Figure: Histogram of the feature ‘D2R’ (days-to-recovery)

The figure above shows that, after removing outliers (days-to-recovery below 10 or above 40), it takes an average of about 22 days for someone in South Korea to recover from COVID-19, with a standard deviation of seven and a half days.
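
The outlier filter and summary statistics described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes days-to-recovery is the difference between a patient's release date and confirmation date, the helper names are hypothetical, and the dates are synthetic rather than taken from DS4C.

```python
from datetime import date
from statistics import mean, stdev

# Synthetic (confirmed_date, released_date) pairs for illustration only;
# these are NOT rows from the DS4C data.
patients = [
    (date(2020, 2, 18), date(2020, 3, 14)),  # 25 days
    (date(2020, 2, 20), date(2020, 3, 10)),  # 19 days
    (date(2020, 3, 1), date(2020, 3, 6)),    # 5 days, dropped (below 10)
    (date(2020, 3, 5), date(2020, 3, 27)),   # 22 days
    (date(2020, 2, 10), date(2020, 4, 5)),   # 55 days, dropped (above 40)
]


def days_to_recovery(confirmed, released):
    """Days between confirmation and release (assumed definition of D2R)."""
    return (released - confirmed).days


def filter_d2r(values, low=10, high=40):
    """Keep values in [low, high]; values below 10 or above 40 are outliers."""
    return [v for v in values if low <= v <= high]


d2r = [days_to_recovery(confirmed, released) for confirmed, released in patients]
kept = filter_d2r(d2r)

print("kept:", kept)
print("mean:", mean(kept))
print("standard deviation:", round(stdev(kept), 2))
```

On the real data, the same two statistics are what the text reports as roughly 22 and 7.5 days.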

We tried two approaches to predict days-to-recovery, in each case building models automatically with DataRobot’s default settings. The two approaches are summarized in the figure below.

Figure: Approaches to Predicting Days-to-Recovery for COVID-19

The second approach, which uses features from several tables, is promising for days-to-recovery prediction. Although its accuracy is not high, it could be improved with the monthly updates to the DS4C data, more diverse models, and model tuning.
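
To make "features from several tables" concrete, here is a hypothetical manual join of a patient table with a region table on province. The post does not describe how the tables were combined (the models were built automatically), so this is only a sketch of the general idea. The column names and all values below are illustrative assumptions, not DS4C rows.

```python
# Synthetic stand-ins for a patient table and a region table.
patients = [
    {"patient_id": 1, "province": "Daegu", "age": "60s"},
    {"patient_id": 2, "province": "Seoul", "age": "20s"},
    {"patient_id": 3, "province": "Jeju-do", "age": "40s"},
]

# Region-level features keyed by province (values are made up).
regions = {
    "Daegu": {"elderly_population_ratio": 0.15},
    "Seoul": {"elderly_population_ratio": 0.14},
}


def add_region_features(patients, regions):
    """Left-join region features onto each patient row by province.

    Patients whose province is missing from the region table keep a
    None value, so no patient row is silently dropped.
    """
    enriched = []
    for patient in patients:
        region = regions.get(patient["province"], {})
        row = dict(patient)  # copy so the input rows are not mutated
        row["elderly_population_ratio"] = region.get("elderly_population_ratio")
        enriched.append(row)
    return enriched


enriched = add_region_features(patients, regions)
for row in enriched:
    print(row)
```

A left join is used here so that every patient stays in the training set even when a secondary table has no matching row.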

Figure: eXtreme Gradient Boosted Trees Regressor with Early Stopping (Poisson Loss) (Effect Size) (Holdout) (province) Feature Effects
Figure: eXtreme Gradient Boosted Trees Regressor with Early Stopping (Poisson Loss) (Effect Size) (Holdout) (infection_case) Feature Effects

From the figures above, provinces such as Gyeongsangbuk-do and Chungcheongnam-do have more patients and slightly higher days-to-recovery, and people who are infected in places such as nursing homes, churches, and community centers tend to take longer to recover.

Conclusion

For days-to-recovery predictions, South Korean COVID-19 patients take around 22 days to be discharged, and each prediction can be off by around ±6.7 days. We noticed that within one month, from February 18 to March 18, 2020, average days-to-recovery dropped from 25 to 17 days. We hypothesize that this is the result of many new South Korean COVID-19 patients having no symptoms (asymptomatic) or mild symptoms, and so recovering much faster than the general population. There was an enormous amount of testing after the discovery of Patient 31 on February 18 [3], and about 10% of confirmed patients may have had no symptoms and about 40% mild symptoms [4]. Recovery for asymptomatic patients is confirmed by having two consecutive negative test results within the last 24 hours [5].
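
A drop in average days-to-recovery over time, like the one described above, can be checked by grouping patients by the week in which they were confirmed. The sketch below is one possible way to do this; the grouping by ISO week, the function name, and the records are assumptions for illustration, not the analysis behind the 25-to-17-day figure.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Synthetic (confirmed_date, days_to_recovery) records, illustration only.
records = [
    (date(2020, 2, 18), 26),
    (date(2020, 2, 19), 24),
    (date(2020, 3, 3), 21),
    (date(2020, 3, 4), 19),
    (date(2020, 3, 17), 18),
    (date(2020, 3, 18), 16),
]


def mean_d2r_by_week(records):
    """Average days-to-recovery per ISO (year, week) of the confirmation date."""
    by_week = defaultdict(list)
    for confirmed, d2r in records:
        iso_year, iso_week, _ = confirmed.isocalendar()
        by_week[(iso_year, iso_week)].append(d2r)
    # Sort by (year, week) so the trend reads in chronological order.
    return {week: mean(values) for week, values in sorted(by_week.items())}


trend = mean_d2r_by_week(records)
for (year, week), average in trend.items():
    print(f"{year}-W{week:02d}: {average} days")
```

Note that patients confirmed recently can only appear as recovered if they recovered quickly, so a falling average near the end of the data window may partly reflect that selection effect.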

One of the main limitations of the DS4C data is coverage. 83% of all patients are from the three provinces of Daegu (64%), Gyeongsangbuk-do (13%), and Gyeonggi-do (6%), yet almost all of the Daegu patients are currently missing from the DS4C data. As the data is updated monthly, it is possible that more Daegu patients will be added.

Several prediction use cases are possible using DS4C. For most use cases, we have built initial models but decided that the data is not sufficient or reliable at the moment, or the predictive models are not accurate yet:

1. Other than days-to-recovery, there are other patient-level prediction use cases, such as:

  • Length-of-Stay (LOS) – regression use case like days-to-recovery, where the target is the difference in days between the discharge and confirmed dates, except that LOS will usually include patients who have died
  • Symptoms-to-confirmation – regression use case where the target is the difference in days between the confirmed and symptom start dates (only 300+ patients in DS4C data have symptom start dates)
  • Severity – binary classification use case where we know that the patient was in an Intensive Care Unit or had comorbidities (not available in DS4C data), exceeded x days of hospitalization (where x can vary from country to country), or died due to COVID-19 related complications (60+ deaths in DS4C data, but >200 deaths in official statistics)
  • State – multi-class classification use case where the patient can be isolated, recovered, or deceased (most are isolated in DS4C, but most are recovered in official statistics)
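
The patient-level targets listed above could be derived from per-patient dates roughly as follows. This is a sketch under stated assumptions: the field names, the `build_targets` helper, and the 30-day value for x are hypothetical choices for illustration, and the severity rule uses only the parts of the definition that do not need ICU or comorbidity data.

```python
from datetime import date
from typing import Optional


def day_diff(later: Optional[date], earlier: Optional[date]) -> Optional[int]:
    """Whole days between two dates, or None if either date is missing."""
    if later is None or earlier is None:
        return None
    return (later - earlier).days


def build_targets(patient: dict, severe_stay_days: int = 30) -> dict:
    """Derive the four candidate targets for one patient record.

    severe_stay_days stands in for the threshold x, which the text says
    can vary from country to country; 30 is an arbitrary placeholder.
    """
    confirmed = patient.get("confirmed_date")
    # LOS ends at discharge or, for patients who died, at death.
    end = patient.get("released_date") or patient.get("deceased_date")
    los = day_diff(end, confirmed)
    died = patient.get("state") == "deceased"
    return {
        "length_of_stay": los,
        "symptoms_to_confirmation": day_diff(
            confirmed, patient.get("symptom_onset_date")
        ),
        "severe": died or (los is not None and los > severe_stay_days),
        "state": patient.get("state"),
    }


# Synthetic patient record, illustration only.
example = {
    "symptom_onset_date": date(2020, 3, 1),
    "confirmed_date": date(2020, 3, 4),
    "released_date": date(2020, 3, 24),
    "deceased_date": None,
    "state": "recovered",
}
targets = build_targets(example)
print(targets)
```

Returning None for a missing date, instead of dropping the patient, keeps it visible how many records lack each target (for example, the symptom start date).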

2. Time-series forecasting models, at the country- or province-level, such as:

  • New daily confirmed cases count
  • New daily recovered count
  • New daily death count
  • Cumulative confirmed cases count
  • Cumulative recovered count
  • Cumulative death count
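
The daily and cumulative series listed above are two views of the same data: cumulative counts are running sums of the daily counts, and daily counts are first differences of the cumulative counts. A minimal sketch of the conversion, with synthetic numbers and hypothetical function names, assuming the cumulative series starts from zero before the first day:

```python
from itertools import accumulate


def to_cumulative(daily_counts):
    """Running total of new daily counts."""
    return list(accumulate(daily_counts))


def to_daily(cumulative_counts):
    """First differences of a cumulative series (assumes it starts from 0)."""
    previous = [0] + list(cumulative_counts[:-1])
    return [current - prev for prev, current in zip(previous, cumulative_counts)]


# Synthetic daily confirmed counts, illustration only.
daily_confirmed = [2, 5, 3, 0, 4]
cumulative_confirmed = to_cumulative(daily_confirmed)

print("cumulative:", cumulative_confirmed)
print("daily again:", to_daily(cumulative_confirmed))
```

Because each series can be recovered from the other, a forecast of one of them yields the other.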

Acknowledgements

Valuable feedback was provided by Steven E. Moore, Sergey Yurgenson, and Yong Kim. Permission has been obtained from Jihoo Kim of DS4C to use this dataset for publication.

References

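[1] 2020 coronavirus pandemic in South Korea. Accessed 3rd May 2020
[2] DS4C: Data Science for COVID-19 in South Korea. Accessed 1st May 2020
[3] How a South Korean church helped fuel the spread of the coronavirus. Accessed 21st April 2020
[4] 코로나19 초기 확진자 10% ‘무증상’, 발열도 10명 중 4명 그쳐 (10% of early confirmed COVID-19 patients ‘asymptomatic’; only 4 in 10 had a fever). Accessed 22nd April 2020
[5] 무증상 양성자의 격리해제기준은 어떤가요? (What are the criteria for releasing asymptomatic positive cases from isolation?). Accessed 22nd April 2020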

About the author
Clifton Phua

Customer Facing Data Scientist, DataRobot

Clifton is a Customer Facing Data Scientist (CFDS) at DataRobot working in Singapore and leads the Asia Pacific (APAC) CFDS team. His vertical domain expertise is in banking, insurance, and government, and his horizontal domain expertise is in cybersecurity, fraud detection, and public safety. Clifton’s PhD and Bachelor’s degrees are from Clayton School of Information Technology, Monash University, Australia. In his free time, Clifton volunteers professional services to events, conferences, and journals. He was also part of teams that won some analytics competitions.

João Gomes

Data Scientist at DataRobot

João is a Data Scientist at DataRobot working on automated feature engineering in Singapore.

Woonpyo Hong

Customer-Facing Data Scientist at DataRobot

Woonpyo is a Customer-Facing Data Scientist at DataRobot working in Seoul, South Korea.
