This follow-up study was conducted 2 weeks after the first study (see below). Some key statistics are:
161% increase in cases: 1189 known COVID-19 cases in entire Singapore
Higher proportion of cases being hospitalized, despite limiting non-essential gatherings: 74.5% are in hospital, 0.5% (6 patients) have died, and 25% have recovered
More local transmissions than imported cases, as a result of banning all short-term visitors from entering Singapore: 54% local transmissions and 46% imported cases
In this 676-patient sample, there are significantly more serious cases now: 547 (81%) are or had been in serious condition, of whom 373 are still in hospital. We left out patients who were admitted after 25th March 2020. The new clusters of cases mostly come from communities, such as worker dormitories and nursing homes.
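The cut-off date above appears consistent with the 11-day threshold used to define a serious case in the original study: a patient admitted too recently cannot yet have an observable 11-day stay. The sketch below illustrates that date arithmetic. The function names are hypothetical, and the follow-up study date of 5th April 2020 is an inference from the 3rd April announcement being 2 days before this study. It is an illustration under those assumptions, not the code used in the study.

```python
from datetime import date, timedelta

# Threshold taken from the original study's definition of a serious case.
SERIOUS_STAY_DAYS = 11


def admission_cutoff(study_date: date, min_stay_days: int = SERIOUS_STAY_DAYS) -> date:
    """Latest admission date for which a full min_stay_days stay is observable."""
    return study_date - timedelta(days=min_stay_days)


def is_eligible(admitted_on: date, study_date: date) -> bool:
    """True if the patient was admitted on or before the cut-off date."""
    return admitted_on <= admission_cutoff(study_date)


# Assumed study dates: 5th April 2020 (follow-up) and March 22, 2020 (original).
followup_cutoff = admission_cutoff(date(2020, 4, 5))
original_cutoff = admission_cutoff(date(2020, 3, 22))
print(followup_cutoff, original_cutoff)
```

Under these assumptions the two computed cut-offs fall on 25th March 2020 and March 11, 2020, matching the dates stated in the two studies.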
Other than Symptomatic to Confirmation and Details, we also used features such as Displayed Symptoms, Age, Country of Origin, and Nationality. Using this 676-patient sample and these features, we achieved around 71% accuracy, which is much lower than that of the model built 2 weeks ago. This can be directly attributed to fewer details being published online about the new cases: most of the Symptomatic to Confirmation values are missing, and Details now only states the hospital where the patient was admitted and the countries previously travelled to.
On 3rd April 2020 (Friday), 2 days before this follow-up study, it was announced that most workplaces and all schools would soon be closed to reduce local transmissions [1].
In addition to triaging, we have learnt that predicting severity for COVID-19 has more downstream uses, such as understanding the upcoming demand for staff, beds, and critical medical equipment. We can also extend this to other medical admission types, such as flu and pneumonia.
Original Study: March 22, 2020
Our objective is to predict the severity of a COVID-19 patient at time of hospital admission, to provide a second opinion to the triaging officer, so that more resources can be accurately allocated to a serious case [2]. We define a serious case as a patient who either has at least 11 days of hospitalisation, which is equal to or more than the average stay of a serious COVID-19 case [3], or died in hospital.
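The definition of a serious case can be written as a simple labelling rule. The following is a minimal sketch, assuming each patient record carries an admission date, an optional discharge date, and a flag for death in hospital. The `Patient` class, its field names, and the way days of hospitalisation are counted are all hypothetical and are not taken from the study's data or code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Threshold from the definition above: at least 11 days of hospitalisation.
SERIOUS_STAY_DAYS = 11


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    admitted_on: date
    discharged_on: Optional[date] = None  # None means still in hospital
    died_in_hospital: bool = False


def is_serious(patient: Patient, as_of: date) -> bool:
    """Label a case as serious: died in hospital, or stayed at least 11 days."""
    if patient.died_in_hospital:
        return True
    # For patients still in hospital, count the stay up to the study date.
    end = patient.discharged_on or as_of
    return (end - patient.admitted_on).days >= SERIOUS_STAY_DAYS


study_date = date(2020, 3, 22)
long_stay = Patient(admitted_on=date(2020, 3, 1), discharged_on=date(2020, 3, 12))
short_stay = Patient(admitted_on=date(2020, 3, 1), discharged_on=date(2020, 3, 8))
print(is_serious(long_stay, study_date), is_serious(short_stay, study_date))
```

Counting the stay as the difference between two calendar dates is one possible convention; an inclusive count of days would shift the boundary by one day.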
We used Singapore as a specific example because it has more proactive testing and containment efforts than most other countries, and we intend to extend COVID-19 severity prediction to other countries. The main data source is the Singapore government’s Ministry of Health (compiled from public sources [4] and [5]). This study was conducted on March 22, 2020 (Sunday) and some key statistics are:
455 known COVID-19 cases in entire Singapore, with the first case confirmed on January 23, 2020
>75% between ages of 21 to 60
>70% are Singaporeans and the rest are made up of at least 25 other nationalities
68% are in hospital, 31.5% have recovered, and 0.5% (2 patients) have died
54% imported cases (14% from UK, 5% from US, 5% from China) and 46% local transmission
In this 190-patient sample, 119 (63%) are or had been in serious condition, of whom 44 are still in hospital, which can potentially introduce some bias. We left out patients who were admitted after March 11, 2020 because we do not know if they will remain in hospital for at least 11 days. Using this sample, we achieved around 85% accuracy (AUC) on a holdout set. The top two features for predicting severity are:
Details contains a textual description of each patient, usually collected at time of hospital admission. It covers their profile, recent travel history, whom and where (s)he visited, where (s)he is hospitalized, which known COVID-19 cases (s)he is connected to, when the onset of symptoms was, where (s)he initially sought treatment, etc. In the WordCloud above, the size of a word indicates its frequency, and red indicates a higher chance of being severe. One interesting insight is that patients in the National Centre for Infectious Diseases (NCID), a 330-bed purpose-built facility for infectious diseases, tend to have lower severity compared to those in other public and private hospitals. NCID could be receiving patients of lower severity or providing better treatment, but we are not able to discern this from the current sample. Another insight is that the initial cases from China and/or Wuhan seemed to be less severe cases who were well enough to travel to Singapore.
Symptomatic to Confirmation refers to the days elapsed between onset of symptoms [6] and confirmation of COVID-19. The Histogram above shows that if a patient is tested and confirmed long after the onset of symptoms, it is most likely a mild or moderate case. We have 2 explanations for this. First, for milder cases, it usually takes longer for symptoms to develop sufficiently to warrant testing and hospitalization. Second, many of these early COVID-19 patients (circa early February) visited family clinics at the onset of symptoms, but were only referred to hospitals much later as COVID-19 was not as widespread (at that time mostly contained in China, Japan, Hong Kong, and Singapore).
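As an illustration of how a feature like Symptomatic to Confirmation could be derived, the sketch below computes the days elapsed between two dates and returns a missing value when either date is unknown. The function name and the use of `None` for missing values are assumptions made for this example, not the study's actual feature engineering.

```python
from datetime import date
from typing import Optional


def symptomatic_to_confirmation(
    onset: Optional[date], confirmed: Optional[date]
) -> Optional[int]:
    """Days elapsed from onset of symptoms to confirmation of COVID-19.

    Returns None when either date is unknown, so that the missing value
    is kept explicit rather than being silently replaced by zero.
    """
    if onset is None or confirmed is None:
        return None
    return (confirmed - onset).days


# Hypothetical dates, for illustration only.
late_confirmation = symptomatic_to_confirmation(date(2020, 2, 1), date(2020, 2, 9))
unknown_onset = symptomatic_to_confirmation(None, date(2020, 4, 1))
print(late_confirmation, unknown_onset)
```

Keeping missing values explicit matters when the onset date is not published for many patients, since the feature is then absent rather than zero.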
On March 23, 2020 (Monday), a day after this study was conducted, there was a spike of 54 new cases (48 imported from overseas). At the end of that day, Singapore temporarily banned all short-term visitors from entering or transiting via the country [7]. On March 24, 2020 (Tuesday), Singapore limited gatherings outside work and school to 10 people or fewer, and closed all bars, entertainment venues, tuition and enrichment centres, and religious services [8]. For future work, we will be keen to:
understand the effects of these containment measures on COVID-19 severity prediction in a few weeks’ time
extend COVID-19 severity prediction globally to other countries, such as Taiwan, Indonesia, South Korea, China, and/or France where patient data is already available
present studies on other COVID-19 patient use cases, such as hospital length-of-stay or days-to-recovery prediction to help with bed utilization management at the wards
Acknowledgements
This study has benefited from review and advice from Sergey Yurgenson.
Clifton is a Customer Facing Data Scientist (CFDS) at DataRobot working in Singapore and leads the Asia Pacific (APAC) CFDS team. His vertical domain expertise is in banking, insurance, and government, and his horizontal domain expertise is in cybersecurity, fraud detection, and public safety. Clifton’s PhD and Bachelor’s degrees are from Clayton School of Information Technology, Monash University, Australia. In his free time, Clifton volunteers professional services to events, conferences, and journals. He was also part of teams that won analytics competitions.