Data Science Fails: Be Careful What You Wish For
There is a saying, “Be careful what you wish for; you might just receive it.” One of the oldest stories illustrating this saying comes from Greek mythology. The story goes that Midas earned the favor of the god Dionysus by showing hospitality to Silenus, a satyr and companion of Dionysus. In gratitude, Dionysus offered Midas a wish, and Midas wished that everything he touched would turn to gold. Midas immediately put the power to the test, touching a rose and turning it into solid gold. Excited by his newfound power, Midas ordered a royal feast to celebrate. But whenever he touched any food or drink, it turned to gold, and he was unable to eat. Midas cursed the power he had been given.
The “Midas Effect” is more relevant today than ever: a recent viral news story about a biased healthcare AI has become a cautionary tale about the dangers of setting the wrong goals for an AI system.
AI Ethics: Perverse Incentives and Unfair Bias
This article considers four principles of AI ethics: Ethical Purpose, Fairness, Disclosure, and Governance.
The first principle is ethical purpose. An ethical AI is used for purposes that improve society. Like humans, AIs are subject to perverse incentives, perhaps even more so. It follows that you need to choose carefully the tasks and objectives that you assign to AIs, as well as the historical data used to train them. Consider the objective that the AI must optimize. Because an AI will ruthlessly optimize its objective, potentially at the cost of other organizational goals, list all competing objectives and add them as constraints on the AI where required.
The second principle is fairness. Most countries have laws protecting against some forms of discrimination, covering attributes ranging from race and ethnicity to gender, disability, age, and marital status. It goes without saying that companies must obey the law with regard to protected attributes. Beyond that, it is also good business practice to safeguard sensitive attributes, especially where there is an asymmetry of power or information. Bias can be direct or indirect. Direct bias occurs when sensitive attributes affect decisions and outcomes (e.g., when gender is used to determine who should be hired). Indirect bias occurs when another attribute (such as zip code) that is strongly correlated with a sensitive attribute (such as race) acts as a proxy for it. Indirect bias can lead to disparate impact, where a seemingly neutral rule disproportionately harms a protected group.
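To make indirect bias and disparate impact concrete, here is a minimal Python sketch on hypothetical toy data. The decision rule looks only at zip code, never at group membership, yet because zip code is correlated with group, the selection rates diverge. The function names (`selection_rates`, `disparate_impact_ratio`, `phi_correlation`), the groups, and the zip codes are all illustrative assumptions, not part of any product or of the studies discussed here.

```python
from collections import defaultdict


def selection_rates(groups, selected):
    """Fraction of each group that received the favorable decision."""
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for group, was_selected in zip(groups, selected):
        totals[group] += 1
        chosen[group] += int(was_selected)
    return {group: chosen[group] / totals[group] for group in totals}


def disparate_impact_ratio(groups, selected, protected, reference):
    """Selection rate of the protected group divided by that of the reference group."""
    rates = selection_rates(groups, selected)
    return rates[protected] / rates[reference]


def phi_correlation(xs, ys):
    """Pearson correlation of two 0/1 indicator lists (the phi coefficient)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: the decision rule never looks at the sensitive group,
# only at zip code, yet zip code is correlated with group membership.
groups = ["g1"] * 5 + ["g2"] * 5
zips = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "A"]
selected = [z == "A" for z in zips]  # "neutral" rule: select everyone in zip A

in_zip_b = [int(z == "B") for z in zips]
is_g2 = [int(g == "g2") for g in groups]

print(selection_rates(groups, selected))                     # {'g1': 0.8, 'g2': 0.2}
print(disparate_impact_ratio(groups, selected, "g2", "g1"))  # 0.25
print(round(phi_correlation(in_zip_b, is_g2), 2))            # 0.6
```

A ratio well below 1.0 signals disparate impact; as a common rule of thumb, the “four-fifths rule” from US employment-selection guidelines flags ratios below 0.8. A correlation check like `phi_correlation` is only a first screen for proxies, since a combination of several weakly correlated attributes can also act as a proxy.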
The third principle is disclosure: providing sufficient information to an AI’s stakeholders so that they can make informed decisions. Black-box AI systems, which offer no disclosure or explanation of their workings or decisions, may silently make dangerous and inappropriate decisions. Disclosure should reach both internal and external stakeholders and is an important risk management tool for ensuring that an AI system behaves as expected.
The fourth and final principle is governance. The possibility of adverse outcomes from AI failures brings the responsibility to manage AIs and to apply high standards of internal governance and risk management. Higher standards of AI governance are required when there is a significant risk of harm to humans, such as with life-or-death medical decisions.
In the case study below, we will see problems with the application of all four of these principles.
Case Study: Improving Healthcare Efficiency
Healthcare costs are rising. A recent study showed that healthcare costs have been increasing at three times the rate of inflation. It is no wonder that hospitals and healthcare managers have been looking to AI to improve the efficiency of healthcare, delivering better health outcomes for the same cost, or the same health outcomes at reduced cost. AI holds great promise for improving healthcare efficiency in popular use cases such as identifying sepsis cases early, reducing unnecessary hospital readmissions, and reducing wasted staff time by predicting which patients will miss their medical appointments.
But not all healthcare AIs are created equal, and not all of them improve health outcomes. In the Science article “Dissecting racial bias in an algorithm used to manage the health of populations,” published in October 2019, the authors describe a data science failure. When a health system anticipates that a patient will have extraordinarily complex and intensive future healthcare needs, that patient is enrolled in a “care management” program, which provides considerable additional resources: greater attention from trained providers and help with coordination of care. Some health systems use an algorithm to compute a patient’s “commercial risk score,” which in turn is used to select which patients are given access to care management.
These commercial risk scoring algorithms are “widely considered effective at improving outcomes and satisfaction while reducing costs,” but because they are proprietary, health researchers have been hampered in dissecting their behavior. This time was different. The researchers had access to detailed data on primary care patients at a large teaching hospital and used that data to test the behavior of a widely used algorithm, measuring differences in outcomes across patients’ self-identified races.
At first, the research looked only at clinical data. But when the researchers expanded their analysis to other patient information, they discovered that one of the best predictors of length of stay was the patient’s zip code. The zip codes that correlated with longer hospital stays were in poor and predominantly African-American neighborhoods. Then, when they plotted commercial risk scores against the number of active chronic conditions, split by race, they discovered that African-American patients with the same number of chronic health problems received lower commercial risk scores. This bias arose because the algorithm predicts healthcare costs rather than illness, and unequal access to care meant that less money had been spent caring for African-American patients with the same level of need.
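The mechanism behind this finding, a model that faithfully predicts cost being used as if it predicted need, can be sketched in a few lines of Python. The records, dollar amounts, and function names below are hypothetical and do not come from the Science paper; `cost_score` stands in for a model trained on spending, and `need_score` for a label closer to illness. The audit mirrors the comparison described above: mean score per number of chronic conditions, split by group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (group, active chronic conditions, annual cost in dollars).
# Both groups have identical illness burdens, but less was spent on group "B".
PATIENTS = [
    ("A", 1, 2000), ("A", 3, 6000), ("A", 5, 10000),
    ("B", 1, 1000), ("B", 3, 3000), ("B", 5, 5000),
]


def cost_score(group, conditions, cost):
    """Stand-in for a model trained to predict cost: here, simply past cost."""
    return cost


def need_score(group, conditions, cost):
    """Stand-in for a label closer to health need: the chronic-condition count."""
    return conditions


def mean_score_by_conditions(patients, score):
    """Average score for each (chronic-condition count, group) pair."""
    buckets = defaultdict(list)
    for group, conditions, cost in patients:
        buckets[(conditions, group)].append(score(group, conditions, cost))
    return {key: mean(values) for key, values in sorted(buckets.items())}


def enrolled_groups(patients, score, k):
    """Groups of the k highest-scoring patients, i.e. who gets care management."""
    ranked = sorted(patients, key=lambda p: score(*p), reverse=True)
    return [group for group, _, _ in ranked[:k]]


print(mean_score_by_conditions(PATIENTS, cost_score))
print(enrolled_groups(PATIENTS, cost_score, k=2))  # ['A', 'A']
print(enrolled_groups(PATIENTS, need_score, k=2))  # ['A', 'B']
```

With the cost label, group “B” scores lower at every level of illness and is crowded out of the top two enrollment slots; with the need-based label, the two groups are treated alike. Which label best captures health need is a design decision that this toy example does not settle.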
The researchers concluded that the predictions were not inherently biased or wrong: the algorithm was doing precisely what it was designed to do. However, the objective given to the algorithm created unintended adverse side effects. “The initial goal was efficiency, which in isolation is a worthy goal,” says Marshall Chin, one of the University of Chicago scientists who worked on the project. He noted that fairness is also important but had not been explicitly considered in the algorithm’s design. The authors suggested “that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.”
Although the Science article inspired many viral news stories, with attention-grabbing headlines such as “Racial bias in health care software aids whites over blacks,” there is nothing new in this story. Back in January 2019, Obermeyer and Mullainathan, two of the authors of the October 2019 Science paper, published a similar article, “Dissecting Racial Bias in an Algorithm that Guides Health Decisions for 70 Million People,” describing the same case study and conclusions.
Case Study: Pneumonia Mortality Risk
Industry insiders have known about data science problems in healthcare for a while. Back in 2017, Cabitza and colleagues wrote in JAMA that “the introduction of new technologies in health care has not always been straightforward or without unintended and adverse effects.” They describe an algorithm that was developed to predict mortality for pneumonia patients, with the aim of providing additional preventive treatment for high-risk patients. Lacking the contextual information that asthmatic patients were already receiving prophylactic treatment, the algorithm dangerously concluded that patients with a history of asthma had a lower risk. The prediction was faithful to the data, since asthmatics with pneumonia did typically have lower observed mortality, but the goal the algorithm was given and the way it was planned to be used had unintended side effects. All else being equal, asthmatics have a higher risk of lung infections and pneumonia-related mortality, yet the algorithm would have dangerously withheld the extra care they require.
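The pneumonia example is a case of confounding by treatment, which can produce Simpson's paradox: a pattern that holds within every subgroup reverses when the subgroups are pooled. The Python sketch below uses invented counts (not figures from Cabitza or from any study) in which most asthmatics receive extra treatment. Pooled, asthma looks protective; within each treatment stratum, asthma carries higher mortality. `COUNTS` and `mortality` are hypothetical names.

```python
# Hypothetical counts: (has_asthma, received_extra_treatment) -> (patients, deaths)
COUNTS = {
    (False, False): (100, 10),
    (False, True): (20, 1),
    (True, False): (10, 2),
    (True, True): (100, 6),
}


def mortality(asthma, treated=None):
    """Observed death rate for a group, optionally within one treatment stratum."""
    cells = [
        counts
        for (has_asthma, got_treatment), counts in COUNTS.items()
        if has_asthma == asthma and (treated is None or got_treatment == treated)
    ]
    patients = sum(p for p, _ in cells)
    deaths = sum(d for _, d in cells)
    return deaths / patients


# Pooled data: asthma looks protective (8/110 vs 11/120).
print(round(mortality(True), 3), round(mortality(False), 3))  # 0.073 0.092
# Within each treatment stratum, asthma carries higher risk.
print(mortality(True, False), mortality(False, False))  # 0.2 0.1
print(mortality(True, True), mortality(False, True))    # 0.06 0.05
```

A model trained on the pooled outcomes would learn the misleading pattern. Stratifying by, or otherwise accounting for, the treatment patients actually received is one way to surface the confounder before the model is used to allocate care.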
Conclusion: Adopt Best Practices for AI
The problems in these case studies arose from incorrect goals, perverse incentives, and inadequate model governance processes. The lessons to be learned are:
Define your organization’s values for AI ethics and publish those values as internal policy guidelines. These guidelines will guide the development of new AIs and the design of appropriate governance processes. DataRobot provides a free online tool to help ensure that your AI ethics guidelines cover the broad range of AI ethics topics.
When practical, build your own AIs and avoid third-party, black-box AIs, which may not share your values. Insist that your AIs’ decisions are explainable and justifiable. For use cases with a significant risk of harm, such as in healthcare, apply rigorous AI governance practices before and after deploying AI systems.
Be careful what you wish for; you might just receive it. AIs can be ruthless in optimizing the goals you give them. For AI you can trust, apply best practices for AI governance, and use the latest generation of AIs that provide human-friendly explanations.