Data Literacy for Responsible AI: Algorithmic Bias

August 16, 2021

As artificial intelligence secures its position in the public sphere, consumers expect companies to use the maturing technology ethically and responsibly. Seventy percent of customers expect organizations to provide transparent and fair AI experiences, according to a recent Capgemini report. But as the technology’s popularity grows, a number of concerning examples have emerged of AI models operating with algorithmic bias.

For example, in 2016 ProPublica discovered algorithmic bias in COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a proprietary algorithm used to predict the likelihood that an individual accused of a crime would reoffend. COMPAS was more likely to produce false positives for Black defendants, inaccurately predicting a higher likelihood that they would reoffend, and false negatives for white defendants. Because the algorithm was being used to inform prison sentencing, algorithmic bias in the system came at a steep cost: false positives led to more severe sentencing and fewer parole opportunities, and false negatives led to lenience for those who were actually more likely to break the law again.

Algorithmic bias can appear in both supervised and unsupervised AI models. Supervised examples allow us to trace algorithmic bias back to the human source of an AI’s knowledge, revealing how historical bias and systemic discrimination shape the labeled outcomes used to train a model. But even without human labeling, algorithmic bias can appear in unsupervised models when the data itself reflects damaging social norms and stereotypes.

In 2020, OpenAI introduced a language model called GPT-3 that performed sophisticated autocomplete functionality for a range of complex language tasks. GPT-3 had learned from hundreds of gigabytes of unlabeled text and could produce highly convincing text. However, prompts as simple as “Two Muslims walked into a…” revealed GPT-3’s bias: the model generated text associating Muslims with violence at far higher rates than other religious groups.

These and other examples show how AI can go wrong, but defining algorithmic bias in any particular use case is still very difficult. AI researchers and academics have proposed over 70 metrics, each of which defines bias by pinpointing a different way an algorithm can treat the groups represented in a dataset unequally. Those groups are defined through a combination of protected or sensitive characteristics (such as race, gender, age, pregnancy, or veteran status). Deciding which bias metric is most relevant requires a contextual interpretation of the use case. One place to start is with the two general kinds of fairness: by representation or by error.

Fairness by representation focuses on the outcomes a model predicts to determine if there is a difference in the likelihood each group will receive favorable treatment. Fairness by error assesses the quality of a model’s performance and accuracy across groups to determine whether some groups are disproportionately affected by potential errors.
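To make the distinction concrete, here is a minimal Python sketch that computes one representation-style measure (the share of positive predictions per group) and two error-style measures (false positive and false negative rates per group). The function name `group_rates`, the 0/1 encoding, and the toy data are assumptions made for illustration; they do not come from any particular library or from the COMPAS analysis.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group selection rate, false positive rate, and false negative rate.

    Assumes binary labels and predictions encoded as 0/1.
    """
    counts = defaultdict(
        lambda: {"n": 0, "pred_pos": 0, "neg": 0, "fp": 0, "pos": 0, "fn": 0}
    )
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += pred
        if truth == 0:
            c["neg"] += 1
            c["fp"] += pred        # predicted 1 when the truth was 0
        else:
            c["pos"] += 1
            c["fn"] += 1 - pred    # predicted 0 when the truth was 1

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # Fairness by representation: how often each group gets a positive prediction.
            "selection_rate": c["pred_pos"] / c["n"],
            # Fairness by error: how often the model is wrong, and in which direction.
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Toy data, invented for illustration only.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
for group, values in rates.items():
    print(group, values)
```

In this toy data, group "a" receives positive predictions more often and bears all of the false positives, while group "b" bears all of the false negatives. A single overall accuracy number would hide that pattern, which is why error rates are compared group by group.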

Alongside the different types of bias and the fairness metrics used to distinguish them, solving bias requires an understanding of its source. When bias is revealed, organizations should investigate the data to determine where the bias was introduced. The following are just some of the ways bias can be traced back to data:

Skewed datasets that lack representation
Examples tainted with unreliable labels or historical bias
Bad data collection practices leading to limited features for certain groups
Small sample size limiting the model’s ability to learn
Proxy features that indirectly leak information about protected attributes
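Two of the sources above, skewed representation and proxy features, lend themselves to quick checks on the raw data. The sketch below is one rough way to run such checks in Python, not an established standard: `representation` reports each group's share of the rows, and `proxy_strength` measures how often the protected attribute could be guessed from a single feature alone. Both function names, the ZIP code feature, and the data are hypothetical.

```python
from collections import Counter, defaultdict


def representation(groups):
    """Share of rows belonging to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {group: n / total for group, n in counts.items()}


def proxy_strength(feature, protected):
    """Fraction of rows whose protected value is guessed correctly when we
    always guess the most common protected value for that feature value."""
    by_value = defaultdict(Counter)
    for value, group in zip(feature, protected):
        by_value[value][group] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(feature)


# Toy data, invented for illustration only.
protected = ["a", "a", "a", "a", "a", "a", "b", "b"]
zip_code = ["10001"] * 6 + ["20002"] * 2

shares = representation(protected)
baseline = max(shares.values())                 # accuracy of always guessing the majority group
strength = proxy_strength(zip_code, protected)  # accuracy when guessing from the feature

print("group shares:", shares)
print("baseline:", baseline, "proxy strength:", strength)
```

A proxy strength well above the baseline suggests the feature leaks information about the protected attribute even when that attribute is excluded from training. Here the hypothetical ZIP code identifies the group perfectly, and group "b" makes up only a quarter of the rows.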

When an investigation reveals the source of the bias, the next step is to mitigate the issue wherever it arises. Mitigation strategies can take place at three different stages of the machine learning pipeline:

Pre-processing: aimed at reducing bias in the data used to train a model
In-processing: applied during training, forcing models to learn less biased patterns
Post-processing: modifying model predictions based on fairness metrics
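As an illustration of the pre-processing stage, the sketch below applies reweighing, a commonly described technique that assigns each training example a weight so that group membership and label look statistically independent in the weighted data. The article does not name a specific method, so this choice, along with the function names and toy data, is an assumption made for illustration.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    rows = [(y, w) for g, y, w in zip(groups, labels, weights) if g == group]
    return sum(w for y, w in rows if y == 1) / sum(w for _, w in rows)


# Toy data, invented for illustration: group "a" has mostly positive labels,
# group "b" mostly negative ones.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(weights)
print(rate_a, rate_b)
```

Before weighting, 75% of group "a" examples and 25% of group "b" examples carry a positive label. After weighting, both groups have the same weighted positive rate, so a model trained with these sample weights sees less of the historical imbalance. Whether that is the right correction still depends on the use case and on the fairness metric chosen.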

To realize the potential of artificial intelligence, we must develop a nuanced understanding of algorithmic bias, identify its sources, and enact pathways to mitigation. Bias in our society didn’t start with technology, but AI certainly presents an opportunity to deprogram bias from the system and codify the ethics and values that will lead to a fairer and more just world.

About the author

Scott Reed

Trusted AI Data Scientist

Scott Reed is a Trusted AI Data Scientist at DataRobot. On the Applied AI Ethics team, his focus is to help enable customers on trust features and sensitive use cases, contribute to product enhancements in the platform, and provide thought leadership on AI Ethics. Prior to DataRobot, he worked as a data scientist at Fannie Mae. He has a M.S. in Applied Information Technology from George Mason University and a B.A. in International Relations from Bucknell University.

Adel Nehme

Data Science Evangelist at DataCamp
