Humans and AI: Data Scientists Are Human Too

April 19, 2021

by Colin Priest

Do you remember Liquid Paper? It was the original white correction fluid, invented in 1956 by a professional typist and used to cover typing and handwriting mistakes. Correction fluid quickly became a business necessity at a time when typewriters were the primary instrument of business communication. The fastest typing speed ever recorded, 216 words per minute, was achieved in 1946 by Stella Pajunas-Garnand on an electric typewriter. Most professional typists typed 50 to 80 words per minute. Even at that speed, typing mistakes were bound to occur.

Even experts make mistakes!

Data Scientists Are AI Experts

For well-defined business processes, we have conventional computer systems. Programmers write code based on rules carefully defined by business experts. But for more complex business decisions, including those that use less structured data, we have artificial intelligence systems.

Most modern artificial intelligence systems are powered by machine learning algorithms, which learn by example. Data scientists train the algorithms using datasets that contain curated learning examples. Data scientists are also the experts in data pipelines: sourcing, loading, cleaning, and joining data, then engineering features in a form suitable for each machine learning algorithm. Once a model is trained, data scientists take its outputs (predictions), apply business rules to turn predictions into decisions, and send these predictions and decisions into the organization’s computer systems.
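To make the last step concrete, here is a minimal sketch of turning predictions into decisions with business rules. Everything in it is an illustrative assumption rather than anything from a real system: the lending scenario, the `decide` function, the `Decision` record, and the two thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    customer_id: str
    probability: float  # model output: predicted probability of default
    action: str         # business decision derived from the prediction


def decide(customer_id: str, probability: float,
           approve_below: float = 0.10,
           decline_above: float = 0.40) -> Decision:
    """Apply hypothetical business rules to a single prediction."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if probability < approve_below:
        action = "approve"
    elif probability > decline_above:
        action = "decline"
    else:
        # Uncertain cases go to a human rather than being automated.
        action = "refer_to_underwriter"
    return Decision(customer_id, probability, action)


# Hypothetical model outputs keyed by customer ID.
predictions = {"C001": 0.03, "C002": 0.25, "C003": 0.72}
decisions = [decide(cid, p) for cid, p in predictions.items()]

for d in decisions:
    print(d.customer_id, d.probability, d.action)
```

Keeping the thresholds as named parameters, rather than burying them in the model code, is one way to let business experts review and change the rules without retraining anything.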

It is the data scientist’s job to follow proper data science practices in a repeatable, scalable, accurate, and reliable manner. They need data assembly, data science, integration, and visualization skills. Data scientists also need to understand how the business operates, which requires communication and presentation skills. A high level of expertise and experience is required to become a successful data scientist. Because it’s rare to find one person with a skill set this broad, you might need to build a team of experts.

Not All AI Is Trustworthy

It has been more than five years since I last entered a data science competition, but there’s one memory that remains vivid. I had spent a week at my computer coding up a cool-looking solution. Given my results in previous competitions, I felt confident, so I clicked Submit and waited for my leaderboard ranking. I discovered that I had placed second to last!

After the initial shock and disappointment, I reviewed my work. It turned out that on the first day of coding, I had made a silly mistake and used the wrong column name. The model was worthless. I had wasted an entire week of work. It was embarrassing, but thank goodness my model was built just for fun, not for a mission-critical business application that could result in millions of dollars in losses or for a healthcare application that could affect life-or-death decisions.
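A cheap defense against one version of this mistake is to check column names as soon as the data is loaded. The sketch below is an assumption-laden illustration: the `require_columns` helper, the column names, and the sample data are all hypothetical. Note its limits as well: it only catches names that do not exist in the data. It would not catch selecting a valid column that happens to be the wrong one.

```python
import csv
import io


def require_columns(header, required):
    """Fail fast if any expected column name is absent from the header."""
    missing = sorted(set(required) - set(header))
    if missing:
        raise ValueError(
            f"missing expected columns: {missing}; found: {sorted(header)}"
        )


# Hypothetical dataset; in practice this would be a file on disk.
SAMPLE_CSV = "customer_id,loan_amount,defaulted\nC001,1200,0\nC002,5400,1\n"

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

# Passes silently: every expected column is present.
require_columns(reader.fieldnames, ["customer_id", "loan_amount", "defaulted"])

# A misspelled name is reported immediately, not after a week of modeling.
error_message = None
try:
    require_columns(reader.fieldnames, ["customer_id", "loan_amt"])
except ValueError as err:
    error_message = str(err)

print(error_message)
```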

My mistake was not unusual. Over the past few years, we’ve heard stories about AI failures, such as racist medical algorithms, AI-driven hedge funds that consistently underperformed the market, sexist lending apps, and an AI earthquake aftershock prediction tool that was less accurate than much simpler physics-based tools.

As EY warns, your AI “can malfunction, be deliberately corrupted, and acquire (and codify) human biases in ways that may or may not be immediately obvious. These failures have profound ramifications for security, decision-making and credibility, and may lead to costly litigation, reputational damage, customer revolt, reduced profitability and regulatory scrutiny.”

We need to discover and correct errors before an AI system goes into production.

Data Scientists and Error Detection

Because data scientists are AI experts, we rely on them to discover and correct errors in the AI systems they build. We expect more from them than we do from a layperson. But data scientists are only human, prone to cognitive biases and limited by how much cognitive load their working memory can handle. We should design AI governance and model validation processes that allow for the strengths and weaknesses of data scientists.

Recently published research has examined how well data scientists discover errors. In “Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning,” the authors report the results of two experiments.

In the first experiment, study participants were presented with a dataset, a machine learning model that had been trained using that dataset, and an interpretability tool. Both interpretability tools used in the study showed the relative importance of each input feature and the relationship of input features to predictions. One of these tools used complex methodology. The other used straightforward methodology and was simpler to understand. The dataset had been altered to incorporate several common data science issues, such as missing values, redundant features, and temporal changes due to economic inflation. Each participant was asked to use these materials to answer a questionnaire that tested their understanding of and trust in the visualizations, their ability to detect issues, their confidence in the model, and their assessment of its readiness for deployment.

The results of this first experiment showed a misalignment between the intended use of interpretability tools and data scientists’ understanding, usage, and trust in the visualizations. The researchers concluded that participants “trusted the tools because of their visualizations and their public availability, though participants took the visualizations at face value instead of using them to uncover issues with the dataset or models.”

Given the small sample size of the initial experiment (only 11 participants), the researchers conducted a second experiment to quantify their findings and further explore the mental models used by data scientists. Although the dataset, models, and interpretability tools were identical to those used in the first experiment, the researchers extended the experimental design by randomly showing the participants either standard or manipulated visualizations. They manipulated visualizations from the model interpretability tools by rearranging the input feature names, breaking the internal consistency of the insights, and reversing the apparent importance of each input feature. These manipulations were designed to test whether the data scientists depended on the reasonableness of the explanations or simply on the existence of visualizations.

The researchers observed that most participants gave the models high deployment ratings based on intuition, driven by their experience with machine learning, rather than careful consideration of the explanations. Some participants said that they would ask a customer to trust their judgment that the model was suitable for deployment. Even those participants who recognized issues in the model or diagnostics were biased toward model deployment. Similar to the results of other experiments on laypersons’ ability to understand AI, the researchers observed that participants who used the simpler interpretability tool were more accurate in discovering errors than those who used the tool that imposed a higher cognitive load. The researchers concluded that static visualizations were suboptimal and that “it may be better to design interpretability tools that allow back-and-forth communication.”

Although they are experts in AI, data scientists are human too. They display cognitive biases such as anchoring, the availability heuristic, confirmation bias, and social proof. They are more likely to miss errors when their tools impose a high cognitive load. They might sometimes be reticent or ineffective in their interactions with laypersons, such as your business subject matter experts, and rely on assertions of trust instead.

Humans and AI Best Practices

AI governance best practices and processes help reduce the chances that human biases and errors will be passed to your AI systems.

Static insights and diagnostics, combined with familiar products and processes, lead to overconfidence and untrustworthy models. We need automated guardrails: insights and diagnostics that intelligently flag potential problems for data scientists to investigate and correct.
Minimize opportunities for human errors by automating well-defined and repetitive manual processes.
Have data scientists explain and vet model behavior with business subject matter experts, using intuitive diagnostics and insights that impose a low cognitive load.
Require business subject matter experts to sign off on model behavior.
Apply data storytelling to contextualize model insights. Link the visualizations to customer behaviors, business rules, and the competitive landscape.
Identify all diagnostics and insights that were unexpected or anomalous. Provide a written explanation of the root cause in the real world (for example, public holidays, new regulations, or economic factors).
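As a rough illustration of what an automated guardrail might look like, the sketch below flags two of the data issues mentioned in the study described earlier: missing values and redundant features. It is a simplified assumption, not a description of any particular product. The `flag_issues` function, the thresholds, and the sample data are hypothetical, and a real guardrail would cover many more checks, such as drift over time.

```python
import math


def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def flag_issues(columns, max_missing=0.2, max_abs_corr=0.95):
    """Return human-readable flags for a data scientist to investigate."""
    flags = []
    for name, values in columns.items():
        rate = missing_rate(values)
        if rate > max_missing:
            flags.append(f"{name}: {rate:.0%} missing")
    names = list(columns)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > max_abs_corr:
                flags.append(f"{a} and {b}: possibly redundant (r={r:.2f})")
    return flags


# Hypothetical dataset with a sparse column and a duplicated feature.
data = {
    "income_usd": [40000, 52000, 61000, 75000, 90000],
    "income_thousands": [40, 52, 61, 75, 90],
    "age": [25, None, None, 41, 38],
    "tenure_years": [1, 7, 3, 2, 9],
}

flags = flag_issues(data)
for flag in flags:
    print(flag)
```

The point of the design is that the check produces flags for a person to investigate and explain, rather than silently fixing the data or blocking deployment on its own.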

About the author

Colin Priest

VP, AI Strategy, DataRobot

Colin Priest is the VP of AI Strategy for DataRobot, where he advises businesses on how to build business cases and successfully manage data science projects. Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. Colin is a firm believer in data-based decision making and applying automation to improve customer experience. He is passionate about the science of healthcare and does pro-bono work to support cancer research.
