Data Science Fails: The Transparency Sweet Spot
Does your organization apply appropriate human resources governance when hiring staff? Large enterprises tend to follow the same basic process. First, hiring managers write a job description that sets out the tasks the position requires and the skills and attributes of a suitable candidate. Vacancies are posted, sometimes with the help of recruiters, and people submit their resumes for consideration. Next, HR staff sort through the applications, check resumes for suitability, and invite some applicants in for in-person interviews, skills exercises, reference checks, and maybe even psychometric and IQ tests.
When the successful applicant is offered the role, they are given a copy of the job description and authorities, and trained in their new employer’s processes and procedures. Their performance is measured against defined KPIs, and feedback is provided during regular performance review meetings.
The rigorous requirements of the job application process raise an important question: Why aren’t the same governance standards applied to artificial intelligence? Many AI solutions are off-the-shelf black-box systems, purchased without defining and documenting the required tasks, goals, and authorities. Too frequently, management takes no responsibility for an AI’s performance and has no understanding of it.
Case Study: Stanford University Exam Grades
Academic exam marking is typically a manual and human process, subject to human biases and the resulting issues of fairness. In 2013, Stanford University professor Clifford Nass received complaints from students enrolled in one of the two course sections of his technological interface course. The students believed that their section of the course unfairly received lower grades than students in the other section. When Professor Nass investigated, he discovered that the complaints were justified. Two different teaching assistants had marked the two student sections, and students with similar answers had received different grades.
Using his computer science skills, Professor Nass devised an algorithm that would correct for the human bias, giving higher grades to the students whose exams had been marked by the less generous exam marker. In the spirit of full transparency, he shared the full details of the algorithm. You would have thought this would satisfy everyone. After all, he had ensured fairness and been fully transparent in his actions, right? Yet further complaints came in, with some students even angrier than before!
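The details of that algorithm are not described here. As a purely illustrative sketch, one simple way to raise the grades of a section marked by a less generous grader is to shift its scores up until its mean matches that of the most generous section. The function name, the mean-shift approach, and the sample scores are all assumptions, not Professor Nass’s actual method.

```python
from statistics import mean

def raise_to_most_generous(scores_by_grader):
    """Shift each section's scores up to match the most generous section's mean.

    Hypothetical illustration only. It assumes the sections are of comparable
    ability, so any gap in section means is attributed to the grader.
    """
    section_means = {g: mean(s) for g, s in scores_by_grader.items()}
    target = max(section_means.values())
    adjusted = {}
    for grader, scores in scores_by_grader.items():
        offset = target - section_means[grader]  # zero for the most generous grader
        adjusted[grader] = [s + offset for s in scores]
    return adjusted

# Made-up scores for two sections marked by two teaching assistants.
raw = {"ta_a": [80, 85, 90], "ta_b": [70, 75, 80]}
adjusted = raise_to_most_generous(raw)
```

A correction like this only equalizes section averages. It says nothing about whether an individual student’s answer was marked fairly, which may be one reason a fully disclosed adjustment can still invite objections.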
Maybe full transparency isn’t the solution to trust.
In 2015, René Kizilcec, a PhD student who had worked with Professor Nass, decided to conduct a research study on the effects of grading transparency on student trust. In Kizilcec’s study, 103 students submitted essays for peer grading. Internally, the grading process returned two marks: a grade that represented the average peer grade, and a “computed” grade produced by an algorithm that adjusted for bias. Students were randomly given one of three levels of transparency about how their final grade was calculated:
- Low Transparency: Students received the computed grade.
- Medium Transparency: Students received the computed grade accompanied by a paragraph explaining how the grade had been calculated, why adjustments had been made, and naming the type of algorithm used.
- High Transparency: Students received the computed grade accompanied by a paragraph explaining how the grade had been calculated, why adjustments had been made, and naming the type of algorithm used. These students also received their raw peer-graded scores and saw how these scores were each precisely adjusted by the algorithm to arrive at the final grade.
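The three conditions above can be thought of as progressively longer feedback messages built from the same underlying grade. The sketch below is a minimal illustration under that assumption. The names `Transparency` and `build_feedback`, and the wording of the explanation paragraph, are hypothetical and are not taken from the study.

```python
from enum import Enum

class Transparency(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Placeholder wording; the study's actual explanation paragraph is not reproduced here.
EXPLANATION = (
    "Your grade was computed from your peer grades, "
    "with an algorithmic adjustment for grader bias."
)

def build_feedback(level, computed_grade, raw_scores=(), adjusted_scores=()):
    """Return the feedback text a student would see at a given transparency level."""
    lines = [f"Final grade: {computed_grade}"]            # shown at every level
    if level in (Transparency.MEDIUM, Transparency.HIGH):
        lines.append(EXPLANATION)                         # how and why the grade was adjusted
    if level is Transparency.HIGH:
        # Raw peer scores and their individual adjustments, one line per peer grader.
        for raw, adj in zip(raw_scores, adjusted_scores):
            lines.append(f"Peer score {raw} -> adjusted {adj}")
    return "\n".join(lines)

# Made-up scores for one student.
low = build_feedback(Transparency.LOW, 8)
medium = build_feedback(Transparency.MEDIUM, 8)
high = build_feedback(Transparency.HIGH, 8, raw_scores=[7, 9, 6], adjusted_scores=[8, 9, 7])
```

Each level strictly adds to the one before it, so the only thing that varies between groups is how much of the process is revealed, not the grade itself.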
Students in each of the three groups were asked to rate their trust in the process. The final results were published in 2016 in the paper “How Much Information? Effects of Transparency on Trust in an Algorithmic Interface.” Using the data from the study, Kizilcec arrived at three key conclusions:
- Individuals whose expectations were not met (by receiving a lower grade than expected) trusted the system less, unless the grading algorithm was made more transparent through explanation.
- However, providing too much information eroded this trust.
- Attitudes of individuals whose expectations were met did not vary with transparency.
The observations, while disproving the common perception that greater transparency is always better, described intuitively reasonable behaviors. People distrust an algorithm when it deviates from their expectations and appears to be disadvantageous to them. Providing information can sometimes help to align human expectations with eventual outcomes, but too much transparency can have a negative effect. Maybe too much information is confusing, or perhaps it merely provides more opportunities for objections empowered by confirmation bias. Those who are not disadvantaged by a decision have little incentive to delve into the details of how the decision was made.
Kizilcec concluded, “Designing for trust requires balanced interface transparency — not too little and not too much.”
As many organizations have discovered, whoever owns your AI owns your business. This means that whoever builds and runs your AI owns the customer experience, the ethical values applied by the AI, and the intellectual property. AI-driven enterprises need to apply adequate AI governance and transparency standards to reduce the risk of reputational damage.
Professor Nass could have prevented all of the student complaints via a strategy of optimal transparency. An approach of no disclosure and no transparency leads to inadequate AI governance, AI behavior that is inconsistent with business rules and core values, and dissatisfied stakeholders. On the other hand, a strategy of full transparency results in confusion and suspicion, disgruntled stakeholders, and loss of intellectual property, and it risks gaming of the system, for example fraudsters discovering tricks to avoid detection and exploit a system’s vulnerabilities. Optimal transparency uses heuristics to create human-friendly explanations of the reasons for a decision. Such explanations are crucial for decisions with adverse outcomes for stakeholders and for seemingly unintuitive decisions that defy stakeholder expectations.