Is AI Ground Truth Really True? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What

Sarah Lebovitz, Natalia Levina, Hila Lifshitz‐Assaf
Information Systems · IT Governance · UTD24
MIS Quarterly · 2021-09-01
University of Virginia; Shandong University of Political Science and Law; Supélec; University of Applied Sciences and Arts of Southern Switzerland; New York University
Citations: 49

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts’ know-what knowledge captured in ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI’s know-what and experts’ know-how enabled managers to better understand the risks and benefits of each tool. This study shows the dangers of treating ground truth labels used in ML models as objective when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.

Ground truth · Artificial intelligence · Computer science · Quality (philosophy) · Common ground · Field (mathematics) · Work (physics) · Training (meteorology) · Knowledge management · Psychology · Engineering · Mathematics
Related Papers (8-Dimension Scoring)

Mitigating Traffic Congestion: The Role of Intelligent Transportation Systems

Zhi Cheng, Min‐Seok Pang, Paul A. Pavlou · Information Systems Research

Score: 57

Examining the Heterogeneous Impact of Ride-Hailing Services on Public Transit Use

Yash Babar, Gordon Burtch · Information Systems Research

Score: 57

Financial Incentives Dampen Altruism in Online Prosocial Contributions: A Study of Online Reviews

Dandan Qiao, Shun‐Yang Lee, Andrew B. Whinston, Qiang Wei · Information Systems Research

Score: 57

Eye-Tracking-Based Classification of Information Search Behavior Using Machine Learning: Evidence from Experiments in Physical Shops and Virtual Reality Shopping Environments

Jella Pfeiffer, Thies Pfeiffer, Martin Meißner, Elisa Weiß · Information Systems Research

Score: 57

Platform Pricing and Investment to Drive Third-Party Value Creation in Two-Sided Networks

Burcu Tan, Edward G. Anderson, Geoffrey Parker · Information Systems Research

Score: 52

Consumption and Performance: Understanding Longitudinal Dynamics of Recommender Systems via an Agent-Based Simulation Framework

Jingjing Zhang, Gediminas Adomavičius, Alok Gupta, Wolfgang Ketter · Information Systems Research

Score: 52

More Than a Bot? The Impact of Disclosing Human Involvement on Customer Interactions with Hybrid Service Agents

Ulrich Gnewuch, Stefan Morana, Oliver Hinz, Ralf Kellner, Alexander Maedche · Information Systems Research

Score: 51

Does Telemedicine Reduce Emergency Room Congestion? Evidence from New York State

Shujing Sun, Susan Lu, Huaxia Rui · Information Systems Research

Score: 47