Believing vs. Achieving: The Disconnect between Efficacy Beliefs and Collaborative Outcomes

This study reveals that while persistent efficacy beliefs drive systematic "AI optimism" and influence delegation decisions, they have a weaker impact on actual human-AI team performance, suggesting that transparency-focused approaches may be insufficient for optimizing collaborative outcomes.

Philipp Spitzer, Joshua Holstein

Published Thu, 12 Ma

Here is an explanation of the paper "Believing vs. Achieving" using simple language and creative analogies.

The Big Picture: The "Trust Gap"

Imagine you are a captain steering a ship (the Human), and you have a high-tech autopilot system (the AI). Your job is to decide: Do I steer this ship myself, or do I let the autopilot take over for this specific stretch of water?

This paper investigates a strange disconnect in how captains make that decision. The researchers found that what you believe about your skills and the AI's skills before you start often clashes with how you actually judge the situation in the moment.

They call this the gap between "Believing" (your general confidence) and "Achieving" (the actual result of your teamwork).


The Experiment: The "Income Guessing Game"

The researchers set up a game where 240 people had to guess if a person earns more than $50,000 a year based on their age, job, and education.

  • The Human: The participant.
  • The AI: A computer program that was pretty good at guessing (about 77% accurate).
  • The Choice: For each person, the participant could either guess themselves or say, "You know what, AI, you handle this one."

Before the game started, they asked the players: "How good are you at this?" and "How good is the AI?" (These are the General Beliefs).
During the game, for every single guess, they asked again: "How good are you at THIS specific guess?" and "How good is the AI at THIS specific guess?" (These are the Instance Judgments).

They also gave some players "cheat sheets" (Contextual Information):

  1. Data Sheet: Showing how the data is distributed (e.g., "Most people with a PhD earn over $50k").
  2. AI Report: Showing where the AI makes mistakes (e.g., "The AI is bad at guessing for people under 25").
  3. Both: A combination of the two.
  4. Nothing: Just the game.
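
To make the setup concrete, here is a minimal Python sketch of the game loop. This is not the authors' code: the 77% AI accuracy comes from the paper, but the human accuracy and the simple "delegate if the AI seems better" rule are illustrative assumptions.

```python
import random

AI_ACCURACY = 0.77     # reported in the paper
HUMAN_ACCURACY = 0.65  # assumed for illustration

def play_round(self_judgment, ai_judgment):
    """One instance: delegate if the AI is judged more capable,
    then draw a correct/incorrect outcome from the actor's accuracy."""
    delegate = ai_judgment > self_judgment
    accuracy = AI_ACCURACY if delegate else HUMAN_ACCURACY
    correct = random.random() < accuracy
    return delegate, correct

random.seed(42)
rounds = [play_round(self_judgment=random.uniform(0.4, 0.9),
                     ai_judgment=random.uniform(0.4, 0.9))
          for _ in range(100)]
print(f"delegated {sum(d for d, _ in rounds)}/100 rounds, "
      f"team score {sum(c for _, c in rounds)}/100")
```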

The Three Big Surprises

1. The "Self-Confidence Anchor" (You are stubborn about yourself)

The Metaphor: Imagine you are a chef who believes, "I am a great cook." Even if you burn a specific steak, you still think, "I'm a great cook; that steak was just weird."
The Finding: People's belief in their own abilities (Self-Efficacy) was like a heavy anchor. No matter what "cheat sheets" they were given, they stuck to their original belief about how good they were. If they thought they were good generally, they thought they were good for every specific guess.

  • Result: The "cheat sheets" didn't change how people saw themselves.

2. The "AI Optimism" Bias (You think the robot is smarter in the moment)

The Metaphor: Imagine you think your GPS is "okay" generally. But when you are stuck in traffic and the GPS reroutes you perfectly, you suddenly think, "Wow, this GPS is a genius!" You forget your general skepticism and give it a temporary boost of confidence.
The Finding: People had a systematic bias called "AI Optimism." Even if they thought the AI was just "okay" generally, when they looked at a specific task, they suddenly thought, "Oh, the AI will definitely crush this one!"

  • The Twist: The only thing that curbed this optimism was the AI Report, which spelled out exactly where the AI fails. The Data Sheet didn't help; people needed to see the AI's specific weaknesses.
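
To see the bias as a number, here is a hedged sketch (my own encoding, not the paper's analysis): "AI optimism" is simply how much a participant's instance-by-instance judgments of the AI exceed their general belief about it. All values below are hypothetical.

```python
import statistics

def ai_optimism(general_belief, instance_judgments):
    """Mean instance-level AI judgment minus the general AI belief.
    Positive values mean the AI gets a boost 'in the moment'."""
    return statistics.mean(instance_judgments) - general_belief

# Hypothetical participant: rates the AI 0.6 overall...
general = 0.6
# ...but keeps judging it near 0.76 on individual guesses.
instances = [0.78, 0.72, 0.76, 0.74, 0.80]
print(f"optimism bias: {ai_optimism(general, instances):+.2f}")  # +0.16
```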

3. The "Amplifier" Effect (More info makes you more emotional, not smarter)

The Metaphor: Imagine you are driving and someone hands you a map. You think, "Okay, I know the road better now." But instead of driving more calmly, you start making more dramatic decisions: if you feel confident, you drive faster; if you trust the car's autopilot more, you hand over the wheel sooner.
The Finding: Giving people more information (Data or AI reports) didn't necessarily make their teamwork better. Instead, it made their decisions more sensitive to their feelings.

  • If they felt slightly less confident than usual, they were far more likely to hand the task to the AI.
  • If they felt the AI was slightly better than usual, they were far more likely to hand the task over.
  • The Problem: This made their delegation behavior (who does the work) swing wildly, but it did not improve the final score. They were making more "emotional" choices, not "smarter" ones.
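
Here is a rough sketch of that amplifier, under an assumed logistic choice rule (the paper does not prescribe this exact form). The same small deviations from your usual beliefs swing the delegation decision much harder when a sensitivity knob, standing in for the extra information, is turned up.

```python
import math

def p_delegate(self_dev, ai_dev, sensitivity):
    """Probability of delegating: feeling worse than usual (negative
    self_dev) or rating the AI higher than usual (positive ai_dev)
    pushes toward the AI; `sensitivity` amplifies both signals."""
    return 1 / (1 + math.exp(-sensitivity * (ai_dev - self_dev)))

deviations = (-0.1, +0.1)  # feeling a bit worse, AI looking a bit better
print("no cheat sheet: ", round(p_delegate(*deviations, sensitivity=2), 2))   # 0.6
print("with cheat sheet:", round(p_delegate(*deviations, sensitivity=6), 2))  # 0.77
# Same feelings either way; the information only makes the choice swing harder.
```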

The Core Problem: "Believing" vs. "Achieving"

The most important takeaway is this: People's gut feelings about who should do the work do not match what actually works best.

  • The Disconnect: The study found that the factors driving people to delegate (like "I feel the AI is great right now") had a huge impact on who did the work, but almost zero impact on whether the team actually got the right answer.
  • The Analogy: It's like a basketball coach who keeps swapping players based on who "feels hot" in the moment. The coach makes a lot of swaps (high delegation activity), but the team's score doesn't go up because the swaps weren't actually based on who was the best player for that specific play.
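
The coach analogy can be checked with a toy simulation. Everything below is assumed except the 77% AI accuracy: a "swingy" policy that reacts strongly to a momentary feeling uncorrelated with actual skill reshuffles who does the work, yet scores about the same as a steady policy.

```python
import random

AI_ACC, HUMAN_ACC, TRIALS = 0.77, 0.65, 50_000  # 0.77 from the paper

def team_accuracy(delegate_prob):
    """Average score when each round delegates with a probability
    that may depend on a (skill-irrelevant) momentary feeling."""
    hits = 0
    for _ in range(TRIALS):
        feels_great = random.random() < 0.5          # gut feeling about the AI
        p = delegate_prob(feels_great)
        acc = AI_ACC if random.random() < p else HUMAN_ACC
        hits += random.random() < acc
    return hits / TRIALS

random.seed(1)
swingy = team_accuracy(lambda great: 0.9 if great else 0.3)  # lots of swaps
steady = team_accuracy(lambda great: 0.6)                    # same average rate
print(f"swingy policy: {swingy:.3f}  steady policy: {steady:.3f}")  # ~equal
```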

What Should Designers Do? (The Advice)

The paper suggests that just showing people more data or explanations (Transparency) isn't enough. Here is the new advice:

  1. Don't just show the AI's stats; show the Human's bias. Help people realize, "Hey, you are being stubborn about your own skills," or "Hey, you are getting too excited about the AI right now."
  2. Target the "Root Beliefs," not just the moment. Instead of just helping people make one good decision, help them understand their general relationship with AI before they start the task.
  3. Separate "Learning" from "Doing." Give people complex data to help them understand the system (Calibration), but give them simple, clear tools to help them make the actual decision (Decision Support). Don't mix them up, or people get overwhelmed and make emotional mistakes.

Summary

We often think that if we give humans more information about an AI, they will work with it perfectly. This paper says: No.
Humans have a stubborn belief in themselves and a temporary, inflated belief in the AI. Giving them more info doesn't fix this; it just makes their decisions more volatile. To build better teams, we need to design systems that help humans see their own biases, not just systems that show them more charts.