"I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue

Imagine you are learning to dance. You want to learn the right steps so you don't accidentally step on your partner's toes or make them feel uncomfortable. But here's the twist: instead of a human teacher telling you exactly what to do, you are dancing with a robot partner, and a "coach" is whispering suggestions in your ear through a headset.

This paper is about a study that tested exactly this scenario to see how we learn to spot ableism (discrimination against people with disabilities) in everyday conversations.

Here is the story of the experiment, broken down into simple parts:

The Setup: The Dance Floor

The researchers built a digital "dance floor" (a chat interface) where 160 people had a conversation with a virtual character who has a disability. The conversation happened in one of two settings: a casual party or a workplace.

The participants were split into four groups, each getting a different type of "coach":

The "Bad Coach" (Bias-Directed): This coach whispered suggestions that were subtly rude or ableist. For example, it might suggest asking the disabled person, "Is your disability making it hard to enjoy this party?" (This is a microaggression: it assumes they can't have fun).
The "Good Coach" (Neutral-Directed): This coach whispered helpful, inclusive suggestions. It might say, "Ask them what they are enjoying about the party."
The "Silent Coach" (Self-Directed): No coach at all. You just had to figure it out on your own.
The "Reading Club" (Control): This group didn't dance or talk. They just sat and read a pamphlet about what ableism is.

The Experiment: Before and After

Before the conversation, everyone took a test to see how well they could tell rude interactions from normal ones. Then they did their assigned activity and took the test again.

The researchers wanted to know: Did the conversation help people learn better than just reading? And did the type of coach matter?

The Surprising Results

1. Talking Beats Reading (Every Time)
The people who just read the pamphlet didn't learn much. In fact, they sometimes got worse at spotting the difference between rude and normal interactions.

The Analogy: Reading a manual on how to swim is not the same as jumping in the pool. The people who actually practiced the conversation (the three dialogue groups) learned much more than the people who just read about it.

2. The "Bad Coach" Paradox
This was the most surprising part. The group with the Bad Coach (who suggested rude things) actually became the best at spotting ableism.

Why? Because the suggestions were so obviously wrong or rude, the participants had to actively fight against them. They thought, "Wait, that's not right! I shouldn't say that."
The Analogy: It's like a teacher who tries to teach you math by giving you the wrong answers. You have to work harder to prove them wrong, and in doing so, you learn the math very well.
The Catch: While they got better at spotting the bad stuff, they also started thinking everything was a little bit negative. They became so sensitive to harm that they even rated normal, nice interactions as slightly negative. They were "on guard" too much.

3. The "Good Coach" Built Confidence
The group with the Good Coach learned to spot the bad stuff, but they also learned to appreciate the good stuff. They didn't just become hyper-sensitive; they became balanced.

The Analogy: This coach was like a supportive dance partner who gently nudged you toward the right steps. You felt confident, you didn't step on toes, and you enjoyed the dance.

4. The "Silent Coach" Was Okay, But...
The people with no coach did better than the readers, but they didn't learn as much as the coached groups. They relied on their own instincts, which were good, but they didn't get the extra "scaffolding" (support) to refine their skills.

The Big Takeaway: "I Followed What Felt Right"

The title of the paper comes from a participant who said, "I followed what felt right, not what I was told."

When the "Bad Coach" tried to push them toward rude comments, the participants said, "No, that feels wrong," and ignored the coach. This act of resistance was actually a powerful learning moment. They realized, "I know what a respectful conversation feels like, and this suggestion doesn't match that feeling."

What Does This Mean for the Future?

This study teaches us three big lessons about AI and how we treat each other:

Practice is better than lectures: You can't just read about being kind; you have to practice having conversations. AI can be a great "practice sandbox" for this.
AI isn't neutral: The way an AI suggests things changes how we think. If an AI suggests rude things, it might make us angry and hyper-vigilant. If it suggests kind things, it helps us build positive habits.
Resistance is a teacher: Sometimes, seeing a bad example (and realizing it's bad) teaches us more than seeing a good example. But we have to be careful not to make people so suspicious that they think everything is an attack.

In short: To learn how to be inclusive, we need interactive practice, not just reading. And while AI can be a great coach, we need to make sure it's coaching us toward kindness, not just showing us the "bad stuff" to fight against.

Here is a detailed technical summary of the paper "I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue.

1. Problem Statement

Ableist microaggressions—subtle, often unintentional expressions of prejudice against people with disabilities—are pervasive yet difficult to recognize and address. While interventions exist for other forms of bias (e.g., race, gender), experimental work on ableism is limited. Furthermore, as Large Language Models (LLMs) increasingly act as conversational partners and "coaches" in social interactions, there is a critical gap in understanding how AI-mediated dialogue influences human recognition of bias.

The study addresses two main questions, referred to below as RQ1 and RQ2:

Can brief, structured AI-mediated dialogues improve the recognition of ableist microaggressions compared to passive reading?
How does the direction of AI coaching (biased vs. inclusive vs. unguided) shape these judgments?

2. Methodology

Study Design

The researchers conducted a pre-test → intervention → post-test experiment with 160 participants (after exclusions) recruited via Prolific. Participants were randomly assigned to one of four between-subjects conditions:

Bias-Directed: Participants conversed with a virtual character (a person with a disability) while an AI "coach" (visible only to the user) provided nudges suggesting biased/ableist framings (e.g., pity, unsolicited help).
Neutral-Directed: Same setup, but the AI coach provided inclusive, bias-aware framings.
Self-Directed: Participants conversed with the virtual character with no coaching (unguided).
Reading (Control): Participants read a static informational module about ableism and microaggressions (no dialogue).
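The assignment step can be sketched in a few lines of Python. This is only an illustration: the summary says participants were randomly assigned but does not describe the procedure, so the equal-sized groups, the condition labels, and the function name assign_conditions are all assumptions.

```python
import random

# Labels are illustrative stand-ins for the four conditions described above.
CONDITIONS = ["bias_directed", "neutral_directed", "self_directed", "reading"]


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the conditions.

    Assumption: groups are kept equal in size. The source does not say
    whether the study balanced group sizes this way.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: CONDITIONS[i % len(CONDITIONS)]
        for i, pid in enumerate(shuffled)
    }


# 160 hypothetical participant IDs, matching the reported sample size.
assignment = assign_conditions(range(160))
```

Dealing round-robin after a shuffle keeps the groups the same size while leaving each participant's condition random; simple independent draws per participant would also count as random assignment but would give unequal groups.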

System Architecture & Implementation

Platform: A custom web application built with a Flask (Python) backend and an HTML/CSS/JS frontend.
LLM Integration:
- GPT-4o: Generated real-time responses for the virtual character (a person with a disability) and the coaching suggestions.
- DALL·E: Generated avatars for participants and characters.
Dialogue Flow: Conversations occurred in simulated everyday settings (a party or a workplace). The system maintained conversation history to generate context-aware responses.
Coaching Logic: In coached conditions, the AI generated one-way suggestions before each user turn. These were designed to subtly steer the user toward either ableist tropes (Helplessness, Minimization, Denial of Personhood, Otherization) or inclusive norms.
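The dialogue flow and coaching logic described above might be organized roughly as follows. This is a hedged sketch, not the authors' implementation: the Dialogue class, the COACH_STYLE instructions, the prompt layout, and the fake_llm placeholder are all hypothetical, and the real system would call GPT-4o where this sketch calls an injected function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical coach instructions per condition; the study's actual prompts
# are not given in this summary. None means the condition has no coach.
COACH_STYLE = {
    "bias_directed": "Suggest a reply framed around pity or unsolicited help.",
    "neutral_directed": "Suggest an inclusive, bias-aware reply.",
    "self_directed": None,
}


@dataclass
class Dialogue:
    condition: str
    setting: str                    # e.g. "party" or "workplace"
    generate: Callable[[str], str]  # stand-in for the LLM call
    history: List[Tuple[str, str]] = field(default_factory=list)

    def coach_suggestion(self) -> Optional[str]:
        """One-way suggestion shown before the user's turn (coached conditions only)."""
        style = COACH_STYLE[self.condition]
        if style is None:
            return None
        prompt = f"Setting: {self.setting}\nHistory: {self.history}\nInstruction: {style}"
        return self.generate(prompt)

    def user_turn(self, text: str) -> str:
        """Record the user's message and generate the character's reply in context."""
        self.history.append(("user", text))
        prompt = f"Setting: {self.setting}\nHistory: {self.history}\nReply in character."
        reply = self.generate(prompt)
        self.history.append(("character", reply))
        return reply


def fake_llm(prompt: str) -> str:
    """Placeholder so the sketch runs without network access."""
    return f"[generated text for a {len(prompt)}-character prompt]"


chat = Dialogue(condition="neutral_directed", setting="party", generate=fake_llm)
hint = chat.coach_suggestion()  # visible only to the participant
reply = chat.user_turn("Hi! What are you enjoying about the party?")
```

Passing the generator in as a function keeps the conversation logic testable without an API key, and it makes the only difference between the three dialogue conditions a single lookup in COACH_STYLE.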

Materials & Measures

Vignette Corpus: A validated set of 40 scenarios (20 ableist, 20 neutral) adapted from the Ableist Microaggressions Scale (AMS). Scenarios were balanced by gender and disability type.
Metrics:
- Q1 (Standard Social Experience): Rated on a 7-point Likert scale (Is this a typical social interaction?).
- Q2 (Emotional Impact): Rated on a 7-point Likert scale (How would the character feel?).
- Change Scores ($\Delta$): Difference between Post-test and Pre-test ratings ($\Delta = \text{Post} - \text{Pre}$).
- Contrast Scores: The difference between Neutral and Ableist ratings ($\text{Neutral} - \text{Ableist}$), measuring the ability to differentiate bias.
Qualitative Data: Open-ended reflections collected immediately after the intervention for dialogue-based conditions.
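The change and contrast scores defined above reduce to simple arithmetic on mean ratings. The sketch below shows one way to compute them for a single participant. The record layout, the function names, and the ratings themselves are invented for illustration and are not data from the study.

```python
from statistics import mean


def mean_rating(ratings, scenario_type, phase):
    """Average rating for one scenario type ("neutral"/"ableist") in one phase ("pre"/"post")."""
    return mean(
        r["rating"]
        for r in ratings
        if r["type"] == scenario_type and r["phase"] == phase
    )


def change_score(ratings, scenario_type):
    """Delta = post-test mean minus pre-test mean."""
    return mean_rating(ratings, scenario_type, "post") - mean_rating(ratings, scenario_type, "pre")


def contrast_score(ratings, phase):
    """Neutral mean minus ableist mean; larger values mean sharper differentiation."""
    return mean_rating(ratings, "neutral", phase) - mean_rating(ratings, "ableist", phase)


# Invented 7-point ratings for one hypothetical participant.
ratings = [
    {"type": "neutral", "phase": "pre", "rating": 5},
    {"type": "neutral", "phase": "pre", "rating": 5},
    {"type": "ableist", "phase": "pre", "rating": 4},
    {"type": "ableist", "phase": "pre", "rating": 4},
    {"type": "neutral", "phase": "post", "rating": 6},
    {"type": "neutral", "phase": "post", "rating": 6},
    {"type": "ableist", "phase": "post", "rating": 2},
    {"type": "ableist", "phase": "post", "rating": 2},
]

delta_neutral = change_score(ratings, "neutral")   # 6 - 5 = 1
delta_ableist = change_score(ratings, "ableist")   # 2 - 4 = -2
contrast_pre = contrast_score(ratings, "pre")      # 5 - 4 = 1
contrast_post = contrast_score(ratings, "post")    # 6 - 2 = 4
```

In this made-up case the contrast score grows from 1 to 4, which is the pattern the study treats as improved differentiation: neutral scenarios are rated higher and ableist scenarios lower after the intervention.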

3. Key Results

Quantitative Findings

Effectiveness of Dialogue (RQ1): All dialogue-based conditions outperformed the Reading condition. Participants in dialogue conditions showed greater improvements in recognizing ableism and affirming neutral interactions. The Reading condition often resulted in declines or weaker gains, suggesting passive learning can induce skepticism without behavioral practice.
Impact of Coaching Direction (RQ2):
- Bias-Directed: Produced the strongest differentiation between ableist and neutral scenarios. Participants exposed to biased nudges became significantly more sensitive to harm in ableist scenarios. However, this came with a negative halo effect: they also rated neutral scenarios more negatively on emotional impact.
- Neutral-Directed & Self-Directed: Fostered balanced judgments. These groups improved in recognizing ableist harm while simultaneously affirming the positivity of neutral interactions.
Differentiation (RQ3): Dialogue conditions significantly improved the ability to distinguish between biased and neutral interactions compared to reading. The Bias-Directed group showed the highest contrast scores, but the Neutral-Directed group maintained the most socially healthy overall trajectory.

Qualitative Findings

Resistance to Bias: In the Bias-Directed condition, participants frequently rejected the AI's ableist suggestions, describing them as "rude," "offensive," or "unnatural." This active resistance created a "critical friction" that sharpened their moral boundaries.
Scaffolding: In the Neutral-Directed condition, participants viewed the coach as helpful scaffolding, using suggestions to navigate social norms without feeling forced.
Authenticity: In the Self-Directed condition, participants reported high authenticity, relying on their own contextual adjustments (e.g., tone, formality) rather than system guidance.

4. Key Contributions

AI-Mediated Dialogue Platform: An empirically evaluated system for studying ableism in situ, isolating how one-way coaching influences conversational judgments.
Validated Vignette Corpus: A release of 40 balanced, disability-related interaction scenarios covering four microaggression domains, suitable for future research.
Empirical Evidence on Nudging: Demonstrates that active resistance to biased AI suggestions can be a powerful (albeit unintended) learning mechanism for recognizing harm, while inclusive coaching supports balanced social engagement.
Design Implications: Highlights the trade-offs in AI design: biased nudges increase critical awareness but risk general negativity; inclusive nudges support positive norms but may be perceived as directive.

5. Significance and Implications

Active vs. Passive Learning: The study confirms that interactive, situated practice is superior to passive reading for shifting social perceptions of bias.
The "Double-Edged Sword" of Bias: The finding that exposure to biased AI prompts can sharpen critical awareness (via resistance) challenges the assumption that biased AI is purely harmful. However, it warns of the "negative halo" effect where sensitivity to harm erodes the ability to see safety in neutral interactions.
Complementary Role for AI: The authors argue that AI-mediated dialogue should not replace disability-led education but serve as a scalable "sandbox" for practice, allowing learners to rehearse language and encounter critical friction between live training sessions.
Design Guidelines:
- Do not treat any default as "neutral"; every design choice frames the interaction.
- Prefer scaffolding (options/examples) over prescription (forced paths).
- If using biased examples for training, ensure consent and clear labeling to avoid normalizing harm.

This paper provides a foundational framework for using conversational AI not just as a tool for efficiency, but as a medium for social learning and bias recognition, emphasizing the need for human agency and critical engagement in AI-human interactions.