Deliberative Dynamics and Value Alignment in LLM Debates

This paper investigates how different deliberation protocols (synchronous vs. round-robin) and model architectures influence value alignment and verdict revision in multi-turn LLM debates. It reveals significant behavioral disparities: GPT-4.1 exhibits strong inertia and autonomy-focused reasoning, while Claude 3.7 Sonnet and Gemini 2.0 Flash demonstrate greater flexibility, empathy, and susceptibility to order effects.

Pratik S. Sachdeva, Tom van Nuenen

Published 2026-03-10

Imagine you have three very smart, very opinionated robots. You give them a tricky moral problem—like a messy family drama or a dispute between friends—and ask them to decide who is "in the wrong."

In the past, researchers just asked these robots to give an answer once, like a student taking a pop quiz. But in the real world, these robots are starting to work together in teams, talking back and forth to solve problems. This paper asks: What happens when we let these robots actually debate each other?

The authors, Pratik and Tom, set up a giant "robot courtroom" using 1,000 real-life drama stories from Reddit's "Am I the Asshole?" (AITA) community. They pitted three top-tier AI models against each other: GPT-4.1 (OpenAI), Claude 3.7 Sonnet (Anthropic), and Gemini 2.0 Flash (Google).

Here is the breakdown of their findings, using some everyday analogies:

1. The Two Ways They Talked

The researchers tested two different ways the robots could talk:

  • The "Synchronous" Method (The Group Chat): Everyone types their answer at the exact same time, hits send, and then sees what the other person wrote. It's like a group chat where everyone posts their opinion simultaneously.
  • The "Round-Robin" Method (The Town Hall): They take turns. Person A speaks, then Person B hears Person A and speaks, then Person C hears both and speaks. It's like a meeting where you can't speak until the person before you is done.
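The difference between the two protocols comes down to what each model can see when it speaks. Here is a minimal Python sketch of that difference. This is an illustration, not the authors' code: `get_verdict` is a hypothetical stand-in for a real LLM API call, and here it simply records how many prior messages were visible at the moment the model spoke.

```python
def get_verdict(model, story, transcript):
    # Hypothetical stub for an LLM call. A real implementation would
    # send the story plus the transcript to the model and parse its
    # verdict; here we just record how much context was visible.
    return (model, len(transcript))

def synchronous_round(models, story, transcript):
    # "Group chat": every model answers against the SAME frozen
    # transcript, so no one sees this round's replies until next round.
    frozen = list(transcript)
    replies = [get_verdict(m, story, frozen) for m in models]
    transcript.extend(replies)
    return replies

def round_robin_round(models, story, transcript):
    # "Town hall": each model sees every reply made before its turn,
    # including earlier speakers within this same round.
    replies = []
    for m in models:
        reply = get_verdict(m, story, transcript)
        transcript.append(reply)
        replies.append(reply)
    return replies
```

Running one round of each makes the asymmetry concrete: in the synchronous round all three models see zero prior messages, while in the round-robin round the second speaker sees one message and the third sees two. That extra visibility is exactly what lets the first speaker "set the tone" in the round-robin format.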

2. The Personality Clash: The Stubborn Mule vs. The Chameleon

The biggest surprise was how differently the robots behaved when they heard each other's arguments.

  • GPT-4.1 is the "Stubborn Mule":
    When GPT-4.1 heard a counter-argument, it rarely changed its mind. It was incredibly stubborn. If it thought you were "Not the Asshole" (NTA) in the first round, it stuck to its guns, even if the other robot gave a great argument. It only changed its mind about 0.6% to 3% of the time. It had strong "inertia": it wanted to keep doing what it was doing.

    • The Metaphor: Imagine a mule that has decided to walk left. Even if you show it a map proving right is the way, it just digs its hooves in and says, "Nope, still walking left."
  • Claude and Gemini are the "Chameleons":
    These two were much more flexible. When they heard a good point, they were willing to rethink their stance. They changed their minds about 30% to 40% of the time.

    • The Metaphor: Imagine a chameleon. If the surroundings (the other robot's argument) change color, the chameleon changes its color to match. They were much more open to persuasion.

3. The "Order Effect": Who Speaks First Matters

When they used the "Town Hall" (Round-Robin) style, the order in which they spoke became a superpower.

  • If Claude spoke first, GPT was much more likely to agree with Claude, even if GPT initially disagreed.
  • If GPT spoke first, it was much harder to sway.
  • Gemini was the ultimate "people pleaser." If it spoke second, it almost always agreed with whoever spoke first.

The Takeaway: The robot that speaks first often sets the tone, and the robot that speaks second often just goes with the flow to avoid conflict. This is called "conformity."

4. What Values Did They Care About?

The researchers also looked at why the robots changed their minds. They analyzed the "values" the robots used in their arguments.

  • GPT-4.1 cared mostly about Personal Autonomy and Direct Communication. It loved the idea of "You do you" and "Say exactly what you mean."
  • Claude and Gemini cared more about Empathy, Emotional Safety, and Constructive Dialogue. They were more focused on how the people in the story felt and how to keep the peace.

When the robots finally agreed on a verdict, they also agreed on the values behind it. It was like two people finally agreeing on a movie choice; they didn't just pick the same movie, they realized they both loved the same genre.

5. The "Open Source" Wildcards

They also tested some open-source models (DeepSeek and Llama).

  • DeepSeek was surprisingly stubborn, acting just like GPT-4.1.
  • Llama 8B (a smaller model) was chaotic. It changed its mind constantly, even when it couldn't reach an agreement with the other robot. It was like a student who keeps changing their answer on a test until the teacher takes the paper away.

The Big Picture: Why This Matters

This paper teaches us that how we design the conversation matters just as much as the AI itself.

If you are building a system where AI agents give advice (like for mental health or legal disputes), you can't just assume they will "debate" their way to the truth.

  • If you use a parallel format (everyone talks at once), you might get a stubborn AI that refuses to listen.
  • If you use a sequential format (taking turns), you might get an AI that just agrees with the first person it hears to be polite (sycophancy).

The Final Lesson:
AI isn't just a static calculator that gives the same answer every time. It's a social creature that changes its behavior based on the rules of the game. If you want AI to be wise, you have to design the "room" where it talks, not just the "brain" that thinks.