Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

This paper introduces Ψ-Bench, a benchmark designed to evaluate how well large language models can proactively influence realistic users through persona-sensitive persuasion. It finds that while current models can generate coherent arguments, they still need significant improvement in leveraging user profiles for effective personalized interaction.

Original authors: Peixuan Han, Hongyi Du, Jiayu Liu, Yihang Sun, Yutong Liu, Jiaxuan You

Published 2026-06-03

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a very smart, polite robot friend. Currently, most of these robots are like excellent waiters: if you ask for the menu, they bring you exactly what you asked for. If you say, "I'm sad," they say, "I'm sorry to hear that." They are great at listening and responding, but they rarely take the initiative to change your mind or guide you toward a better decision on their own.

The Ψ-Bench paper asks a big question: Can we teach these robots to be more like a skilled coach or a persuasive friend? Can they look at who you are, understand your hidden personality, and gently nudge you to see things differently or do something helpful?

Here is a breakdown of how they tested this, using simple analogies:

1. The "Role-Playing Game" Setup

To test whether robots can be good persuaders, the researchers created a video-game-like environment called Ψ-Bench.

  • The Levels: They set up three different scenarios:
    • The Debate Club: Two people arguing about whether wired mice are better than wireless ones.
    • The Therapy Session: A robot acting as a counselor helping someone who feels down.
    • The Favor: Asking a busy friend to drive you to the airport.
  • The "Secret Character Sheet": In real life, you usually know your friend's personality (e.g., "He loves sports," "She hates being told what to do"). In this test, the robot persuader doesn't get this sheet. Instead, it has to guess who it is talking to just by listening to what the other person says. The "client" (the person being persuaded) is a simulated human with a specific, hidden personality.
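To make the setup concrete, here is a toy Python sketch of the loop described above. Everything in it is hypothetical: the `Persona` fields, the rule that the client is convinced only by an argument mentioning one of its hobbies, and both persuader functions are invented for illustration and are not Ψ-Bench's actual implementation. What it does show is the structural point: the persuader receives only the transcript, never the hidden persona.

```python
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical "secret character sheet". Field names are illustrative,
# not Ψ-Bench's actual persona schema.
@dataclass
class Persona:
    age: int
    hobbies: list[str]
    traits: list[str]

@dataclass
class SimulatedClient:
    persona: Persona        # hidden: never handed to the persuader
    convinced: bool = False

    def reply(self, message: str) -> str:
        # Toy rule (an assumption): the client is moved only by an
        # argument that mentions one of its own hobbies.
        if any(h in message.lower() for h in self.persona.hobbies):
            self.convinced = True
            return "Okay, that actually makes sense to me."
        return f"I don't know. I'd rather spend my time on {self.persona.hobbies[0]}."

Persuader = Callable[[list[str]], str]  # sees the transcript only

def run_episode(persuader: Persuader, client: SimulatedClient, max_turns: int = 3) -> bool:
    """Alternate persuader and client turns; report whether the client was convinced."""
    transcript: list[str] = []
    for _ in range(max_turns):
        message = persuader(transcript)  # no access to client.persona
        transcript.append(message)
        transcript.append(client.reply(message))
        if client.convinced:
            break
    return client.convinced

def generic_persuader(transcript: list[str]) -> str:
    # One size fits all: the same argument every turn.
    return "Resting is good for your health."

def adaptive_persuader(transcript: list[str]) -> str:
    # Reuse whatever the client just said it cares about, if anything.
    marker = "spend my time on "
    if transcript and marker in transcript[-1]:
        topic = transcript[-1].split(marker)[1].rstrip(".")
        return f"Even {topic} goes better when you are well rested."
    return "Resting is good for your health."

athlete = Persona(age=24, hobbies=["basketball"], traits=["competitive"])
print(run_episode(generic_persuader, SimulatedClient(athlete)))   # False
print(run_episode(adaptive_persuader, SimulatedClient(athlete)))  # True
```

In this toy, the adaptive persuader succeeds only because it picks up a cue from the client's own words; it never reads the persona directly.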

2. The Challenge: "One Size Does Not Fit All"

The researchers found that while the robots are good at sounding smart and polite, they struggle to tailor their arguments to the specific person they are talking to.

  • The Analogy: Imagine a robot trying to convince a stubborn, competitive athlete to rest. If the robot says, "Resting is good for your health," the athlete might ignore them. But if the robot says, "Even the greatest champions need to recover to win the next game," the athlete might listen.
  • The Result: In terms of this analogy, the robots mostly reached for the "health" argument. They missed the "champion" argument because they never worked out that they were talking to a competitive athlete. They were like a chef cooking the same meal for everyone, regardless of whether the guest was a vegetarian, a child, or a food critic.

3. The "Magic Cheat Sheet" Experiment

The researchers wanted to know: What if we just gave the robot the Secret Character Sheet?

  • The Test: They let the robots see the client's profile (age, hobbies, personality) before the conversation started.
  • The Result: The robots got 18% better at persuading. This suggests that the robots can be strong persuaders, but mainly when they know who they are talking to. The bottleneck seems to be less their ability to argue than their ability to figure out the user's personality on the fly.
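The summary does not say whether "18% better" is an absolute or a relative improvement, and the two readings can mean quite different things. The Python sketch below uses success rates invented purely for illustration (they are not from the paper) to show that a 9-point absolute gain over a 50% baseline is also an 18% relative gain.

```python
def absolute_gain(baseline: float, treated: float) -> float:
    """Difference in success rate, in percentage points."""
    return (treated - baseline) * 100

def relative_gain(baseline: float, treated: float) -> float:
    """Improvement expressed as a percentage of the baseline rate."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (treated - baseline) / baseline * 100

# Invented success rates for illustration only, NOT results from the paper.
no_profile, with_profile = 0.50, 0.59
print(round(absolute_gain(no_profile, with_profile), 1))  # 9.0
print(round(relative_gain(no_profile, with_profile), 1))  # 18.0
```

The lower the baseline, the larger the relative figure for the same absolute change, which is why it helps to know which one a headline number reports.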

4. The "Sherlock Holmes" Solution

Since we can't always give robots a cheat sheet (in real life, we don't have a file on everyone), the researchers tried to build a "Sherlock Holmes" module.

  • How it works: This is a small, specialized AI that listens to the conversation and tries to guess the client's personality profile in real-time. It then feeds that guess to the main robot.
  • The Outcome: This "Sherlock" module helped the robots get much better at persuading, even without a cheat sheet. It showed that if a robot can learn to "read the room" and guess who you are, it becomes a much more effective guide.
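The summary describes the "Sherlock" module only at a high level, as a small, specialized AI model. The Python sketch below swaps in a deliberately crude keyword matcher, purely to show the data flow: read what the client has said so far, produce a guessed profile, and pass that guess to the persuader as extra context. The cue lists, function names, and prompt wording are all hypothetical and are not the paper's method.

```python
import re

# Hypothetical trait cues. A real inference module would be a trained model,
# not a keyword list.
TRAIT_CUES = {
    "competitive": ["win", "beat", "champion", "rank"],
    "anxious": ["worried", "nervous", "afraid"],
    "busy": ["deadline", "no time", "swamped"],
}

def infer_profile(client_turns: list[str]) -> dict[str, int]:
    """Count whole-word cue hits per trait across everything the client has said."""
    text = " ".join(client_turns).lower()
    scores: dict[str, int] = {}
    for trait, cues in TRAIT_CUES.items():
        hits = sum(len(re.findall(rf"\b{re.escape(cue)}\b", text)) for cue in cues)
        if hits:
            scores[trait] = hits
    return scores

def build_persuader_prompt(task: str, client_turns: list[str]) -> str:
    """Prepend the guessed profile (strongest trait first) to the persuader's instructions."""
    profile = infer_profile(client_turns)
    guess = ", ".join(sorted(profile, key=profile.get, reverse=True)) or "unknown"
    return (
        f"Task: {task}\n"
        f"Likely client traits: {guess}\n"
        "Tailor your next message to these traits."
    )

turns = [
    "I hate sitting out. I want to win the league this year.",
    "If I rest, someone will beat my time.",
]
print(infer_profile(turns))  # {'competitive': 2}
print(build_persuader_prompt("convince the client to take a rest day", turns))
```

Because the guess is rebuilt from the full transcript each turn, it can sharpen as the client reveals more about themselves, which is the "read the room" behavior the module is meant to supply.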

The Bottom Line

The paper concludes that current AI is like a polite but generic assistant. It can write a good speech, but it struggles to change a specific person's mind because it doesn't truly "know" them.

To make AI truly helpful and proactive, we need to move beyond just asking questions. We need AI that can observe, infer, and adapt to the unique personality of the human it is talking to. The paper provides a new "gym" (the benchmark) to train and test these robots so they can learn to be better coaches, counselors, and friends.

Important Note: The paper strictly tests these robots in a controlled, simulated environment. It does not claim these robots are ready to be deployed in real-world therapy or to manipulate people in dangerous ways. It is purely a tool to measure and improve how well AI understands human personality.
