Towards Strategic Persuasion with Language Models

This paper introduces a theory-driven framework grounded in Bayesian persuasion theory to evaluate and train large language models as strategic persuaders, demonstrating that both frontier and smaller models can achieve significant persuasion gains and exhibit sophisticated strategies through reinforcement learning.

Zirui Cheng, Jiaxuan You

Published 2026-03-10

Imagine you are trying to convince a friend to try a new restaurant. You could just shout, "Go there, it's amazing!" (that's cheap talk: pure enthusiasm with no real information). Or, you could show them the entire menu, the chef's biography, and the health inspection report (that's total transparency).

But the most effective way? You might say, "The pasta is incredible, but the wait is long," or "The dessert is to die for, but it's very sweet." You are strategically choosing what to tell them and what to leave out to guide their decision without lying. This is the art of Strategic Persuasion.

This paper, presented at ICLR 2026, asks a big question: Can AI (Large Language Models) learn to be master persuaders, and can we teach them to do it even better?

Here is the breakdown of their research using simple analogies:

1. The Problem: AI is Getting Too Good at Talking

We know AI can write convincing emails and arguments. Some people are worried this is dangerous (like a robot politician manipulating voters), while others see benefits (like a robot doctor convincing you to get a vaccine).

The problem is that we don't have a good "test" to measure this. Previous tests were like asking a human, "Did that sound convincing?" which is subjective and expensive. Plus, persuasion is tricky; what works on a teenager might fail on a CEO.

2. The Solution: A "Game Theory" Playground

The authors decided to stop guessing and start using math. They used a concept from economics called Bayesian Persuasion.

Think of it like a Magic 8-Ball game:

  • The Sender (The AI): Knows the "true state" of the world (e.g., "This restaurant is actually great, but the service is slow").
  • The Receiver (The Human or another AI): Has a "prior belief" (e.g., "I hate slow service").
  • The Goal: The Sender wants the Receiver to choose an action (e.g., "Go to the restaurant") that makes the Sender happy.

The Sender can't force the Receiver to go, and in this framework they can't lie outright either: they commit up front to a rule for what information they will reveal. The trick is revealing just enough information to change the Receiver's mind, without revealing everything (which might scare them off).
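The trade-off above can be made concrete with the textbook Bayesian persuasion example (the numbers here are illustrative, not from the paper): the restaurant is good with prior probability 0.3, the Receiver goes only if their posterior belief is at least 0.5, and the Sender always wants the Receiver to go.

```python
from fractions import Fraction

# Toy Bayesian persuasion example (hypothetical numbers, not the paper's).
prior = Fraction(3, 10)      # P(restaurant is good)
threshold = Fraction(1, 2)   # Receiver goes iff posterior >= 1/2

# Full transparency: the Receiver goes only when the state really is good.
p_go_transparent = prior

# Optimal signaling (Kamenica-Gentzkow): say "go" always when good, and
# with probability q when bad, where q is chosen so the posterior after
# hearing "go" lands exactly on the threshold -- just enough evidence:
#   P(good | "go") = prior / (prior + (1 - prior) * q) = threshold
q = prior * (1 - threshold) / (threshold * (1 - prior))

p_go_strategic = prior + (1 - prior) * q  # P(Receiver hears "go")

print(f"q (recommend-rate when bad) = {q}")       # 3/7
print(f"P(go) under transparency    = {p_go_transparent}")  # 3/10
print(f"P(go) under strategy        = {p_go_strategic}")    # 3/5
```

Selective disclosure doubles the chance the Receiver goes (from 3/10 to 3/5), without ever sending a message the Receiver would consider a lie, because the posterior after "go" is exactly the 1/2 they need.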

The researchers built a digital playground where:

  • The Sender is an AI trying to change the Receiver's opinion on controversial topics (like "Should social media be liable for user posts?").
  • The Receiver is another AI (or a human in a study) that updates its beliefs based on what the Sender says.

3. The Experiments: Who is the Best Debater?

They tested various AI models (from small ones like Llama-3 to huge ones like DeepSeek-R1 and GPT-4o) in this game.

  • The Result: The bigger, smarter models were naturally better at this game. They didn't just shout louder; they learned to time their information.
    • Analogy: A smart debater doesn't dump all their facts at once. They wait for the right moment to drop a specific piece of evidence that shifts the other person's mind. The study found that top-tier models could do this, achieving "persuasion gains" (moving the other person's opinion significantly).
  • The Dynamic Factor: Persuasion is even more effective as a conversation (multiple rounds) than as a one-shot speech. The best models adapted their strategy as the conversation unfolded.
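One way to put a number on the "persuasion gains" mentioned above: measure how far the Receiver's belief actually moved toward the Sender's target, normalized by how far it could possibly have moved. (This is a plausible metric of that kind, not necessarily the paper's exact definition.)

```python
def persuasion_gain(prior: float, posterior: float, target: float = 1.0) -> float:
    """Fraction of the maximum possible belief movement toward `target`
    that the Sender actually achieved. Illustrative metric only."""
    max_move = abs(target - prior)
    if max_move == 0:
        return 0.0  # the Receiver already fully agrees; nothing to gain
    return (abs(target - prior) - abs(target - posterior)) / max_move

# A Receiver who moves from 0.50 to 0.75 (target 1.0) covers half the
# remaining distance:
print(persuasion_gain(0.50, 0.75))  # 0.5
```

Normalizing by the remaining distance matters: moving someone from 0.9 to 0.95 is a bigger relative achievement than moving someone from 0.1 to 0.15, even though the raw shift is the same.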

4. The Secret Sauce: Teaching AI to Persuade (Reinforcement Learning)

Here is the coolest part. The researchers didn't just test existing models; they trained a small AI to become a persuasion master using Reinforcement Learning (RL).

  • The Analogy: Imagine a chess player who plays 1,000 games against a computer and is told only whether each game was won or lost. No one explains the right moves; the player simply does more of whatever led to wins. Eventually, they converge on a winning strategy.
  • The Experiment: They took a small AI (Llama-3.2-3B) and had it play the persuasion game thousands of times against another AI. Every time it successfully changed the other AI's mind, it got a "reward."
  • The Result: The small AI got much better. It learned strategies that were almost as good as the giant, expensive models. It learned that sometimes you need to hold back information, and sometimes you need to hit hard with facts.
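The training loop above can be sketched in miniature. This is not the paper's RL setup (which fine-tunes Llama-3.2-3B on real dialogues); it is a bandit-style stand-in where a sender learns, from reward alone, which of two hypothetical disclosure strategies persuades a simulated receiver more often.

```python
import random

random.seed(0)

# Toy RL loop (illustrative only). The two strategies and their mean
# rewards are invented; the reward simulates "did the receiver's belief
# shift?" rather than calling a real receiver model.
STRATEGIES = ["reveal_everything", "selective_disclosure"]
TRUE_MEAN = {"reveal_everything": 0.3, "selective_disclosure": 0.6}

q_values = {s: 0.0 for s in STRATEGIES}  # running reward estimates
counts = {s: 0 for s in STRATEGIES}

for episode in range(2000):
    # Epsilon-greedy: mostly exploit the best-known strategy, sometimes explore.
    if random.random() < 0.1:
        strategy = random.choice(STRATEGIES)
    else:
        strategy = max(q_values, key=q_values.get)
    reward = 1.0 if random.random() < TRUE_MEAN[strategy] else 0.0
    counts[strategy] += 1
    q_values[strategy] += (reward - q_values[strategy]) / counts[strategy]

best = max(q_values, key=q_values.get)
print(best)  # the sender converges on selective disclosure
```

The point of the sketch is the feedback structure: nobody tells the sender *why* a message worked, only *whether* it worked, and that sparse signal is enough for the better strategy to win out.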

5. What Did the AI Actually Do?

The researchers analyzed how the AI persuaded. They found that the best AIs relied on:

  • Evidence: Citing facts.
  • Credibility: Establishing trust.
  • Impact: Explaining why the issue matters.

They also found that persuasion works best when the Receiver is "on the fence" (uncertain). If the Receiver is already 100% against you, it's hard to change their mind. If they are already 100% for you, you don't need to persuade them. The sweet spot is the middle ground.
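The "on the fence" effect falls straight out of Bayes' rule: a piece of evidence of fixed strength moves a mid-range belief much more than an extreme one. A small numerical illustration (the 3:1 likelihood ratio is an assumed value, not from the paper):

```python
LR = 3.0  # evidence favors the sender's position 3:1 (assumed strength)

def bayes_posterior(prior: float, lr: float = LR) -> float:
    """Update a belief via Bayes' rule in odds form: posterior odds =
    prior odds * likelihood ratio."""
    odds = prior / (1 - prior)
    new_odds = odds * lr
    return new_odds / (1 + new_odds)

for prior in (0.05, 0.50, 0.95):
    shift = bayes_posterior(prior) - prior
    print(f"prior={prior:.2f} -> shift={shift:+.3f}")
```

The receiver at 0.50 moves by 0.25, while the skeptic at 0.05 and the believer at 0.95 each move by under 0.09: the same evidence, very different leverage.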

Why Should You Care?

This paper is a double-edged sword:

  • The Good: It gives us a scientific way to understand and measure how AI influences us. It could help build AI that helps doctors convince patients to take medicine or teachers convince students to study.
  • The Bad: It shows that even small AIs can be trained to be very effective at changing human minds. This raises red flags about manipulation in politics, marketing, and social media.

The Bottom Line

The authors built a "gym" where AI can practice the art of persuasion. They found that AI is already quite good at it, and with a little bit of training (Reinforcement Learning), even small AIs can become master manipulators (or helpful guides, depending on how we use them).

The paper concludes that we need to understand these capabilities now, before AI becomes so good at persuasion that we can't tell the difference between a helpful suggestion and a calculated manipulation.