HumanLM: Simulating Users with State Alignment Beats Response Imitation

The paper proposes HumanLM, a novel training framework that improves user simulation by aligning psychologically grounded latent states with ground-truth responses via reinforcement learning, outperforming existing response-imitation methods on the comprehensive Humanual benchmark and in real-time human evaluations.

Shirley Wu, Evelyn Choi, Arpandeep Khatua, Zhanghan Wang, Joy He-Yueya, Tharindu Cyril Weerasooriya, Wei Wei, Diyi Yang, Jure Leskovec, James Zou

Published 2026-03-05

Imagine you are trying to teach a robot to act like a specific person, let's call him "Bob." Bob is a grumpy, sarcastic, but deeply caring guy who loves politics and hates when people are treated unfairly.

The Old Way (Response Imitation):
Most AI models today try to learn how to be Bob by reading thousands of his old text messages and trying to copy his exact words. It's like a student trying to pass a test by memorizing the answer key.

  • The Problem: If the teacher asks a new question, the student panics. They might use Bob's favorite slang words (like "ugh" or "lol") but get the point completely wrong. They might say something cheerful when Bob would actually be furious. They are copying the surface (the paint) but missing the soul (the canvas underneath).

The New Way (HUMANLM):
The researchers behind HUMANLM realized that to truly simulate a person, you don't need to memorize their words. You need to understand their brain state before they speak.

Think of it like an actor preparing for a role.

  • Old Method: The actor memorizes the script line-by-line.
  • HUMANLM Method: The actor asks, "What is my character feeling right now? What do they believe? What is their goal?"

How HUMANLM Works (The "State Alignment" Secret Sauce)

The paper introduces a framework where the AI doesn't just guess the answer; it first writes down its "thought process" based on six psychological dimensions:

  1. Belief: What does this person think is true? (e.g., "The government is lying.")
  2. Goal: What are they trying to achieve? (e.g., "I want to shame the politician.")
  3. Value: What matters to them? (e.g., "Honesty is more important than politeness.")
  4. Stance: Do they agree or disagree? (e.g., "Strongly disagree.")
  5. Emotion: How do they feel? (e.g., "Heartbroken but angry.")
  6. Communication: How do they say it? (e.g., "Sarcastic and direct.")
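The six dimensions above can be pictured as a small structured record that the model fills in before it writes a single word of the reply. Here is a minimal sketch in Python; the class and field names are illustrative, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class UserState:
    """Hypothetical container for the six psychological dimensions.
    Field names are illustrative, not taken from the paper."""
    belief: str         # what the user thinks is true
    goal: str           # what the user is trying to achieve
    value: str          # what matters most to the user
    stance: str         # agree / disagree, and how strongly
    emotion: str        # how the user currently feels
    communication: str  # tone and style of delivery

# Example: the "Bob" persona from above, filled in by hand.
bob_state = UserState(
    belief="The government is lying.",
    goal="I want to shame the politician.",
    value="Honesty is more important than politeness.",
    stance="Strongly disagree.",
    emotion="Heartbroken but angry.",
    communication="Sarcastic and direct.",
)
print(bob_state.emotion)
```

The key design point is that this record is produced first, as an explicit "thought trace," and only then does the model write the reply.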

The Training Process:
Imagine a coach (the AI Judge) watching the robot try to act like Bob.

  1. The robot generates a "thought trace" (the 6 states above).
  2. The coach checks: "Does this robot's anger match Bob's actual anger? Does its belief match Bob's?"
  3. If the robot gets the feelings right but the words slightly wrong, the coach still gives a high score.
  4. If the robot uses the perfect words but gets the feelings wrong (e.g., sounding happy when Bob is sad), the coach gives a low score.

The robot learns to maximize these "State Alignment Scores." It learns that getting the internal state right is more important than getting the exact words right.
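The coach's scoring step can be sketched as a per-dimension comparison that is averaged into one reward. In the paper the judge is an LLM and training uses reinforcement learning; in this toy sketch the judge is just an exact-match check, so all names and the averaging formula are illustrative assumptions:

```python
# Toy sketch of the state-alignment reward. The real judge scores
# semantic similarity with an LLM; `toy_judge` stands in for it here.
DIMENSIONS = ["belief", "goal", "value", "stance", "emotion", "communication"]

def toy_judge(predicted_value, true_value):
    # Stand-in for the LLM judge: 1.0 on exact match, 0.0 otherwise.
    return 1.0 if predicted_value == true_value else 0.0

def state_alignment_reward(predicted, ground_truth, judge=toy_judge):
    # Average per-dimension alignment: the signal the simulator is
    # trained to maximize (a sketch, not the paper's exact formula).
    scores = [judge(predicted[d], ground_truth[d]) for d in DIMENSIONS]
    return sum(scores) / len(scores)

truth = {"belief": "The government is negligent.",
         "goal": "Express anger.",
         "value": "Fairness.",
         "stance": "Strongly disagree.",
         "emotion": "Heartbroken but angry.",
         "communication": "Sarcastic and direct."}
guess = dict(truth, emotion="Cheerful.")  # everything right except the emotion
print(state_alignment_reward(guess, truth))  # 5 of 6 dimensions match
```

Because the reward is computed over states rather than surface text, a response with the right feelings but different wording can still score well, while a word-perfect imitation with the wrong emotion scores poorly.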

The "HUMANUAL" Benchmark

To prove this works, the team built a massive testing ground called HUMANUAL.

  • Think of this as a giant "Turing Test" arena.
  • They gathered 216,000 real responses from 26,000 real people, drawn from places like Reddit, Amazon reviews, and political blogs.
  • They tested the AI on everything: news comments, book reviews, and even angry emails.
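Conceptually, each benchmark item pairs a user's history and a context with the response that user actually wrote, and a model is scored on how close its simulated response comes. The record shape and function names below are hypothetical, just to make the setup concrete:

```python
# Hypothetical shape of one benchmark record (the actual HUMANUAL
# dataset schema may differ; field names are illustrative).
example = {
    "user_id": "reddit_user_042",
    "history": ["...past comments by this user..."],
    "context": "News article: city council cuts the fire department budget.",
    "gold_response": "Oh, great plan. Brilliant.",
    "domain": "news_comments",  # also: product reviews, political blogs, ...
}

def evaluate(model, records, similarity):
    # Average how close each simulated response is to the real one.
    total = 0.0
    for rec in records:
        simulated = model(rec["history"], rec["context"])
        total += similarity(simulated, rec["gold_response"])
    return total / len(records)

# Tiny usage example with stand-in pieces (not real models or metrics).
dummy_model = lambda history, context: "Oh, great plan. Brilliant."
exact = lambda a, b: 1.0 if a == b else 0.0
print(evaluate(dummy_model, [example], exact))  # 1.0
```

In practice the similarity judge would be far richer than exact match, but the loop structure is the same: simulate, compare to the human's real answer, aggregate.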

The Results: Why It Matters

When they put HUMANLM to the test against other AI models:

  • The Imitators: The old models sounded like robots trying to be human. They used emojis and slang but often missed the point or sounded fake.
  • HUMANLM: It sounded like a real human. In a study with 111 real people, 41.4% of participants said HUMANLM's response was "mostly similar" or "nearly identical" to what they would have written.
  • The "Humanlikeness" Score: 76.6% of people thought HUMANLM sounded "quite natural" or "indistinguishable from a human."

The Big Picture

The paper argues that if we want AI to help us (like helping politicians understand voters, or helping companies design better products), we can't just train AI to mimic us. We have to train AI to understand us.

In a nutshell:

  • Old AI: "I will say 'Wow, that's terrible!' because Bob said that 50 times."
  • HUMANLM: "Bob is feeling heartbroken about the fire, he believes the government is negligent, and he wants to express his anger sarcastically. Therefore, I will say: 'Oh, great plan, let's cut the fire budget while people lose their homes. Brilliant.'"
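The HUMANLM side of that contrast amounts to conditioning the reply on the six inferred states rather than on past wording. A minimal sketch of such a state-conditioned prompt (a hypothetical template, not the paper's actual prompt):

```python
def build_prompt(state):
    # Hypothetical template: the reply is generated *from* the inferred
    # states, not by imitating the user's previous messages.
    return (
        f"You believe: {state['belief']}\n"
        f"Your goal: {state['goal']}\n"
        f"You value: {state['value']}\n"
        f"Your stance: {state['stance']}\n"
        f"You feel: {state['emotion']}\n"
        f"Your style: {state['communication']}\n"
        "Now write your reply:"
    )

bob = {"belief": "The government is negligent.",
       "goal": "Express anger at the budget cut.",
       "value": "Fairness.",
       "stance": "Strongly disagree.",
       "emotion": "Heartbroken but angry.",
       "communication": "Sarcastic and direct."}
print(build_prompt(bob))
```

Everything downstream of the prompt (the sarcastic reply itself) then follows from the state, which is exactly the "why and how before the what" ordering the paper argues for.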

By focusing on the internal state (the "why" and "how" of a person's mind) rather than just the output (the "what"), HUMANLM creates a simulation that feels genuinely human, not just a cheap copy.