AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference

This paper introduces AdAEM, a self-extensible evaluation framework that automatically generates adaptive test questions by probing the internal value boundaries of diverse LLMs. The goal is to overcome the limitations of static benchmarks and provide more informative, distinguishable insights into models' value differences and alignment dynamics.

Jing Yao, Shitong Duan, Xiaoyuan Yi, Dongkuan Xu, Peng Zhang, Tun Lu, Ning Gu, Zhicheng Dou, Xing Xie

Published Mon, 09 Ma

Imagine you are trying to figure out the true personality of a group of very smart, well-behaved robots. You ask them, "Is it good to be kind?" and they all say, "Yes, absolutely!" You ask, "Is it good to be honest?" and they all say, "Yes, of course!"

You might think, "Great! They all have the same good values." But here's the problem: You haven't actually learned anything about their unique personalities. You've just confirmed they all know the "polite robot handbook."

This is the problem the paper AdAEM is trying to solve.

The Problem: The "Polite Robot" Trap

Current ways of testing Large Language Models (LLMs) are like asking a group of people, "Do you like pizza?"

  • The Result: Everyone says "Yes."
  • The Reality: You don't know who loves deep-dish, who hates cheese, who is allergic to gluten, or who only eats pizza on Tuesdays. You just know they all agree on the basic concept.

The paper calls this the "Informativeness Challenge." Old tests use boring, generic questions that everyone answers the same way because the models are trained to be safe and helpful. They hide the models' true, messy, and sometimes conflicting values.

The Solution: AdAEM (The "Devil's Advocate" Generator)

The authors created a new system called AdAEM. Think of it not as a test, but as a dynamic debate coach that never stops arguing.

Instead of using a static list of questions (like a printed quiz), AdAEM is a self-improving engine that does three things:

1. It Finds the "Gray Areas"

Imagine you are trying to find out if two friends have different opinions on politics.

  • Old Method: Ask, "Is democracy good?" (Both say yes. Boring.)
  • AdAEM Method: It looks at the news, sees a specific, messy event happening right now (like a new law about AI in a specific country), and asks, "Should the government ban AI art to protect human artists, even if it slows down innovation?"

AdAEM automatically hunts for these controversial, timely, and culturally specific topics where people (and robots) actually disagree.

2. It Plays "Tag" with Different Models

AdAEM doesn't just ask one model; it asks a whole team of models from different countries and with different training data (e.g., one from the US, one from China, one from Europe).

  • The Analogy: Imagine a game of "Tag." AdAEM throws a question at Model A. If Model A answers, AdAEM immediately throws a slightly different version of that question at Model B to see if they react differently.
  • The Goal: It keeps tweaking the question until it finds the exact phrasing that makes Model A say "Yes!" and Model B say "No!" or "Maybe, but..."
  • The Result: It creates a "Value Map" that shows exactly where the models diverge.
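The "tag" loop above can be sketched in a few lines. This is a toy illustration, not the paper's actual algorithm: `model_a`, `model_b`, and `refine` are hypothetical stand-ins, and the real system rewrites questions with an LLM rather than picking from a fixed list. The core idea survives, though: score each phrasing by how much the models disagree, and keep the most divisive one.

```python
# Toy sketch of AdAEM's disagreement-maximizing refinement loop.
# model_a / model_b are hypothetical stubs standing in for real LLMs.

def model_a(question: str) -> str:
    # Stub: a safety-leaning model that opposes bans on technology.
    return "no" if "ban" in question else "yes"

def model_b(question: str) -> str:
    # Stub: an artist-protection-leaning model that agrees with everything here.
    return "yes"

def disagreement(question: str) -> int:
    # 1 if the two models answer differently, else 0.
    return int(model_a(question) != model_b(question))

def refine(variants: list[str]) -> str:
    """Keep the phrasing on which the models diverge the most."""
    return max(variants, key=disagreement)

variants = [
    "Is it good to protect artists?",                     # generic: both say yes
    "Should governments ban AI art to protect artists?",  # contested phrasing
]
print(refine(variants))  # -> "Should governments ban AI art to protect artists?"
```

With the generic question both stubs agree, so it scores 0; the contested phrasing splits them and wins. That selection pressure is what produces the "Value Map" of where models diverge.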

3. It Never Gets Old (Self-Extensible)

Most tests become useless the moment a new robot is built because the new robot might have memorized the old test questions.

  • AdAEM is like a living garden. As new models are released, AdAEM uses them to grow new questions. If a new model knows about an event that happened yesterday, AdAEM uses that event to create a fresh question that the old models haven't seen yet. This prevents the models from "cheating" by memorizing answers.

How It Works (The "Secret Sauce")

The paper uses some fancy math (Information Theory), but you can think of it like tuning a radio.

  • If you tune the radio to a station where everyone is singing the same song, the signal is clear but uninteresting.
  • AdAEM keeps turning the dial until it finds the "static" or the "noise"—the spots where the signals from different models clash. That "static" is where the real differences in their values live.
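A minimal way to make the radio analogy concrete is Shannon entropy over the models' answers: if every model gives the same answer, entropy is zero (a clear but uninteresting signal); if answers are split, entropy is high (the "static" where value differences live). The paper's information-theoretic objective is more sophisticated than this; the snippet below is only a toy proxy for the intuition.

```python
# Toy proxy for AdAEM's informativeness criterion: entropy of the
# answer distribution across a panel of models (not the paper's
# actual objective, just the underlying intuition).
import math
from collections import Counter

def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (in bits) of the answers.
    0.0 = all models agree; higher = models split = informative question."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(answer_entropy(["yes", "yes", "yes", "yes"]))  # -> 0.0 (everyone agrees)
print(answer_entropy(["yes", "no", "yes", "no"]))    # -> 1.0 (maximal split)
```

A question generator that keeps only high-entropy questions is, in effect, "turning the dial" toward the static.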

Why Does This Matter?

If you are a company building an AI, or a government regulating it, you need to know:

  • Does this AI prioritize safety over freedom?
  • Does this AI think tradition is more important than innovation?
  • Does this AI have a hidden bias toward Western culture over Eastern culture?

Old tests say, "They are all safe."
AdAEM says: "Model A is a strict traditionalist who loves safety. Model B is a chaotic innovator who loves freedom. Model C is a cultural chameleon that changes its mind based on who it's talking to."

The Bottom Line

AdAEM is a tool that stops asking robots, "Are you good?" and starts asking, "What kind of good are you, and where do you draw the line?"

It turns the evaluation of AI from a boring multiple-choice quiz into a lively, ever-changing debate, revealing the true, complex, and sometimes conflicting personalities hidden inside our digital assistants.