On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective

This paper proposes a free-energy framework for distinguishing two things. Capability elicitation reweights behaviors that already lie inside a model's accessible support. Capability creation expands that support through mechanisms such as search or tool use. The paper argues that this distinction matters more for post-training than the traditional SFT-versus-RL dichotomy.

Original authors: Yuhao Li, Shengchao Liu

Published 2026-05-12

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of this paper. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Question: Did We Teach the Model, or Did We Just Wake It Up?

Imagine you have a very talented but slightly confused musician (the AI model) who has practiced for years on their own (pre-training). Now, you want to teach them a new song.

There is a big debate in the AI world about how we teach them.

  • Method A (SFT, supervised fine-tuning): You play them a recording of a perfect performance and say, "Copy this exactly."
  • Method B (RL, reinforcement learning): You let them play. Every time they hit a good note, you give them a treat; every time they hit a bad note, you don't.

The common belief is: Method A just makes them imitate what they already know (Imitation), while Method B helps them discover new, amazing things they never knew they could do (Discovery).

The authors of this paper say: "Stop. That distinction is too simple."

They argue that the real question isn't how you teach (copying vs. rewards), but what you are actually teaching. Did you just help the musician play a song they were already capable of but kept messing up? Or did you actually give them the ability to play a song they physically couldn't play before?

They call these two things:

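  1. Capability Elicitation: Waking up a skill that was already there but dormant (reweighting behaviors the model could already produce).
  2. Capability Creation: Giving the musician a brand-new skill they didn't have (expanding what the model can produce at all).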

The "Energy Landscape" Analogy

To explain this, the authors use a physics concept called free energy. Roughly, the lower a behavior's free energy, the more likely the model is to produce it. Imagine the musician's mind as a hilly landscape.

  • The Valleys (Basins): These are the easy songs the musician plays naturally. They are deep, comfortable, and easy to fall into.
  • The Hills (Tails): These are songs the musician could play, but they are very high up. It takes a lot of effort (or a lot of tries) to get there.
  • The Walls (Barriers): These are songs separated by a massive, unclimbable wall. The musician cannot reach them just by walking around; they need a ladder or a bridge.
  • The Other Side of the World (Unsupported): These are songs that simply don't exist in the musician's universe yet.
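Below is a minimal sketch of the landscape picture. It assumes the usual convention that a behavior's free energy is its negative log-probability under the model, F(y) = -log p(y), with temperature set to 1. The toy distribution, the song names, and the threshold separating basins from tails are hypothetical choices for illustration, not values from the paper. Walls are a property of multi-step paths rather than of single outcomes, so they are left out here.

```python
import math

# Hypothetical toy "model": the probability the musician assigns to each song.
p = {
    "lullaby": 0.70,          # deep valley
    "folk_tune": 0.2999,      # shallower valley
    "jazz_solo": 0.0001,      # high hill (tail)
    "violin_concerto": 0.0,   # outside the support entirely
}

def free_energy(prob: float) -> float:
    """F(y) = -log p(y); zero probability means infinite energy."""
    return math.inf if prob == 0.0 else -math.log(prob)

def zone(prob: float, basin_max_energy: float = 2.0) -> str:
    """Classify an outcome by its free energy (the threshold is a hypothetical choice)."""
    f = free_energy(prob)
    if math.isinf(f):
        return "unsupported"
    return "basin" if f <= basin_max_energy else "tail"

for song, prob in p.items():
    print(f"{song:16s} F={free_energy(prob):7.3f}  zone={zone(prob)}")
```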

How Training Works on This Map

Both "Copying" (SFT) and "Rewards" (RL) work by tilting the landscape.

  • If you give a reward for a song in a Valley, the valley gets deeper. The musician plays that song more often.
  • If you give a reward for a song on a Hill, the hill gets a ramp. The musician can now climb up to that song more easily.
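One standard way to make "tilting" precise is the closed-form optimum of KL-regularized reward maximization: p'(y) ∝ p(y) · exp(r(y)/β). This is equivalent to subtracting r(y)/β from each outcome's free energy and renormalizing. It is offered here as an illustration, not as the paper's exact equations. The reward values and β below are hypothetical.

```python
import math

def tilt(p: dict[str, float], reward: dict[str, float], beta: float = 1.0) -> dict[str, float]:
    """Exponentially tilt p by reward: p'(y) is proportional to p(y) * exp(r(y) / beta)."""
    weights = {y: py * math.exp(reward.get(y, 0.0) / beta) for y, py in p.items()}
    z = sum(weights.values())  # normalizing constant (partition function)
    return {y: w / z for y, w in weights.items()}

# Hypothetical toy distribution over songs.
p = {"lullaby": 0.70, "folk_tune": 0.2999, "jazz_solo": 0.0001}

deeper = tilt(p, {"lullaby": 2.0})    # rewarding a valley makes it deeper
ramped = tilt(p, {"jazz_solo": 8.0})  # rewarding a hill gives it a ramp

print({y: round(v, 4) for y, v in deeper.items()})
print({y: round(v, 4) for y, v in ramped.items()})
```

Both cases only move probability among songs that already had some. Neither one adds a song to the list.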

The Crucial Point:
If the song was already in a Valley or on a Hill, you haven't created a new ability. You've just made an existing ability more reliable. This is Elicitation.

If the song was behind a Wall, and your training method somehow built a bridge or a ladder to get there, then you have created a new ability. This is Creation.


The Four Zones of Learning

The paper breaks down post-training into four specific scenarios based on this map:

1. The "Safe Zone" (Demonstration-Covered Elicitation)

  • The Scenario: The musician already knows the song perfectly but sometimes forgets the lyrics. You show them the sheet music (demonstrations).
  • The Result: They stop forgetting. They didn't learn a new song; they just stabilized an old one.
  • The Takeaway: Whether you use copying or rewards, if the answer was already easy to find, you are just polishing a rough gem, not creating a new one.
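A quick way to see why this counts as stabilizing rather than creating: suppose each independent attempt succeeds with probability p. Then the chance of at least one success in k attempts is 1 - (1 - p)^k, often called pass@k. The per-attempt success rates below are hypothetical. Raising p from 0.6 to 0.95 transforms single-attempt reliability. With a handful of retries, however, the answer was already almost always reachable.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts."""
    return 1.0 - (1.0 - p) ** k

before, after = 0.60, 0.95  # hypothetical per-attempt success rates
for k in (1, 8):
    print(f"k={k}: before={pass_at_k(before, k):.4f}  after={pass_at_k(after, k):.4f}")
```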

2. The "Hidden Gem" (Tail Reweighting)

  • The Scenario: The musician knows a complex jazz solo, but they only play it once in a million tries. It's hidden in the "Hills."
  • The Result: You use a reward system to say, "Wow, that jazz solo was great!" Suddenly, they start playing it all the time.
  • The Takeaway: It looks like magic because the performance jumped up. But the musician could have played it all along; they just needed a nudge to find it. This is still Elicitation, not creation.
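The sketch below takes the "once in a million tries" figure literally. It estimates how many attempts a reward-only method needs before it even observes the solo it wants to reward. With per-attempt probability p, the number of attempts until the first hit is geometric with mean 1/p. The smallest N giving at least a 50% chance of one hit is ceil(ln 0.5 / ln(1 - p)). This is illustrative arithmetic under an independence assumption, not a figure from the paper.

```python
import math

def expected_tries(p: float) -> float:
    """Mean of a geometric distribution: expected attempts until the first success."""
    return 1.0 / p

def tries_for_confidence(p: float, confidence: float = 0.5) -> int:
    """Smallest N such that P(at least one success in N attempts) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

p_solo = 1e-6  # "once in a million tries"
print(expected_tries(p_solo))        # roughly one million attempts on average
print(tries_for_confidence(p_solo))  # roughly 693 thousand attempts for a coin-flip chance
```

The probability is tiny but not zero. That is why rewarding the solo, once it finally appears, still counts as reweighting.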

3. The "Bridge Builder" (Barrier-Crossing Discovery)

  • The Scenario: The musician needs to play a song that requires a sequence of steps they've never taken together. It's behind a wall.
  • The Result: You don't just give a reward at the end. You give rewards for steps along the way, or you let them use a tool (like a ladder) to cross the gap.
  • The Takeaway: This is Capability Creation. The training didn't just tilt the hill; it changed the terrain so the musician could reach a place they were previously blocked from.
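Here is a back-of-the-envelope sketch of why step-level feedback or a tool can change what is reachable. Assume, hypothetically, that the target needs k steps done in the right order, and that blind exploration gets each step right with probability q. With reward only at the end, one attempt succeeds with probability q^k, so the expected number of attempts is (1/q)^k. Now suppose each step is checked as it is taken and can be retried without restarting. Then the steps are found one at a time, costing about k/q attempts in expectation. The numbers are illustrative, not from the paper.

```python
def end_to_end_tries(q: float, k: int) -> float:
    """Expected attempts when only the complete k-step sequence is rewarded."""
    return (1.0 / q) ** k

def stepwise_tries(q: float, k: int) -> float:
    """Expected attempts when each step gets its own feedback and can be retried."""
    return k / q

q, k = 0.1, 6  # hypothetical: 6 steps, each right 10% of the time
print(f"end-to-end: {end_to_end_tries(q, k):,.0f} attempts")
print(f"stepwise:   {stepwise_tries(q, k):,.0f} attempts")
```

Strictly, the end-to-end probability is tiny rather than zero, so this is a very steep hill. In the landscape picture it behaves like a wall once no realistic sampling budget would ever cross it.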

4. The "Impossible Zone" (Unsupported Regimes)

  • The Scenario: You ask the musician to play a song that requires a violin, but they only have a guitar.
  • The Result: No amount of copying or rewarding will help. The "energy" required to play that song is infinite.
  • The Takeaway: You cannot "create" a capability here with just training. You need new information, a new instrument, or a different model entirely.
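In the free-energy picture, "infinite energy" simply means zero probability. A reward tilt of the form p(y) · exp(r(y)/β) multiplies that zero by a finite factor, so the result stays zero. The minimal check below uses hypothetical outcomes and an arbitrarily large reward.

```python
import math

# Hypothetical guitar-only musician: the violin piece has zero probability.
p = {"guitar_song": 1.0, "violin_concerto": 0.0}
reward = {"violin_concerto": 500.0}  # arbitrarily large, but still finite
beta = 1.0

# Unnormalized tilted weights p(y) * exp(r(y) / beta), then renormalize.
weights = {y: py * math.exp(reward.get(y, 0.0) / beta) for y, py in p.items()}
z = sum(weights.values())
tilted = {y: w / z for y, w in weights.items()}

print(tilted["violin_concerto"])  # 0.0: zero times any finite factor is still zero
```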

Why This Matters

The paper argues that we are often confused because we look at the method (SFT vs. RL) instead of the mechanism.

  • Myth: "RL is magic because it creates new skills."

  • Reality: RL only creates new skills if it is paired with tools, search, or interaction that helps the model cross "walls." If RL is just rewarding the model for things it could already do, it's just Elicitation.

  • Myth: "SFT is weak because it just copies."

  • Reality: If the "copying" data comes from a super-smart source (like a search engine or a stronger AI), SFT can teach the model things it never knew, effectively acting as Creation.
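To contrast the two Reality points in the same toy setting, consider fitting a categorical model by maximum likelihood. With unrestricted parameters, this is what SFT's cross-entropy loss does: it sets each outcome's probability to that outcome's frequency in the training data. Suppose the demonstrations come from an outside source that produces an outcome the base model never does. That outcome then enters the support, which reward tilting alone could not achieve. The data and names below are hypothetical.

```python
from collections import Counter

def mle_categorical(samples: list[str]) -> dict[str, float]:
    """Maximum-likelihood categorical fit: probability equals empirical frequency."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}

base_samples = ["guess_a"] * 7 + ["guess_b"] * 3  # what the model produces on its own
teacher_samples = ["searched_answer"] * 5          # hypothetical output of a search tool or stronger model

base = mle_categorical(base_samples)
sft = mle_categorical(base_samples + teacher_samples)  # SFT on a mix of both sources

print(base.get("searched_answer", 0.0))  # 0.0: outside the base model's support
print(round(sft["searched_answer"], 3))  # 0.333: now inside the support
```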

The Bottom Line

When we see an AI get better, we shouldn't just ask, "Did they use Reinforcement Learning?"

We should ask: "Did they just make the AI better at things it could already do, or did they actually give the AI the ability to do something it couldn't do before?"

The paper suggests that most of the time, we are just waking up skills that were already there (Elicitation), and we need to be very careful before claiming we have truly invented new capabilities (Creation).
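The question posed above can be phrased as a crude check on two output distributions. This sketch rests on loudly stated assumptions. "Accessible support" is approximated as the set of outcomes whose base-model probability exceeds a hypothetical threshold eps. A gain is attributed to creation only when the trained model puts meaningful mass outside that set. The function name, the thresholds, and the distributions are illustrative, not the paper's procedure.

```python
def classify_gain(base: dict[str, float], trained: dict[str, float],
                  eps: float = 1e-9, min_new_mass: float = 0.01) -> str:
    """Label a change as 'elicitation' or 'creation' (hypothetical heuristic).

    Outcomes with base probability > eps count as already accessible.
    If the trained model places more than min_new_mass on other outcomes,
    the change is labeled creation; otherwise it is reweighting (elicitation).
    """
    accessible = {y for y, p in base.items() if p > eps}
    new_mass = sum(p for y, p in trained.items() if y not in accessible)
    return "creation" if new_mass > min_new_mass else "elicitation"

# Hypothetical distributions before and after two kinds of post-training.
base = {"lullaby": 0.7, "folk_tune": 0.2999, "jazz_solo": 0.0001}
rl_tilted = {"lullaby": 0.54, "folk_tune": 0.23, "jazz_solo": 0.23}
with_tool = {"lullaby": 0.5, "folk_tune": 0.2, "violin_concerto": 0.3}

print(classify_gain(base, rl_tilted))  # elicitation
print(classify_gain(base, with_tool))  # creation
```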
