CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training

This paper introduces CapTrack, a capability-centric framework that reframes LLM forgetting as systematic behavioral drift rather than mere knowledge loss. A large-scale study using the framework shows that post-training significantly degrades robustness and default behaviors, with instruction fine-tuning causing the most pronounced effects.

Lukas Thede, Stefan Winzeck, Zeynep Akata, Jonathan Richard Schwarz

Published 2026-03-10

Imagine you have a brilliant, well-read librarian named LLM (Large Language Model). Before you hire them for a specific job, they have already read almost everything in the world. They know facts, can write code, speak many languages, and tell jokes. They are your "Out-of-the-Box" (OOB) librarian.

Now, you want to hire this librarian to work specifically in a Law Firm or a Hospital. To do this, you give them a crash course (called "Post-Training") to teach them the specific jargon and rules of that field.

The Problem:
Usually, when we check if the training worked, we ask: "Do they know the new legal cases?" or "Can they diagnose this disease?" If they say "Yes," we think the training was a success.

But this paper argues that this is like checking if a chef learned to make sushi by only asking, "Can you cut the fish?" You might forget to ask: "Did they forget how to bake a cake?" or "Did they suddenly become rude to customers?" or "Do they now refuse to answer simple questions?"

The authors call this "Forgetting." And they argue that in the past, we only measured forgetting as "losing facts." But in reality, the librarian might lose their personality, their politeness, or their ability to follow complex instructions, even if they still know the facts.

Enter: CapTrack (The "Capability Tracker")

To fix this, the authors built a new tool called CapTrack. Think of CapTrack as a 360-degree performance review for the librarian, divided into three categories:

  1. CAN (What they can do): This is their raw brainpower. Can they still solve math problems? Do they know history? Can they write code?
    • Analogy: Can the librarian still read a complex book?
  2. WILL (What they want to do): This is their default attitude. Are they helpful? Do they talk too much or too little? Do they refuse to answer safe questions?
    • Analogy: Is the librarian now grumpy? Do they give short, rude answers? Do they refuse to help with simple tasks?
  3. HOW (How they do it): This is their ability to follow rules. If you ask for a list in bullet points, do they do it? If you ask for a citation, do they format it correctly?
    • Analogy: If you ask the librarian to "write the report in blue ink," do they actually use blue ink, or do they ignore you?

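The three review categories above lend themselves to simple automatic checks. As a toy illustration only (not the authors' actual harness; the function names and refusal markers are invented for this sketch), here is what a "HOW" check for format compliance and a "WILL" check for over-refusal might look like:

```python
import re

def check_bullets(answer: str, min_items: int = 2) -> bool:
    """HOW check: did the model actually answer in bullet points?"""
    bullets = [line for line in answer.splitlines()
               if re.match(r"\s*[-*•]\s+\S", line)]
    return len(bullets) >= min_items

def check_refusal(answer: str) -> bool:
    """WILL check: does the answer look like a refusal of a benign request?"""
    refusal_markers = ("i cannot", "i can't", "i'm sorry", "as an ai")
    lowered = answer.lower()
    return any(marker in lowered for marker in refusal_markers)

# Compare the out-of-the-box model and the fine-tuned model on one prompt:
base_answer = "- Paris is the capital of France\n- It sits on the Seine"
tuned_answer = "I'm sorry, I can only answer legal questions."

print(check_bullets(base_answer))   # True: followed the format instruction
print(check_refusal(tuned_answer))  # True: over-refusal after fine-tuning
```

The point of running both checks on both models is the "before vs. after" comparison: a drop on these checks is behavioral drift, even if the model's factual ("CAN") scores are unchanged.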
What They Discovered

The team took several famous librarians (models like LLaMA, Gemma, and Qwen) and gave them crash courses in Law and Medicine. Then, they used CapTrack to see what changed. Here are the big findings:

1. The "Specialist" Curse
When they trained the librarians to be lawyers, the librarians got really good at law. But they started forgetting other things.

  • The Surprise: They didn't just forget facts. They forgot how to be robust. If you asked a question in a slightly different way, they got confused. They also forgot how to speak other languages well.
  • The Metaphor: It's like a chef who learns to make perfect sushi but forgets how to boil water or how to be polite to a customer. They are a "specialist" but a "bad employee."

2. The Training Method Matters
They tried two different ways to train the librarians:

  • Method A (Instruction Fine-Tuning / IFT): This is like a strict teacher shouting, "Do exactly what I say! Repeat after me!"
    • Result: The librarian learned the job fast, but they became rude, short, and stubborn. They forgot how to be helpful and started refusing to answer questions they used to answer easily.
  • Method B (Preference Optimization / DPO): This is like a coach saying, "Here are two answers; pick the better one."
    • Result: This was much gentler. The librarian learned the job but kept their personality and politeness. In fact, if you used Method A first and then Method B, the librarian actually recovered some of their lost kindness and helpfulness.
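The gentleness of Method B comes from how the DPO objective is built: the model is rewarded for preferring the chosen answer *relative to a frozen reference copy of itself*, which anchors it to its old behavior. A minimal sketch of that loss for a single preference pair, in plain Python with made-up log-probability numbers:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) answer pair.

    The margin measures how much MORE the trained policy prefers the
    chosen answer than the frozen reference model does; beta scales
    how hard the model is pushed away from the reference.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

# Policy already prefers the chosen answer more than the reference does:
loss_good = dpo_loss(-5.0, -9.0, -6.0, -8.0)
# Policy has drifted toward the rejected answer:
loss_bad = dpo_loss(-9.0, -5.0, -6.0, -8.0)
print(loss_good < loss_bad)  # True: loss is lower when preferences align
```

Because every update is measured against the reference model, DPO has a built-in brake on drift, whereas plain instruction fine-tuning has no such anchor, which is consistent with the paper's finding that IFT causes the sharpest behavioral changes.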

3. Bigger Isn't Always Better
You might think a bigger librarian (a model with more "brain power") would forget less.

  • Result: Nope. A giant 80-billion-parameter model forgot just as much as a smaller one. The size of the model didn't protect them from losing their "soft skills."

4. The "No Free Lunch" Rule
The authors tried to fix the forgetting by:

  • Mixing in more general data (like reading a newspaper while studying law).
  • Using model-merging tricks (averaging the new specialist model's weights with the original model's).
  • Result: It's a trade-off. You can keep the librarian's old personality (Stability), but then they won't learn the new job as well (Plasticity). Or, you can have them learn the new job perfectly, but they lose their old personality. You can't have both perfectly.
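The "merging" trick mentioned above is, in its simplest form, just a weighted average of the old and new model parameters, and the mixing weight alpha is exactly the stability-plasticity dial the authors describe. A toy sketch (short lists stand in for the real weight tensors, and the alpha convention is this example's assumption):

```python
def merge_weights(old, new, alpha=0.5):
    """Linear weight interpolation between two models.

    alpha=1.0 keeps the base model (all stability, no new skill);
    alpha=0.0 keeps the specialist (all plasticity, old behavior lost).
    """
    return [alpha * w_old + (1 - alpha) * w_new
            for w_old, w_new in zip(old, new)]

base_model = [0.5, -1.0, 0.75]   # pre-trained "generalist" weights
specialist = [1.5, 0.0, 0.25]    # weights after legal fine-tuning

print(merge_weights(base_model, specialist, alpha=0.5))
# → [1.0, -0.5, 0.5]: halfway between old personality and new skill
```

Sliding alpha traces out the trade-off curve the paper reports: no single setting recovers both full stability and full plasticity.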

The Big Takeaway

The paper tells us to stop just asking, "Is the model smart?" and start asking, "Is the model still a good, reliable, polite, and rule-following assistant?"

If you train an AI to be a doctor, you don't just want it to know medicine; you want it to still be able to listen, follow instructions, and not suddenly decide to refuse to talk to you. CapTrack is the new checklist to make sure we aren't trading our AI's soul for a little bit of extra knowledge.