Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

This paper introduces Supervised Calibration (SC), a loss-minimization framework that enhances In-Context Learning in Large Language Models. SC learns optimal per-class affine transformations that correct systematic biases and, when needed, reorient decision boundaries, achieving state-of-the-art performance across multiple models and datasets.

Korel Gundem, Juncheng Dong, Dennis Zhang, Vahid Tarokh, Zhengling Qi

Published 2026-03-05

Imagine you have a brilliant, well-read librarian (the Large Language Model or LLM) who has never been to your specific town before. You want them to help you sort a pile of letters into "Happy," "Sad," or "Angry" categories.

To teach them, you don't give them a textbook. Instead, you show them just a few examples right on the spot: "Here is a letter about a puppy, it's Happy. Here is one about a broken toy, it's Sad." This is called In-Context Learning (ICL). The librarian is smart enough to guess the pattern and sort the rest of the letters.
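Mechanically, ICL just means packing the labeled examples into the prompt itself. A minimal sketch, assuming a simple text-completion interface (the template and labels here are illustrative, not from the paper):

```python
# Build a few-shot classification prompt for in-context learning (ICL).
# The demonstrations are the only "training data" the model ever sees.

def build_icl_prompt(demonstrations, query):
    """Format (text, label) pairs plus a query into a single prompt string."""
    lines = []
    for text, label in demonstrations:
        lines.append(f"Letter: {text}\nCategory: {label}")
    lines.append(f"Letter: {query}\nCategory:")  # model completes the label
    return "\n\n".join(lines)

demos = [
    ("We just adopted a puppy!", "Happy"),
    ("My favorite toy broke today.", "Sad"),
]
prompt = build_icl_prompt(demos, "The neighbor's dog dug up my garden.")
```

The LLM is then asked to continue the prompt, and whichever category it completes after the final `Category:` is taken as its prediction.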

However, there's a problem. Sometimes, the librarian gets confused by how you showed them the examples. Maybe they got too excited about the "Happy" examples and started calling everything "Happy," even the sad ones. Or maybe they got confused by the order you showed the letters. Their predictions become biased and unstable.

The Old Way: Just Moving the Goalpost

Previously, researchers tried to fix this by using a technique called Label Marginal Calibration.

The Analogy: Imagine the librarian is standing at a finish line (the decision boundary) trying to catch the letters. If they are catching too many "Happy" letters, the old method simply tells them: "Hey, move the finish line a little to the left so you catch fewer Happy letters."

The Flaw: This only works if the librarian is mostly right but just a little too eager. But what if the librarian is completely wrong? What if they think "Sad" is actually "Happy"? Simply moving the line won't help. They need to turn around and face the other way. The old methods couldn't do that; they could only shift the line, not flip the librarian's entire perspective.

The New Way: Supervised Calibration (SC)

This paper introduces a new method called Supervised Calibration (SC). Instead of just telling the librarian to move the line, SC acts like a smart coach who re-trains the librarian's brain using the examples you already gave them.

Here is how it works, broken down into simple steps:

1. The "Surrogate" Practice Game

The coach realizes they can't ask the librarian for outside help (no new data allowed). So, they create a practice game using the examples you already provided.

  • They take the examples you gave, hide one, and ask the librarian to guess it using the other examples.
  • Since the librarian knows the answer (because it was in your original list), they can check if their guess was right.
  • This creates a mini-dataset of "Guess vs. Reality" right on the spot.
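
The hold-one-out loop can be sketched as follows. Here `llm_class_probs` is a hypothetical stand-in for querying the LLM with a context and reading off class probabilities; the real interface depends on your model API:

```python
# Build a "surrogate" supervised dataset by leave-one-out over the demos:
# each demonstration is hidden in turn, predicted from the rest, and
# paired with its true label ("Guess" vs. "Reality").

def llm_class_probs(context, query, classes):
    # Hypothetical stand-in for the LLM. It returns a uniform distribution
    # so the sketch runs without a real model; a real implementation would
    # prompt the model with `context` and score each class label.
    return [1.0 / len(classes)] * len(classes)

def build_surrogate_dataset(demos, classes):
    surrogate = []
    for i, (text, label) in enumerate(demos):
        context = demos[:i] + demos[i + 1:]  # all demos except the held-out one
        probs = llm_class_probs(context, text, classes)
        surrogate.append((probs, label))
    return surrogate

demos = [("puppy letter", "Happy"), ("broken toy", "Sad"), ("lost keys", "Angry")]
pairs = build_surrogate_dataset(demos, ["Happy", "Sad", "Angry"])
```

Note that no new data is needed: the surrogate dataset is manufactured entirely from the demonstrations already in the prompt.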

2. The "Flip and Tilt" Adjustment

Now, the coach looks at the librarian's mistakes.

  • The Shift: If the librarian is too eager, the coach adjusts the baseline (the "bias").
  • The Flip (The Magic Part): If the librarian is completely backwards (thinking Sad = Happy), the coach can flip the decision boundary. They can say, "Actually, for this specific task, when the signal is high, it means 'Sad', not 'Happy'."
  • The Scale: They can also stretch or shrink the librarian's confidence. If the librarian is overconfident, the coach tells them to be more humble.

This is like having a coach who can not only move the goalpost but also tell the player to run in the opposite direction if they are running the wrong way!
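
In more concrete terms, SC learns a per-class scale `w` and bias `b` applied to the model's scores, fit on the surrogate pairs by minimizing cross-entropy. The sketch below (NumPy gradient descent; the paper's exact parameterization may differ) shows the flip in action: on a toy "backwards" model, the learned scales go negative, which reverses the decision boundary:

```python
import numpy as np

# Per-class affine calibration: z_c = w_c * s_c + b_c, fit by gradient
# descent on cross-entropy over surrogate (score, label) pairs.
# A negative w_c "flips" the meaning of that class's score.

base = np.log(np.array([[0.2, 0.8],   # true class 0, model votes class 1
                        [0.8, 0.2],   # true class 1, model votes class 0
                        [0.3, 0.7],
                        [0.7, 0.3]]))
X = np.tile(base, (5, 1))             # log-probs from the confused model
y = np.tile(np.array([0, 1, 0, 1]), 5)
Y = np.eye(2)[y]                      # one-hot targets

w, b = np.ones(2), np.zeros(2)        # start at the identity transform
lr = 0.5
for _ in range(2000):
    Z = X * w + b                     # per-class affine transform
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)  # softmax
    dZ = (P - Y) / len(X)              # gradient of cross-entropy w.r.t. Z
    w -= lr * (dZ * X).sum(axis=0)
    b -= lr * dZ.sum(axis=0)

preds = (X * w + b).argmax(axis=1)
accuracy = (preds == y).mean()        # flipping the boundary fixes the model
```

Before calibration the model gets every example wrong; after fitting, `w` turns negative and the same scores yield the right answers. A shift-only method (adjusting `b` alone) cannot achieve this on the toy data above.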

3. The Safety Nets (Regularization)

Since the librarian is only seeing a few examples, they might get too confident in their new, weird rules. To prevent this, the coach adds two safety rules:

  • Context Invariance: The coach checks: "Does the librarian give the same answer if we shuffle the order of the examples?" If the answer changes wildly, the coach says, "Calm down, be consistent."
  • Trust Region: The coach says, "Don't change your mind too drastically unless you are sure." This prevents the librarian from overreacting to a single weird example.
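
Under assumed forms of the penalties (the paper's exact formulation may differ), the two safety nets can be sketched as extra terms added to the calibration loss: a trust-region term keeping `(w, b)` near the identity transform, and a context-invariance term penalizing disagreement between calibrated predictions obtained under different demonstration orderings:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def calibration_loss(w, b, X, y, perm_X, lam_inv=0.1, lam_trust=0.1):
    """Cross-entropy on surrogate pairs plus two illustrative penalties.

    X:      (n, C) model scores; y: (n,) true labels.
    perm_X: (n_perm, n, C) scores obtained under shuffled demo orderings.
    """
    P = softmax(X * w + b)
    ce = -np.log(P[np.arange(len(y)), y]).mean()

    # Trust region: discourage drifting far from the identity transform
    # (w = 1, b = 0), i.e. from the LLM's original predictions.
    trust = np.sum((w - 1.0) ** 2) + np.sum(b ** 2)

    # Context invariance: penalize variance of the calibrated probabilities
    # across different orderings of the demonstrations.
    P_perm = softmax(perm_X * w + b)
    invariance = P_perm.var(axis=0).mean()

    return ce + lam_trust * trust + lam_inv * invariance

X = np.log(np.array([[0.6, 0.4], [0.3, 0.7]]))
y = np.array([0, 1])
perm_X = np.stack([X, X])  # identical orderings -> zero invariance penalty
loss = calibration_loss(np.ones(2), np.zeros(2), X, y, perm_X)
```

At the identity transform with perfectly order-stable predictions, both penalties vanish and the loss reduces to plain cross-entropy, so the regularizers only bite when the calibrator starts behaving erratically.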

Why This Matters

The paper tested this "Coach" (SC) on three different types of librarians (LLMs) and nine different sorting tasks.

  • The Result: The SC method consistently outperformed the old methods.
  • The "Wow" Moment: On a difficult task called SST-5 (sorting movie reviews into 5 categories), the older methods left the librarian at only about 25% accuracy. SC boosted this to 44%.
  • How? In that specific case, the librarian was so confused that it was essentially guessing backwards. The SC method realized this, flipped the decision boundary, and suddenly the librarian started getting it right.

Summary

Think of In-Context Learning as asking a smart friend to help you with a new game by showing them a few examples.

  • Old Fix: "Hey, you're guessing too much 'A', guess 'B' a bit more." (Only shifts the bias).
  • New Fix (SC): "Wait, you're actually playing the game backwards! Let's flip your strategy, adjust your confidence, and make sure you aren't getting confused by the order of the cards."

This new framework allows AI models to be much more robust, stable, and accurate, even when they are given very few examples to learn from. It turns a "guessing game" into a "principled strategy."