Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection

The paper introduces DIRECTER, a novel activation steering method that dynamically modulates steering strength through a plausibility-guided decoding loop and layer sensitivity analysis to enhance LLM instruction-following accuracy while preventing the oversteering that typically degrades text quality.

Minjae Kang, Jaehyung Kim

Published 2026-03-10

Imagine you have a brilliant, well-read librarian (the Large Language Model or LLM) who knows everything in the world. You ask them to write a story about a journey to Japan, but with a very specific rule: "Do not use the letter 'e'."

If you just ask nicely, the librarian might forget the rule halfway through because they are so used to writing normally. This is the problem of Instruction Following.

The Old Way: The "Over-Enthusiastic" Coach

Previous methods tried to fix this by hiring a coach who stands next to the librarian and shouts, "DON'T USE 'E'! DON'T USE 'E'!" constantly.

  • The Problem: Sometimes the coach shouts too loud. The librarian gets so focused on not using 'e' that they forget how to write a sentence. They might start speaking in gibberish, or they might stop writing the story about Japan entirely and just scream "NO 'E'!" over and over.
  • The Term: This is called Oversteering. The model is so busy following the rule that it breaks the task.

The New Way: DIRECTER (The "Smart, Adaptive Coach")

The authors of this paper created a new method called DIRECTER. Think of DIRECTER not as a shouting coach, but as a smart, adaptive traffic controller who guides the librarian step-by-step.

Here is how it works, using three simple concepts:

1. The "Plausibility Check" (The Reality Test)

Every time the librarian is about to write the next word, DIRECTER does a quick mental simulation:

  • Step A: It asks, "What would the librarian write if I didn't interfere?" (The Raw Plan).
  • Step B: It asks, "What would they write if I did force them to follow the rule?" (The Steered Plan).

Then, it compares the two.

  • Scenario 1 (Good): The librarian was going to write "cat," and the rule forces them to write "feline." Both make sense. DIRECTER says, "Go ahead, write 'feline'!"
  • Scenario 2 (Bad): The librarian was going to write "The sun is bright," but the rule forces them to write "The sun is brrr." That sounds weird and breaks the story. DIRECTER says, "Wait, that sounds wrong. Let's ignore the rule for this specific word and just write 'bright'."

This prevents the model from going off the rails. It only applies the rule when it makes sense.
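The accept-or-reject decision above can be sketched in a few lines of Python. This is a toy illustration, not the paper's actual criterion: the function name choose_token, the threshold alpha, and the rule "the unsteered model must give the steered word at least alpha times the probability of its own favorite word" are all assumptions made for this example.

```python
def choose_token(raw_probs, steered_probs, alpha=0.1):
    """Pick the next token from two next-token distributions (token -> probability).

    alpha is a hypothetical threshold: the steered choice is accepted only if the
    unsteered model gives it at least alpha times the probability of its own top token.
    """
    steered_token = max(steered_probs, key=steered_probs.get)
    floor = alpha * max(raw_probs.values())
    if raw_probs.get(steered_token, 0.0) >= floor:
        return steered_token, True   # steering accepted
    raw_token = max(raw_probs, key=raw_probs.get)
    return raw_token, False          # steering rejected, fall back to the raw plan


# Scenario 1: the steered word is also reasonable for the unsteered model.
raw = {"cat": 0.5, "feline": 0.3, "dog": 0.2}
steered = {"feline": 0.7, "cat": 0.2, "dog": 0.1}
print(choose_token(raw, steered))    # ('feline', True)

# Scenario 2: the steered word is something the unsteered model would almost never say.
raw = {"bright": 0.8, "warm": 0.19, "brrr": 0.01}
steered = {"brrr": 0.8, "bright": 0.2}
print(choose_token(raw, steered))    # ('bright', False)
```

In a real model the two dictionaries would come from two forward passes over the same prefix, one without steering and one with it.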

2. The "Dimmer Switch" (Dynamic Strength)

Old methods used a "light switch": the rule was either ON (100% force) or OFF. DIRECTER uses a dimmer switch.

If the librarian starts to struggle, DIRECTER doesn't just turn the rule off completely. It slowly turns the "force" down.

  • If the librarian is confident, DIRECTER turns the force up high.
  • If the librarian starts to stumble, DIRECTER turns the force down slightly.
  • If they are about to make a mistake, DIRECTER turns the force all the way down and lets the librarian write naturally.

This happens dynamically for every single word, not just once at the beginning.
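One way to picture the dimmer switch is a loop that tries the strongest setting first and steps down until the result looks plausible. The sketch below is a simplified assumption, not the authors' implementation: pick_strength, the list of strengths, the threshold alpha, and the steered_at callback (which stands in for "run the model with this much steering") are hypothetical names invented for illustration.

```python
def pick_strength(raw_probs, steered_at, strengths=(1.0, 0.5, 0.25), alpha=0.1):
    """Try steering strengths from strongest to weakest for one decoding step.

    raw_probs:  unsteered next-token distribution (token -> probability).
    steered_at: hypothetical callback, strength -> steered next-token distribution.
    Returns (strength, token); strength 0.0 means "no steering for this word".
    """
    floor = alpha * max(raw_probs.values())
    for strength in strengths:
        probs = steered_at(strength)
        token = max(probs, key=probs.get)
        if raw_probs.get(token, 0.0) >= floor:
            return strength, token            # first plausible setting wins
    return 0.0, max(raw_probs, key=raw_probs.get)


# Toy stand-in for the model: a lookup table of distributions per strength.
raw = {"bright": 0.8, "warm": 0.19, "brrr": 0.01}
table = {
    1.0:  {"brrr": 0.8, "warm": 0.15, "bright": 0.05},
    0.5:  {"warm": 0.5, "brrr": 0.3, "bright": 0.2},
    0.25: {"bright": 0.6, "warm": 0.3, "brrr": 0.1},
}
print(pick_strength(raw, table.get))   # (0.5, 'warm')
```

Here full strength produces "brrr", which the unsteered model finds implausible, so the loop dims to half strength and settles on "warm". Calling this once per generated word gives the per-word behavior described above.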

3. The "Layer Ranking" (Finding the Right Levers)

Inside the librarian's brain (the computer model), there are many different "layers" of thinking. Roughly speaking, different layers contribute to different things, such as grammar, facts, or style.

DIRECTER does a quick, one-time test at the start to figure out: "Which specific layers of the brain are most sensitive to this specific rule?"

  • It creates a ranked list of these layers.
  • When it needs to apply the rule, it starts with the most sensitive layers.
  • If that's too much, it drops the least sensitive layers from the list, one step at a time.

This is like a mechanic who knows exactly which screw to turn to fix a car engine, rather than randomly hitting the dashboard.
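The ranking and the "drop the weakest layers" idea can be sketched as follows. The sensitivity scores here are made-up numbers, and how DIRECTER actually measures sensitivity is not covered in this summary, so rank_layers and shrink_schedule are hypothetical helpers meant only to show the shape of the procedure.

```python
def rank_layers(sensitivity):
    """Return layer indices sorted from most to least sensitive.

    sensitivity: layer index -> score from a one-time test (scores are assumed here).
    """
    return sorted(sensitivity, key=sensitivity.get, reverse=True)


def shrink_schedule(ranked):
    """Yield progressively smaller sets of layers to steer.

    Each step drops the least sensitive remaining layer; the final empty list
    means "steer nothing", which is the unsteered fallback.
    """
    for k in range(len(ranked), -1, -1):
        yield ranked[:k]


scores = {0: 0.02, 1: 0.10, 2: 0.45, 3: 0.30}   # made-up example scores
ranked = rank_layers(scores)
print(ranked)                                    # [2, 3, 1, 0]
for layers in shrink_schedule(ranked):
    print(layers)
```

Steering fewer layers is one natural way to realize a weaker push, so a decoding loop could walk down this schedule until the plausibility check passes.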

Why Is This a Big Deal?

  • No Extra Training: You don't need to re-teach the librarian. You just give them this new "traffic controller" tool.
  • Better Quality: Because it checks for "plausibility," the stories don't sound broken or robotic. They sound natural but still follow the rules.
  • Works Across Tasks: The same approach applies to math problems, creative writing, and strict formatting rules (like "no commas").

The Bottom Line

DIRECTER is like having a smart co-pilot for an AI. Instead of forcing the AI to follow rules blindly (which causes crashes), the co-pilot gently nudges the AI toward the rule, checks if the nudge makes sense, and backs off if it doesn't. The result is an AI that follows your instructions more reliably without losing its mind.