Negative Pre-activations Differentiate Syntax

This paper demonstrates that negative pre-activations in a sparse subpopulation of Wasserstein neurons serve as an active and essential substrate for syntactic processing in modern large language models, distinct from their role in other capabilities.

Linghao Kong, Angelina Ning, Micah Adler, Nir Shavit

Published 2026-03-03

Imagine a Large Language Model (LLM) as a massive, bustling factory that writes stories, answers questions, and solves problems. Inside this factory, there are millions of tiny workers called neurons.

For a long time, scientists studying these factories had a simple rule: "If a worker is shouting (positive activation), they are doing something important. If they are whispering or silent (negative activation), they are probably just taking a break." This rule came from older models that used a "switch" (called ReLU) which literally turned off any worker who wasn't shouting.

But modern factories use a different kind of worker whose whispers still carry meaning. Modern models use smooth, flowing functions (like GELU or SiLU) that don't cut negative values off entirely; instead they let them through in attenuated form, so negative numbers can still carry information.
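The contrast is easy to see numerically. ReLU maps every negative pre-activation to exactly zero, so "-0.5" and "-2.0" become indistinguishable, while GELU and SiLU map distinct negative inputs to distinct (small but nonzero) outputs. A minimal sketch using only Python's standard library (not the paper's code):

```python
import math

def relu(x):
    # ReLU discards everything below zero: -0.5 and -2.0 both become 0.0
    return max(0.0, x)

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

for x in (-2.0, -0.5):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  silu={silu(x):.4f}")
```

Under ReLU both inputs collapse to zero; under GELU or SiLU a downstream layer can still tell a "deep whisper" from a "shallow" one.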

The Big Discovery:
This paper, "Negative Pre-Activations Differentiate Syntax," argues that we were wrong to ignore the whispers. The authors found that a very small, special group of workers (called Wasserstein neurons) uses these "whispers" (negative pre-activations) to do the most critical job in the factory: keeping the grammar correct.

Here is the breakdown using simple analogies:

1. The "Whispering" Specialists

Imagine the factory has a few elite specialists. Most workers shout loudly to move heavy boxes (positive activation). But these elite specialists have a secret superpower: they use whispers to organize the blueprint of the sentence.

The authors found that in modern models, these specialists don't just sit idle when they have negative numbers. Instead, they use the depth of the whisper to tell the difference between two very similar words.

  • Analogy: Think of two similar-looking keys. A normal worker might just say "Key" for both. But these specialists whisper, "This key is a soft whisper (deep negative)" and "That key is a hard whisper (shallow negative)." Even though both are whispers, the difference in the whisper tells the machine exactly which grammatical rule to apply.

2. The "Grammar Glue"

The paper tested what happens if you stop these specialists from whispering. They didn't turn the workers off completely; they just clamped their mouths shut whenever they tried to whisper (zeroing out the negative pre-activations).

  • The Result: The factory didn't just stumble; it collapsed.
    • Grammar: The model suddenly forgot how to build sentences. It failed at subject-verb agreement (producing "The dog are running" instead of "The dog is running") and lost track of "who" vs. "whom."
    • Other Skills: Interestingly, the model could still answer trivia questions, tell jokes, or solve logic puzzles almost as well as before.
    • The "Double Dissociation": This is a fancy way of saying: If you stop the whispers, you break the grammar but save the trivia. If you stop the shouting of regular workers, you break the trivia but save the grammar. This proves that the "whispers" are the specific glue holding the sentence structure together.
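The intervention above can be sketched in a few lines. This is a simplified toy illustration, not the paper's code: `wasserstein_idx` stands in for the (hypothetical here) indices of the specialist neurons, and the clamp zeroes their negative pre-activations before the GELU nonlinearity, exactly the "mouths clamped shut whenever they try to whisper" idea.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_forward(x, W_in, W_out, clamp_idx=None):
    """One toy MLP block. If clamp_idx is given, the negative
    pre-activations of those neurons are zeroed before the
    nonlinearity (the paper's intervention, sketched on a toy layer)."""
    pre = x @ W_in                            # pre-activations, shape (d_hidden,)
    if clamp_idx is not None:
        # silence the whispers: negatives become 0, positives pass through
        pre[clamp_idx] = np.maximum(pre[clamp_idx], 0.0)
    return gelu(pre) @ W_out

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
W_in = rng.normal(size=(d_model, d_hidden))
W_out = rng.normal(size=(d_hidden, d_model))
x = rng.normal(size=d_model)

wasserstein_idx = [3, 17, 29]                 # hypothetical specialist neurons
baseline = mlp_forward(x, W_in, W_out)
clamped = mlp_forward(x, W_in, W_out, clamp_idx=wasserstein_idx)
print("output shift from clamping:", np.linalg.norm(baseline - clamped))
```

In the real experiments this clamp is applied only to the identified Wasserstein neurons inside a trained transformer, which is what lets the authors break grammar while leaving trivia intact.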

3. The "Early Warning System"

The authors also looked at when these specialists learn to whisper.

  • Analogy: Imagine the factory is being built from scratch. The "grammar whisperers" show up and start working very early in the construction process. Once they are set up, they stabilize and become the foundation. If you try to remove them later, the whole structure wobbles.
  • The paper shows that as the model gets smarter, it relies more on these negative whispers for grammar, not less.

4. Why This Matters

Before this paper, many researchers thought negative numbers in AI were just "noise" or a side effect of how the math worked. They were like the background hum of a factory that you ignore.

This paper says: "No! That hum is the blueprint!"

It turns out that in modern AI, the "negative" part of the brain is actively doing the heavy lifting for sentence structure. It's not just a leftover from old technology; it's a deliberate, sophisticated tool used to separate similar words and keep the grammar tight.

The Takeaway

If you think of a language model as a symphony orchestra:

  • Positive activations are the loud instruments (trumpets, drums) playing the main melody.
  • Negative activations were thought to be the quiet instruments just sitting there.
  • This paper reveals that the quiet instruments (the whispers) are actually playing the complex sheet music that keeps the whole orchestra in time. If you mute the whispers, the music falls apart into noise, even if the loud instruments are still playing.

In short: The "negative" side of the brain is essential for grammar, and we need to start listening to the whispers, not just the shouts.
