GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models

Imagine you have a giant, incredibly complex robot brain (a Large Language Model) that writes stories, answers questions, and solves problems. For a long time, scientists trying to understand how this brain works have been looking at its tiny internal switches, called neurons.

Think of a neuron like a light switch in a house. In the old days, these switches were simple: they were either ON (positive) or OFF (zero). If a switch was ON, it meant the neuron was excited about a specific word or idea. Tools like "Neuroscope" were like flashlights that helped researchers find the brightest ON switches to see what the robot was thinking.

The Problem: The Switches Got Complicated

Recently, engineers upgraded the robot's brain. They replaced the simple ON/OFF switches with smart, two-part switches (called GLU neurons).

Imagine a smart switch that doesn't just have a single "On" button. Instead, it has:

A Gate (like a security guard checking a list).
An Input (the actual message being delivered).

For the message to get through, both parts need to work together. But here's the twist: The Gate and the Input can each be Positive (helpful) or Negative (blocking).

This creates four different scenarios for how a single neuron can behave:

1. Gate Open (+), Message Positive (+): The neuron is fully excited. (The old tools could see this).
2. Gate Open (+), Message Negative (-): The neuron is excited but delivering a "stop" signal.
3. Gate Closed (-), Message Positive (+): The neuron is trying to speak, but the gate is shut.
4. Gate Closed (-), Message Negative (-): Two negatives combine into a faint positive signal that is easy to overlook.

The old tools were like blindfolds. They only looked for the "fully excited" (Scenario 1) switches. They missed the other three scenarios, which turned out to be doing very different, important jobs.

The Solution: GLUScope

The authors of this paper built a new tool called GLUScope. Think of GLUScope as a high-tech microscope with four different colored lenses.

Instead of just asking, "When is this neuron ON?", GLUScope asks:

"When is the Gate Open and the Message Positive?"
"When is the Gate Open but the Message Negative?"
And so on for all four combinations.

For every single neuron, GLUScope shows researchers:

A Dashboard: A chart showing how often each of the four scenarios happens.
Real Examples: Actual sentences from the training data that triggered each specific scenario.

A Real-Life Detective Story

The paper gives a great example of how this new tool solved a mystery that the old tools couldn't.

Researchers found a specific neuron that seemed to be related to the word "again."

The Old Way: The old tools would only show the neuron's strongest positive firings, which landed on unrelated words like "door" or "volcanoes." Researchers would have seen no clear connection to "again" at all.
The GLUScope Way: When they looked through the four lenses, they discovered something surprising.
- Most of the time, the neuron was actually firing in the "Gate Open, Message Negative" mode. It was saying, "Stop! Don't use 'again' right now!"
- But in a rare, specific scenario (Gate Closed, Message Negative), the neuron was quietly whispering, "Actually, 'again' is the perfect word here."

Without GLUScope, researchers would have missed the "whisper" because it wasn't the loudest signal. They would have completely misunderstood what the neuron was doing.

Why This Matters

Just like a mechanic needs to understand that a car engine has different modes (idle, acceleration, braking) rather than just "on" or "off," AI researchers need to understand these four modes of neurons to truly understand how AI thinks.

GLUScope is the first tool to give researchers the map they need to navigate this complex, four-way traffic of modern AI brains, helping them figure out exactly what these digital neurons are trying to say.

1. Problem Statement

While significant progress has been made in the mechanistic interpretability of Large Language Models (LLMs), existing tools for analyzing individual neurons have not kept pace with modern model architectures.

The Gap: Most existing tools (e.g., Neuroscope, Transformer Debugger) assume "vanilla" activation functions like ReLU, GELU, or Swish. With these functions each neuron has a single pre-activation, and large positive activations are the primary signal of interest.
The Challenge: Modern LLMs (e.g., Llama, OLMo, Gemma) predominantly use Gated Linear Unit (GLU) variants, such as SwiGLU and GEGLU. In these architectures, a neuron's output is determined by the element-wise multiplication of two distinct pathways: a "gate" ( $x_{gate}$ ) and an "input" ( $x_{in}$ ).
The Complexity: Unlike vanilla neurons, GLU neurons can produce four distinct sign combinations based on the signs of $x_{gate}$ and $x_{in}$:
1. Gate Positive / Input Positive ( $+,+$ )
2. Gate Positive / Input Negative ( $+,-$ )
3. Gate Negative / Input Positive ( $-,+$ )
4. Gate Negative / Input Negative ($-,-$)
Each combination can yield vastly different functional behaviors and activation patterns. Traditional tools that only track the strongest overall activations (usually positive) fail to capture the nuanced, and often highly interpretable, behaviors found in the other three combinations.
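
The gating formula and the four sign combinations can be illustrated with a minimal Python sketch. It assumes the SwiGLU form $Swish(x_{gate}) \cdot x_{in}$ described in this summary; the function names are hypothetical, and grouping zero with "+" is an assumption made here for simplicity, not something the paper specifies.

```python
import math


def swish(x: float) -> float:
    """Swish / SiLU, x * sigmoid(x), written to avoid overflow for very negative x."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def glu_activation(x_gate: float, x_in: float) -> float:
    """Activation of a single SwiGLU neuron: Swish(x_gate) * x_in."""
    return swish(x_gate) * x_in


def sign_combination(x_gate: float, x_in: float) -> str:
    """Label the (gate, input) sign pair. Zero counts as '+' (assumption)."""
    gate = "+" if x_gate >= 0 else "-"
    inp = "+" if x_in >= 0 else "-"
    return f"({gate},{inp})"


# One toy pre-activation pair per sign combination (made-up values).
for x_gate, x_in in [(2.0, 1.5), (2.0, -1.5), (-2.0, 1.5), (-2.0, -1.5)]:
    label = sign_combination(x_gate, x_in)
    print(label, round(glu_activation(x_gate, x_in), 4))
```

Because Swish keeps the sign of its argument, the ($+,+$) and ($-,-$) combinations both give a positive neuron output, while the two mixed combinations give a negative one.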

2. Methodology

The authors developed GLUScope, an open-source ecosystem designed specifically to handle the complexity of gated activation functions. The methodology involves three core artifacts:

A. Data Collection & Preprocessing

Model: The authors utilized OLMo-7B-0424, chosen because its training dataset is public, allowing for reproducible analysis.
Dataset: A subset of Dolma (approx. 20M tokens) was used. Each tokenized example is limited to at most 1024 tokens and begins with an EOS token, which keeps processing lightweight.
Activation Recording: Instead of just recording the final output, the system records intermediate values for every neuron across the four sign combinations:
- $x_{gate}$ (pre-activation gate)
- $x_{in}$ (pre-activation input)
- $\mathrm{Swish}(x_{gate})$
- Final Output ( $\mathrm{Swish}(x_{gate}) \cdot x_{in}$ )
Granularity: For each sign combination, the system tracks the frequency of occurrence and the top- $k$ (set to 16) strongest activations, along with their specific dataset indices.
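
The per-combination bookkeeping described above (a frequency count plus the top-$k$ strongest activations with their dataset indices) could look roughly like the following for a single neuron. This is a sketch, not the authors' implementation: the names `record`, `frequencies`, and `strongest` are hypothetical, and ranking "strongest" by the absolute value of the final output is an assumption.

```python
import heapq
import math
from collections import defaultdict

TOP_K = 16  # value of k reported in the text


def swish(x: float) -> float:
    """Swish / SiLU, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def new_stats():
    """One entry per sign combination: an occurrence count and a min-heap of top items."""
    return defaultdict(lambda: {"count": 0, "top": []})


def record(stats, token_index: int, x_gate: float, x_in: float, k: int = TOP_K) -> None:
    """Update the count and the top-k heap for the sign combination of this token."""
    combo = ("+" if x_gate >= 0 else "-", "+" if x_in >= 0 else "-")
    out = swish(x_gate) * x_in
    entry = stats[combo]
    entry["count"] += 1
    item = (abs(out), token_index, out)  # ranked by magnitude (assumption)
    if len(entry["top"]) < k:
        heapq.heappush(entry["top"], item)
    else:
        heapq.heappushpop(entry["top"], item)  # drops the weakest of the k+1 items


def frequencies(stats) -> dict:
    """Share of recorded tokens that fall into each sign combination."""
    total = sum(e["count"] for e in stats.values())
    return {combo: e["count"] / total for combo, e in stats.items()}


def strongest(stats, combo) -> list:
    """(token_index, output) pairs for one combination, strongest first."""
    return [(idx, out) for _, idx, out in sorted(stats[combo]["top"], reverse=True)]


# Toy data: (token_index, x_gate, x_in); the values are made up.
stats = new_stats()
for idx, g, i in [(0, 1.0, 2.0), (1, -1.0, -3.0), (2, -2.0, -0.5), (3, -0.5, -4.0), (4, 3.0, -1.0)]:
    record(stats, idx, g, i, k=2)
print(frequencies(stats))
print(strongest(stats, ("-", "-")))
```

A bounded min-heap keeps memory per neuron and combination at $k$ items regardless of how many tokens are processed, which matters when the same bookkeeping runs for every neuron in the model.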

B. The GLUScope Website (Visualization)

Interface: A web-based tool that visualizes data for selected neurons.
Structure: Each neuron page is divided into:
1. Summary Statistics: A table displaying the frequency and statistical properties (min, max, average) of the four sign combinations.
2. Text Examples: For each of the four combinations, the tool displays the top 16 text examples where that specific combination occurred. Crucially, it highlights the tokens where the activation happened and allows users to inspect intermediate activation values.
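
A summary table of this kind can be derived from recorded activations with a few lines of Python. The sketch below assumes the statistics are taken over the neuron's final output within each sign combination; the summary does not say which intermediate values the website actually summarizes, and the function name `summarize` is hypothetical.

```python
from statistics import fmean


def summarize(records):
    """Group (x_gate, x_in, output) triples by sign combination and summarize outputs."""
    groups = {}
    for x_gate, x_in, out in records:
        combo = ("+" if x_gate >= 0 else "-") + "," + ("+" if x_in >= 0 else "-")
        groups.setdefault(combo, []).append(out)
    total = sum(len(values) for values in groups.values())
    return {
        combo: {
            "freq": len(values) / total,
            "min": min(values),
            "max": max(values),
            "mean": fmean(values),
        }
        for combo, values in groups.items()
    }


# Toy records for one neuron; the values are made up.
records = [(1.0, 1.0, 0.7), (1.0, 1.0, 0.3), (-1.0, -1.0, 0.2), (-1.0, 1.0, -0.1)]
for combo, row in sorted(summarize(records).items()):
    print(f"({combo})  freq={row['freq']:.2%}  min={row['min']:+.3f}  "
          f"max={row['max']:+.3f}  mean={row['mean']:+.3f}")
```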

C. Reproducibility

The authors released the code, the processed activation dataset, and the Dolma subset on GitHub and Hugging Face, enabling researchers to reproduce the analysis or generate similar datasets for other models.

3. Key Contributions

First GLU-Specific Tool: GLUScope is the first tool explicitly designed to visualize and analyze neurons in models using gated activation functions, addressing a critical gap in the interpretability landscape.
Four-Way Sign Decomposition: It introduces a methodology that treats the four sign combinations ( $+,+$ ; $+,-$ ; $-,+$ ; $-,-$) as distinct functional states rather than noise, recognizing that "negative" activations in gated models can be highly meaningful.
Open-Source Ecosystem: The release of a pre-computed activation dataset for OLMo-7B and the associated visualization code lowers the barrier to entry for neuron analysis research.

4. Results & Usage Examples

The paper demonstrates the tool's utility through two specific case studies:

Case Study 1: Model-Wide Correlation Analysis
- Using the activation dataset, the authors computed the correlation between the cosine similarity of a neuron's input and output weights ( $\cos(w_{in}, w_{out})$ ) and the frequency of positive gate activations ( $x_{gate} > 0$ ).
- Finding: A strong negative correlation was discovered: the more closely a neuron's input and output weights align, the less often its gate is positive. This type of large-scale statistical analysis is enabled by the structured dataset.
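
The analysis in Case Study 1 can be sketched as follows. The code assumes a Pearson correlation, which the summary does not confirm, and uses made-up weight vectors and gate frequencies purely for illustration; `cosine` and `pearson` are hypothetical helper names.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy neurons: (w_in, w_out, share of tokens with x_gate > 0). All values are made up.
neurons = [
    ([1.0, 0.0], [0.9, 0.1], 0.20),
    ([1.0, 0.0], [0.1, 0.9], 0.45),
    ([1.0, 0.0], [-0.8, 0.2], 0.70),
]
similarities = [cosine(w_in, w_out) for w_in, w_out, _ in neurons]
gate_positive_freq = [freq for _, _, freq in neurons]
print(pearson(similarities, gate_positive_freq))
```

In the real setting, each gate frequency would come from the pre-computed activation dataset and each weight vector from the model checkpoint, with one data point per neuron.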
Case Study 2: Deep Dive into Neuron 31.9634
- Context: Weight analysis suggested this neuron relates to the token "again" (via $w_{out}$ ) and to "minus again", i.e. the negated "again" direction (via $w_{gate}, w_{in}$ ).
- Traditional Approach Failure: A standard tool looking only at the strongest positive activations would see weak or unrelated patterns (e.g., tokens like "door" or "volcanoes").
- GLUScope Insight: By isolating the Gate Negative / Input Negative ($-,-$) combination (which occurred only 17.34% of the time), the authors found a highly interpretable pattern:
  - The strongest activations in this specific category occurred on tokens like "once" (as in "once again").
  - In these contexts, "again" was a plausible next token, and the neuron acted to boost its probability.
- Conclusion: The most semantically meaningful behavior of this neuron was hidden in a "negative" activation regime that traditional tools would have filtered out or missed entirely.
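
The sign argument behind this case study can be made explicit. The following is a reasoning sketch based only on the SwiGLU formula given earlier in this summary, not a derivation taken from the paper; $\sigma$ denotes the logistic sigmoid.

$$
\mathrm{Swish}(x) = x\,\sigma(x), \qquad \sigma(x) \in (0, 1) \;\Longrightarrow\; \operatorname{sign}\big(\mathrm{Swish}(x)\big) = \operatorname{sign}(x)
$$

$$
x_{gate} < 0,\; x_{in} < 0 \;\Longrightarrow\; \mathrm{Swish}(x_{gate}) \cdot x_{in} > 0
$$

So in the ($-,-$) regime the neuron adds a positive multiple of $w_{out}$ to the residual stream, which is consistent with boosting "again". Because Swish is bounded below (its minimum is about $-0.278$), these activations stay small unless $|x_{in}|$ is large, which may be why a ranking by strongest activation overlooks them.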

5. Significance

Paradigm Shift in Interpretability: The paper argues that the "positive activation = feature detection" heuristic is insufficient for modern LLMs. GLUScope demonstrates that negative or mixed-sign activations in GLU models can carry the most distinct semantic information.
Enabling New Research: By providing a structured way to query specific activation regimes, the tool allows researchers to uncover features that are sparse or conditional on specific gating states.
Community Resource: The release of the dataset and code encourages the community to move beyond vanilla activation analysis, fostering a new wave of interpretability research tailored to the architecture of state-of-the-art models (Llama, Gemma, etc.).

Limitations

The tool is currently limited to Transformer models with gated activations and cannot directly analyze Mixture of Experts (MoE) models (like DeepSeek) or non-Transformer architectures (like Mamba).
It focuses on individual neurons rather than Sparse Autoencoder (SAE) features, though the authors acknowledge SAEs as a complementary approach.