The Big Idea: The "Map vs. Territory" Problem
Imagine you have a very smart robot (a neural network) that learns to recognize pictures of cats and dogs. To do this, the robot converts every picture into a long list of numbers (a "vector") inside its brain. These lists of numbers are called representations.
Scientists love to study these lists of numbers to understand how the robot thinks. They often ask questions like:
- "How similar are the numbers for a picture of a Golden Retriever and a picture of a Poodle?"
- "Which pictures are the robot's 'closest neighbors'?"
To answer these, they use a ruler called Cosine Similarity. It measures the angle between two lists of numbers. If the angle is small, the robot thinks they are similar.
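To make the "ruler" concrete, here is a minimal sketch of cosine similarity in Python (the vectors `a` and `b` are made-up toy data, not the robot's actual representations):

```python
import numpy as np

def cosine_similarity(u, v):
    # angle-based similarity: 1.0 means same direction, 0.0 means perpendicular
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, just twice as long
print(cosine_similarity(a, b))  # close to 1.0: only the angle matters, not the length
```

Notice that stretching a vector doesn't change its cosine similarity to another vector; only changing the angle between them does. That is exactly the property the paper puts under the microscope.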
The Paper's Shocking Discovery:
The author, Jericho Cain, argues that the angle between two of these vectors doesn't actually mean anything on its own.
Why? Because the "ruler" the robot uses to measure these numbers is arbitrary. You can stretch, squash, or twist the robot's internal coordinate system without changing how it actually works. If you do this, the "angle" between a cat and a dog changes completely, even though the robot still thinks they are a cat and a dog.
It's like looking at a map. If you stretch a map of the world so that Europe looks huge and Africa looks tiny, the distance between London and Cairo changes on the paper. But the actual flight path between the two cities hasn't changed. The robot's "brain" is the flight path; the numbers are just the map.
The Core Concept: "Gauge Freedom" (The Shape-Shifting Brain)
The paper introduces a concept called Gauge Freedom.
The Analogy: The Translator and the Dictionary
Imagine a secret agent (the neural network) sending a message to headquarters.
- The agent encodes a message into a code (the hidden representation).
- Headquarters decodes it using a dictionary (the weights).
Now, imagine the agent decides to change their code. Instead of writing "A" for "Attack," they write "Z". But, they also tell headquarters to change their dictionary so that "Z" now means "Attack."
Result: The message received is exactly the same. The mission is unchanged. The outcome is identical.
However, if an outside observer (a scientist) looks at the codes the agent is sending before they are decoded, they see a totally different pattern.
- Before the change: "A" and "B" might look very close together.
- After the change: "Z" and "Q" might look far apart.
The paper proves that neural networks have this exact superpower. You can mathematically twist the internal numbers (the code) and adjust the final decoder (the dictionary) to compensate. The robot's predictions remain exactly the same, but the "geometry" of its internal thoughts looks completely different.
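The agent-and-dictionary trick can be sketched in a few lines. This is a simplified toy, assuming a linear readout layer; the matrix sizes and the random data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup: 5 inputs, 4-dimensional hidden representation, 3 output classes
H = rng.normal(size=(5, 4))      # hidden representations ("the code")
W = rng.normal(size=(4, 3))      # readout weights ("the dictionary")

# an arbitrary invertible "twist" of the hidden coordinates
A = rng.normal(size=(4, 4))
H_twisted = H @ A                # the agent re-encodes the message
W_fixed = np.linalg.inv(A) @ W   # headquarters updates the dictionary to compensate

logits_before = H @ W
logits_after = H_twisted @ W_fixed
print(np.allclose(logits_before, logits_after))  # True: the outputs are identical
```

The algebra is the whole trick: `(H @ A) @ (inv(A) @ W) = H @ W`, so the twist cancels out perfectly and the network's behavior cannot tell the difference.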
Why This Matters: The "Cosine Similarity" Trap
Scientists often use Cosine Similarity to measure how "close" two ideas are in the robot's mind. They assume that if the angle is small, the robot thinks the concepts are related.
The Problem:
Because of the "Gauge Freedom" (the ability to twist the code), Cosine Similarity is not a fixed truth. It depends entirely on which "twist" the robot happened to use when it was trained.
- Scenario A: You train a robot. You measure the similarity between "Cat" and "Dog." It's 0.9 (very similar).
- Scenario B: You take that same robot, apply a mathematical twist to its internal numbers, and fix the output. You measure the similarity again. Now it's 0.4 (not very similar).
The robot didn't learn anything new. It didn't forget anything. It still predicts "Cat" and "Dog" perfectly. But the scientist's measurement of "similarity" changed wildly.
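The two scenarios can be reproduced with a deliberately simple, hand-picked twist (the vectors and the stretch factors below are invented for illustration, not taken from the paper's experiments):

```python
import numpy as np

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 0.0])          # say, "cat"
v = np.array([1.0, 1.0])          # say, "dog"
print(round(cos_sim(u, v), 3))    # 0.707 — a 45-degree angle

# a perfectly legal invertible "twist": stretch one axis, squash the other
A = np.diag([10.0, 0.1])
print(round(cos_sim(u @ A, v @ A), 3))  # 1.0 — now they look almost identical
```

Nothing about the underlying data changed; only the coordinate system did, and the "similarity" jumped from 0.707 to nearly 1.0.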
The Metaphor:
Imagine measuring the distance between two cities using a rubber ruler.
- If you stretch the ruler, the distance changes.
- If you shrink the ruler, the distance changes.
- But the actual distance between the cities hasn't moved.
The paper says: "Stop measuring the rubber ruler! Stop measuring the angle. It's an illusion created by your choice of coordinates."
The Experiments: Proving the Twist
The author ran simple experiments to prove this isn't just theory:
- The Setup: They trained robots to recognize handwritten digits (0-9) and small images of everyday objects such as cars (CIFAR-10).
- The Twist: They applied a random mathematical "twist" to the robot's internal numbers and fixed the output layer.
- The Result:
  - Predictions: The robot gave exactly the same answers.
  - Similarity: The "closeness" of the numbers changed drastically.
  - Nearest Neighbors: If you asked the robot, "What is the picture most similar to this one?", the answer changed. In one test, 28% of the "closest" neighbors changed just because of the twist!
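The nearest-neighbor effect is easy to recreate on synthetic data. This toy does not reproduce the paper's 28% figure (that comes from their trained networks); it just shows that an arbitrary twist scrambles cosine-based neighbors, using made-up random "representations":

```python
import numpy as np

rng = np.random.default_rng(0)

H = rng.normal(size=(200, 16))   # 200 fake representations
A = rng.normal(size=(16, 16))    # an arbitrary invertible twist
H2 = H @ A

def nearest_neighbor(X):
    # index of each row's closest *other* row, by cosine similarity
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # a point is not its own neighbor
    return S.argmax(axis=1)

changed = np.mean(nearest_neighbor(H) != nearest_neighbor(H2))
print(f"{changed:.0%} of nearest neighbors changed")
```

The exact fraction depends on the random seed and the twist, but a substantial share of "closest neighbors" flips, even though no information was added or removed.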
This means that if you read a paper saying "Neural networks group cats and dogs together," you have to ask: "Did they group them together because of the learning, or just because of the random twist of the coordinates?"
The Solution: Finding the "True" Shape
If the coordinates are fake, how do we find the real structure? The paper suggests two ways:
Use "Gauge-Invariant" Tools: Instead of measuring angles (which change when you twist the ruler), measure things that don't change. The paper mentions methods like CKA or SVCCA, which look at the overall structure of the data rather than the specific numbers. It's like measuring the object itself rather than its shadow: the shadow's length changes with the sun's angle, but the object does not.
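As a taste of what "invariant" means, here is a minimal sketch of linear CKA on invented random data. One caveat (my framing, not the paper's): linear CKA is invariant to rotations and uniform scalings of the coordinates, though not to every possible invertible twist, so the demo below uses a random rotation:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_cka(X, Y):
    # similarity of two sets of representations that ignores how the axes are rotated
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

H = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # a random rotation (orthogonal matrix)
print(round(linear_cka(H, H @ Q), 3))          # 1.0 — CKA sees through the rotation
```

Cosine similarities between individual vectors would change under this rotation-plus-analysis pipeline only if you picked different pairs; CKA, by contrast, reports that the two coordinate systems describe the same underlying structure.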
Pick a "Standard" Ruler (Whitening):
Imagine all the numbers in the robot's brain are squashed in one direction and stretched in another (like a deflated balloon).
- Whitening is a mathematical process that inflates the balloon until it's a perfect sphere.
- This creates a "canonical" (standard) coordinate system.
- If everyone agrees to use this "inflated sphere" ruler, then when two scientists measure the distance between "Cat" and "Dog," they will get the same answer.
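A small sketch of why the "standard ruler" works, using ZCA-style whitening on invented random data (the paper's exact procedure may differ; this is one common way to whiten). After whitening, the table of pairwise inner products comes out the same whether or not the representations were twisted first, so distances and cosine similarities computed from it agree too:

```python
import numpy as np

rng = np.random.default_rng(0)

def whiten(X):
    # rescale/rotate the coordinates so the data has identity covariance
    X = X - X.mean(axis=0)
    C = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(C)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric C^(-1/2)
    return X @ inv_sqrt

H = rng.normal(size=(100, 6))
A = rng.normal(size=(6, 6))   # an arbitrary invertible twist

# all pairwise inner products agree after whitening, twist or no twist
G1 = whiten(H) @ whiten(H).T
G2 = whiten(H @ A) @ whiten(H @ A).T
print(np.allclose(G1, G2))  # True
```

Intuitively, whitening "un-twists" whatever stretch or squash the coordinates carried, so every scientist who whitens first is measuring with the same ruler.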
Summary for the Everyday Reader
- Neural networks turn data into lists of numbers.
- Scientists measure the "distance" between these numbers to understand how the AI thinks.
- The Catch: The "distance" depends on an arbitrary choice of how the numbers are arranged. You can rearrange the numbers without changing the AI's behavior.
- The Consequence: Many popular studies claiming to find "semantic similarity" or "clusters" in AI brains might just be measuring the arbitrary arrangement of numbers, not the actual intelligence.
- The Fix: We need to either use measurement tools that ignore these arbitrary arrangements or force the AI to use a standard "ruler" (like whitening) before we start measuring.
In short: Don't trust the ruler until you know who made it. The AI's "thoughts" are real, but the way we measure them is often just a trick of the light.