Imagine you walk into a bakery and buy a loaf of bread. You want to know: Did this bread actually come from this specific bakery, or did someone else bake a fake one and try to pass it off?
In the world of Artificial Intelligence (AI), specifically Large Language Models (LLMs) like the ones powering chatbots, this is a huge problem. Companies sell access to their "black box" models via the internet. You type a question, and the model sends back an answer. But how do you know the answer really came from their model and not a copycat or a hacker?
This paper introduces a brilliant, invisible solution: The Ellipse Signature.
Here is the breakdown in simple terms, using some tasty analogies.
1. The Invisible Geometry of AI
Most people think of AI outputs as just text. But behind the scenes, before the AI writes a word, it does a bunch of math. It calculates probabilities for every possible next word.
The authors discovered that because of how these AI models are built (specifically a step called "normalization"), their math doesn't just happen anywhere. It happens on a very specific, invisible shape.
- The Analogy: Imagine the AI's brain is a giant, invisible trampoline. No matter where you jump on it, you always land on a specific, curved surface.
- The Science: That surface is a high-dimensional ellipse (technically an ellipsoid: a stretched-out, multi-dimensional oval).
- The Result: Every single time the AI generates a word, the math behind that word must land exactly on this invisible ellipse. It's a geometric law of the universe for that specific model.
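The geometry above can be sketched in a few lines of NumPy, under the common assumption that the model's final hidden state passes through an RMSNorm-style normalization before a linear "unembedding" map produces the word probabilities. The matrix, sizes, and helper names here are illustrative stand-ins, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (tiny for illustration; real models use thousands)

def rmsnorm(h, eps=1e-6):
    """Scale h so its root-mean-square is 1, i.e. its norm is ~sqrt(d)."""
    return h / np.sqrt(np.mean(h**2) + eps)

W = rng.normal(size=(16, d))  # stand-in unembedding matrix (vocab x d)

# No matter what hidden state the model produces, after normalization
# it lands on a sphere of radius sqrt(d)...
for _ in range(5):
    h = rng.normal(size=d) * rng.uniform(0.1, 10)
    assert np.isclose(np.linalg.norm(rmsnorm(h)), np.sqrt(d), atol=1e-3)

# ...and the logits W @ h are the image of that sphere under a linear
# map: a fixed, invisible (hyper)ellipse determined entirely by W.
logits = W @ rmsnorm(rng.normal(size=d))
print(np.linalg.norm(np.linalg.pinv(W) @ logits))  # always ~ sqrt(d)
```

The sphere is the "trampoline surface"; multiplying by W stretches it into the model-specific ellipse the paper is about.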
2. The "Signature"
Because every model has its own unique architecture (different size, different training), every model has its own unique ellipse.
- The Analogy: Imagine every bakery has a unique, invisible mold they use to shape their bread. Even if you can't see the mold, if you look at the shape of the bread, you can tell exactly which bakery made it.
- The Signature: The "Ellipse Signature" is just checking: Does this output land on the specific ellipse of "Model A"?
- If Yes: It almost certainly came from Model A.
- If No: It came from somewhere else.
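In code, the yes/no check above amounts to asking whether a logit vector could have come from Model A's geometry: does it sit in the span of A's matrix, at the right radius? This is a hedged sketch that assumes the verifier knows Model A's unembedding matrix; the actual test in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
d, vocab = 8, 16
W_A = rng.normal(size=(vocab, d))  # Model A's (assumed) linear map
W_B = rng.normal(size=(vocab, d))  # an impostor's map

def on_ellipse(logits, W, tol=1e-6):
    """True iff logits = W @ n for some n with norm sqrt(d)."""
    n = np.linalg.pinv(W) @ logits      # best pre-image under W
    in_span = np.allclose(W @ n, logits, atol=tol)
    right_radius = np.isclose(np.linalg.norm(n), np.sqrt(W.shape[1]),
                              atol=1e-4)
    return in_span and right_radius

# An output genuinely produced through Model A's geometry...
h = rng.normal(size=d)
genuine = W_A @ (h / np.sqrt(np.mean(h**2)))

print(on_ellipse(genuine, W_A))  # True:  it lands on A's ellipse
print(on_ellipse(genuine, W_B))  # False: it misses B's ellipse
```

A genuine output passes its own model's test and (generically) fails every other model's, which is exactly the "which bakery made this bread" check.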
3. Why is this a Big Deal? (The "Forgery-Resistant" Superpower)
The paper highlights four superpowers that make this method better than previous attempts (like watermarks or fingerprints):
A. It's Naturally Occurring (No Setup Needed)
- Old Way: To watermark a model, the bakery owner has to intentionally mix a secret ingredient into the dough. If they forget, there's no watermark.
- Ellipse Way: The "signature" is baked into the physics of the model itself. It happens automatically. You don't need to ask the company to turn it on; it's always there, like the sound of a specific engine running.
B. It's Self-Contained (No Secrets Needed)
- Old Way: To verify a fingerprint, you might need to see the original recipe or the secret key.
- Ellipse Way: You can verify the signature just by looking at the output (the text and its math). You don't need to see the model's secret weights or the user's prompt. It's like verifying a signature on a check just by looking at the ink, without needing to see the bank's vault.
C. It's Compact (One Word is Enough)
- Old Way: Some methods need a whole paragraph of text to find a pattern.
- Ellipse Way: The signature is in every single word. You can verify the source of an output just by looking at the math behind a single word the model generated.
D. It's Hard to Fake (The "Forgery-Resistant" Part)
This is the most important part.
- The Problem: If I want to pretend I am "Model A," I need to make my fake output land on "Model A's" ellipse.
- The Old Way (Linear Signatures): Previously, an attacker could recover the shape of a linear signature by asking the model a handful of questions and solving a simple system of linear equations. Easy!
- The Ellipse Way: To copy an ellipse, you have to pin down the exact shape of a curved oval living in thousands of dimensions.
- The Analogy: Imagine trying to recreate a specific, complex 3D sculpture just by looking at a few photos of it from the outside.
- The Reality: The paper shows that to figure out the exact shape of the ellipse for a big model, you would need to ask the model millions of questions and spend thousands of years of computer time to solve the math.
- The Cost: It would cost millions of dollars in API fees just to try to steal the signature. So, for all practical purposes, it is impossible to forge.
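A rough way to see why the ellipse is so much harder to steal: a linear constraint in d dimensions has about d unknowns, but a quadratic one (an ellipse) has about d squared. The hidden size below is an illustrative value for a large model, not a figure taken from the paper.

```python
def quadric_params(d: int) -> int:
    """Free coefficients of a symmetric d x d quadratic form."""
    return d * (d + 1) // 2

d = 4096  # a typical LLM hidden size (assumption for illustration)
print(f"linear signature: ~{d:,} unknowns to solve for")
print(f"ellipse:          ~{quadric_params(d):,} unknowns")
# Millions of unknowns means millions of API queries just to write
# down the system of equations, before even trying to solve it.
```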
4. The "Secret Key" Protocol
The authors propose a new way to verify AI outputs, similar to how we use passwords or digital signatures today.
- The Setup: The AI company (the "Signer") knows the exact shape of their ellipse (the "Secret Key").
- The Action: The AI generates text. The math of that text is the "Message."
- The Verification: A third party (like a regulator or a user) checks if the math of the text fits the ellipse.
- The Result: If it fits, it's authentic. If it doesn't, it's fake.
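The four steps above can be sketched as two roles, a Signer and a Verifier. Everything here (the class names, what counts as the key, the exact membership test) is a minimal illustrative stand-in for the paper's protocol, reusing the same geometric check as before.

```python
import numpy as np

rng = np.random.default_rng(2)
d, vocab = 8, 16

class Signer:
    """The AI company: its ellipse (here, its matrix W) is the key."""
    def __init__(self):
        self.W = rng.normal(size=(vocab, d))  # secret key material
    def generate(self):
        h = rng.normal(size=d)
        n = h / np.sqrt(np.mean(h**2))        # normalization step
        return self.W @ n                     # the "message": logits

class Verifier:
    """A third party checking outputs against the signer's ellipse."""
    def __init__(self, W):
        self.W = W
    def verify(self, logits):
        n = np.linalg.pinv(self.W) @ logits
        on_span = np.allclose(self.W @ n, logits, atol=1e-6)
        on_sphere = np.isclose(np.linalg.norm(n), np.sqrt(d), atol=1e-4)
        return on_span and on_sphere

signer = Signer()
verifier = Verifier(signer.W)
print(verifier.verify(signer.generate()))       # authentic: True
print(verifier.verify(rng.normal(size=vocab)))  # forged: False
```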
Why Should We Care?
This is a game-changer for accountability.
- Scenario: A company releases a model that accidentally generates hate speech or dangerous advice. They deny it, saying, "That wasn't our model!"
- The Solution: A trusted third party can check the "Ellipse Signature" of the output. If the math matches the company's ellipse, the denial is exposed and the company is caught. If it doesn't match, the output really did come from somewhere else, and the company is vindicated.
Summary
Think of every AI model as having a unique, invisible geometric fingerprint that is impossible to fake without the secret recipe. This paper proves that this fingerprint exists, explains how to find it, and shows that it's so hard to copy that it can finally be used to hold AI companies accountable for what their models say.
It turns the invisible math of AI into a trustworthy ID card that no one can forge.