Imagine a Large Language Model (LLM) like DeepSeek-V3 as a massive, multi-story library. Inside this library, every sentence you type is transformed into a unique, high-dimensional "fingerprint" (a vector) as it moves through the different floors (layers) of the building.

The big question this paper asks is: How does the library organize these fingerprints? Specifically, does it keep the "structure" of the sentence (syntax) separate from the "meaning" of the sentence (semantics), or are they all mixed together in a big smoothie?

Here is what the researchers found, explained simply:

1. The "Average" Trick (Finding the Core)

The researchers realized that if you have a bunch of sentences that look the same grammatically (e.g., "The cat sat," "The dog ran," "The bird flew"), they share a common "skeleton."

The Analogy: Imagine taking a photo of 100 different people wearing the exact same type of hat. If you average all those photos together, the faces blur out, but the hat becomes super sharp and clear.
The Method: They did this mathematically. They took sentences with the same grammar structure and averaged their fingerprints to create a "Syntax Centroid" (the pure grammar hat). They did the same for sentences with the same meaning but different words to create a "Semantic Centroid" (the pure meaning hat).

2. The "Subtraction" Test (Removing the Hat)

Once they had these "pure" grammar and meaning vectors, they tried to remove them from the original sentence fingerprints.

The Analogy: Imagine you have a photo of a person wearing a hat. If you digitally subtract the "hat" vector from the photo, the hat disappears. If the photo still looks like the person, you know the hat was a separate layer. If the person's face disappears too, the hat and face were mixed together.
The Result: When they subtracted the "Grammar Hat" from a sentence, the sentence lost its ability to match with other sentences that had the same grammar. When they subtracted the "Meaning Hat," it lost its ability to match with sentences that meant the same thing.
The Conclusion: This indicates that the model encodes grammar and meaning in an at least partly linear way. They behave like distinct ingredients in a recipe that can be mathematically separated, rather than a chemical reaction where they become a new substance.

3. The "Floor Plan" Discovery (Where things live)

The library has many floors. The researchers found that grammar and meaning live on different floors.

Grammar (Syntax): This is like the foundation and the lower floors. It is present right from the start and stays consistent all the way to the top. The model knows the structure of a sentence almost immediately.
Meaning (Semantics): This is like the middle floors. When a sentence enters the library, the model first looks at the words and structure (low floors). Then, as the sentence moves to the middle, the model figures out what it actually means. By the time it reaches the very top floor (where the model writes its answer), the meaning is still there, but the focus shifts to generating the output.
The Analogy: Think of reading a book. First, you recognize the letters and words (grammar). Then, in the middle of the paragraph, you understand the story (meaning). You don't need to re-recognize the letters to understand the story, but you do need the letters to start.

4. The One-Way Street (Asymmetry)

Here is the most interesting part: The separation isn't symmetric.

Grammar is independent: If you remove the "Meaning" from a sentence, the "Grammar" stays largely intact. The skeleton remains standing even if you take away the flesh.
Meaning is dependent: If you remove the "Grammar" from a sentence, the "Meaning" gets a bit wobbly. It doesn't disappear completely, but it gets harder to recognize.
The Analogy: Imagine a house. If you remove the furniture (meaning), the house structure (grammar) is still clearly a house. But if you remove the walls and roof (grammar), the furniture (meaning) is just a pile of stuff on the ground; it's hard to tell what it was supposed to be.

Summary

The paper shows that in these giant AI models:

Grammar and Meaning are distinct: They are encoded largely separately, not hopelessly mixed.
They are linear: You can mathematically "subtract" each one from a sentence's fingerprint.
They live in different places: Grammar is everywhere (especially early on), while Meaning peaks in the middle of the model's processing.
Grammar is the sturdy foundation: You can strip away meaning without breaking the grammar, but stripping away grammar makes the meaning harder to hold onto.

This suggests that even though these models are trained just by predicting the next word, they naturally develop a structure that looks a lot like how linguists think language works: a structural framework that supports a layer of meaning.

Technical Summary: Differential Syntactic and Semantic Encoding in LLMs

Problem Statement

This study investigates how Large Language Models (LLMs) encode syntactic (structural) and semantic (meaning) information within their high-dimensional internal representations. While the success of LLMs has spurred interest in decoding where and how linguistic competence is stored, there remains significant disagreement regarding the relationship between syntax and semantics. Generative traditions often posit a strict autonomy of syntax, whereas functionalist approaches view them as deeply entangled. The paper aims to resolve this by determining whether these two components are linearly encoded, how they are distributed across network layers, and to what extent they can be decoupled in models trained without explicit linguistic priors.

Methodology

The authors utilize a geometric approach based on linear operations to probe the representations of the DeepSeek-V3 model (671B parameters), with qualitative replication on smaller models (Qwen2-7b, Gemma3-12b, Pythia-6.9b).

1. Dataset Construction

The study relies on matched sentence pairs generated using other LLMs (Gemini, ChatGPT, DeepSeek):

Syntactic Matching: Pairs of sentences sharing the same Part-of-Speech (POS) template but expressing unrelated meanings ("syntax twins").
Semantic Matching: Each original sentence is matched with its English paraphrase and with translations of the original into six languages (Arabic, Chinese, German, Italian, Spanish, Turkish).

2. Centroid Construction and Ablation

To isolate specific information types, the authors construct "centroids" by averaging hidden representations:

Syntactic Centroid ( $S_i$ ): The average of representations of all "syntax twins" sharing a specific POS template. This averages out semantic variance while retaining syntactic structure.
Semantic Centroid ( $T_i$ ): The average of representations of all translations of a sentence $X_i$ (excluding the original and its English paraphrase). This averages out syntactic and lexical variance while retaining semantic content.
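
The averaging step can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the authors' code: the function name `centroid`, the synthetic "template direction," and the noise scale are all hypothetical stand-ins for real hidden states.

```python
import numpy as np

def centroid(vectors: np.ndarray) -> np.ndarray:
    """Mean of sentence vectors; rows are sentences, columns are hidden dimensions."""
    vectors = np.asarray(vectors, dtype=float)
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        raise ValueError("expected a non-empty (n_sentences, dim) array")
    return vectors.mean(axis=0)

# Toy data (assumption): every "syntax twin" shares one template direction
# and adds its own sentence-specific noise.
rng = np.random.default_rng(0)
dim = 64
template_direction = rng.normal(size=dim)
syntax_twins = template_direction + 0.5 * rng.normal(size=(200, dim))

S = centroid(syntax_twins)

# Cosine similarity between the centroid and the shared direction.
cos = S @ template_direction / (np.linalg.norm(S) * np.linalg.norm(template_direction))
```

In this toy setting the sentence-specific noise shrinks as more twins are averaged, so `S` lines up closely with the shared direction. The same function would apply to a semantic centroid, with the rows being the translations of one sentence.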

Ablation Procedure: The authors remove specific information from a sentence vector $X_i$ by subtracting its projection onto the respective centroid.

Syntax ablation: $X_i \perp S_i = X_i - \frac{X_i \cdot S_i}{\|S_i\|^2} S_i$
Semantic ablation: $X_i \perp T_i = X_i - \frac{X_i \cdot T_i}{\|T_i\|^2} T_i$
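
The two formulas above are the same operation applied to different centroids. A minimal NumPy sketch, where the function name `ablate` and the example vectors are illustrative assumptions:

```python
import numpy as np

def ablate(x: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Return x minus its projection onto c, i.e. x - (x.c / ||c||^2) c."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    denom = c @ c
    if denom == 0.0:
        raise ValueError("centroid must be non-zero")
    return x - (x @ c) / denom * c

# Hand-checkable example: the centroid points along the first axis,
# so ablation zeroes the first coordinate and leaves the rest alone.
x = np.array([3.0, 4.0, 0.0])
s = np.array([1.0, 0.0, 0.0])
x_ablated = ablate(x, s)  # [0., 4., 0.]
```

The result is orthogonal to the centroid by construction (`x_ablated @ s` is zero), which is what the $\perp$ notation expresses.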

3. Similarity Measurement

Instead of linear metrics like Centered Kernel Alignment (CKA), which the authors note provide weak signals in high dimensions, they employ a rank-based similarity measure derived from Information Imbalance. This metric quantifies how well the nearest neighbors in one representation space predict the nearest neighbors in another.
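
To make the idea of a rank-based neighborhood measure concrete, here is a sketch of one common formulation of Information Imbalance: for each point, find its nearest neighbor in space A, look up that neighbor's rank in space B, and rescale the mean rank by 2/N. How the paper turns this quantity into its similarity score is not specified here, so treat the function below (including its name and the Euclidean distance choice) as an assumption.

```python
import numpy as np

def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between rows, with self-distances set to inf."""
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return d

def information_imbalance(A: np.ndarray, B: np.ndarray) -> float:
    """Delta(A -> B): near 0 if neighbors in A predict neighbors in B, near 1 if not."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    nn_in_A = np.argmin(pairwise_sq_dists(A), axis=1)

    # ranks[i, j] = rank of point j among the neighbors of i in B (1 = nearest).
    order = np.argsort(pairwise_sq_dists(B), axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)[None, :]

    rank_in_B = ranks[np.arange(n), nn_in_A]
    return float(2.0 / n * rank_in_B.mean())

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 16))
unrelated = rng.normal(size=(200, 16))

same_space = information_imbalance(A, A)           # every rank is 1, so 2/n
independent = information_imbalance(A, unrelated)  # ranks are roughly uniform
```

Because only neighbor ranks enter the computation, the measure is insensitive to any monotonic rescaling of distances, which is one reason rank-based measures can be more robust than linear ones in high dimensions.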

4. Representation Aggregation

Two methods are used to aggregate token-level hidden states into sentence-level vectors:

Concatenation: Concatenating the hidden states of the last $N$ tokens (preserves positional information).
Averaging: Averaging the hidden states of the last $N$ tokens (removes positional information).
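
The difference between the two aggregation schemes is easy to see in a small sketch. The function name `aggregate_last_tokens`, the `mode` strings, and the toy hidden states are hypothetical; real inputs would be per-token hidden states from one layer.

```python
import numpy as np

def aggregate_last_tokens(hidden: np.ndarray, n_last: int, mode: str) -> np.ndarray:
    """Turn (seq_len, dim) token states into one sentence vector from the last n_last tokens."""
    hidden = np.asarray(hidden, dtype=float)
    if not 1 <= n_last <= hidden.shape[0]:
        raise ValueError("n_last must be between 1 and the sequence length")
    tail = hidden[-n_last:]
    if mode == "concat":
        return tail.reshape(-1)   # length n_last * dim, token order preserved
    if mode == "mean":
        return tail.mean(axis=0)  # length dim, token order discarded
    raise ValueError("mode must be 'concat' or 'mean'")

# Four tokens with three hidden dimensions each.
hidden = np.arange(12, dtype=float).reshape(4, 3)
concat_vec = aggregate_last_tokens(hidden, 2, "concat")  # [6., 7., 8., 9., 10., 11.]
mean_vec = aggregate_last_tokens(hidden, 2, "mean")      # [7.5, 8.5, 9.5]

# Swapping the last two tokens changes the concatenation but not the average.
swapped = hidden[[0, 1, 3, 2]]
concat_swapped = aggregate_last_tokens(swapped, 2, "concat")
mean_swapped = aggregate_last_tokens(swapped, 2, "mean")
```

The swap at the end shows why averaging removes positional information: the averaged vector is unchanged when token order changes, while the concatenated vector is not.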

Key Results

1. Linear Encoding of Syntax and Semantics

The study finds that both syntax and semantics are at least partially linearly encoded. Subtracting the syntactic or semantic centroids from sentence vectors significantly reduces the similarity between matched pairs (syntax twins or paraphrases), respectively. This suggests that a significant proportion of the relevant information is captured by these linear directions.

2. Differential Layer Profiles

The cross-layer encoding profiles of syntax and semantics differ:

Syntax: Syntactic similarity is high in early layers and remains relatively constant throughout the network. It is more prominent in concatenated representations, suggesting a reliance on positional information.
Semantics: Semantic similarity is low in early layers, rises to a peak in the central layers, and decreases slightly in the final output layers. Semantic similarity is stronger in averaged representations.

3. Asymmetric Decoupling

A crucial finding is the asymmetry in how syntax and semantics influence each other:

Removing Semantics: Ablating the semantic centroid from syntax twins does not significantly reduce their syntactic similarity. Syntax remains robust even when semantic information is removed.
Removing Syntax: Ablating the syntactic centroid from paraphrases significantly reduces their semantic similarity, particularly in central layers.
Interpretation: This suggests that while semantics can be partially separated from syntax, syntax is more autonomous. Removing syntactic structure (e.g., word order) degrades the ability to recover meaning, whereas removing meaning does not destroy the syntactic skeleton.

4. Norm Decomposition

Decomposing the squared norm of sentence vectors reveals that:

The syntactic component dominates in early layers.
The semantic component dominates in central layers.
Together, these centroids account for a significant but not total fraction (approx. 40% in central layers) of the vector norm, leaving a substantial "residual" component.
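
One way such a decomposition could be computed is sketched below. The exact procedure is not given in this summary, so the choices here are assumptions: the two centroids are orthogonalized in sequence (syntactic first, then the part of the semantic centroid orthogonal to it), and the function name `norm_fractions` is hypothetical. With non-orthogonal centroids the split between the two components depends on that ordering.

```python
import numpy as np

def norm_fractions(x: np.ndarray, s: np.ndarray, t: np.ndarray) -> dict:
    """Split ||x||^2 into a part along s, a part along t (orthogonalized to s), and a residual."""
    x, s, t = (np.asarray(v, dtype=float) for v in (x, s, t))
    e1 = s / np.linalg.norm(s)
    t_perp = t - (t @ e1) * e1          # remove the overlap with s
    t_perp_norm = np.linalg.norm(t_perp)
    if t_perp_norm < 1e-12:
        raise ValueError("centroids are parallel; the split is undefined")
    e2 = t_perp / t_perp_norm

    total = x @ x
    syntactic = (x @ e1) ** 2 / total
    semantic = (x @ e2) ** 2 / total
    return {
        "syntactic": syntactic,
        "semantic": semantic,
        "residual": 1.0 - syntactic - semantic,
    }

# Hand-checkable example: ||x||^2 = 9, of which 1 lies along s and 4 along t_perp.
fractions = norm_fractions(
    x=np.array([1.0, 2.0, 2.0]),
    s=np.array([1.0, 0.0, 0.0]),
    t=np.array([1.0, 1.0, 0.0]),
)
```

In this example the fractions are 1/9, 4/9, and 4/9. Because `e1` and `e2` are orthonormal, the three fractions always sum to one, and the residual is whatever part of the vector lies outside the plane spanned by the two centroids.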

5. Downstream Probe Effects

The ablation methods impact downstream probing tasks as expected:

POS Classification: Ablating syntactic centroids drastically reduces accuracy; ablating semantic centroids has minimal effect.
Paraphrase Recall: Ablating semantic centroids drastically reduces recall; ablating syntactic centroids has a smaller (though present) negative effect.
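
The paraphrase recall effect can be reproduced on synthetic data. The sketch below is a toy under explicit assumptions: the function names `ablate_rows` and `recall_at_k` are hypothetical, cosine similarity is assumed as the retrieval score, and each semantic centroid is taken to equal the underlying "meaning" vector exactly, which real centroids would only approximate.

```python
import numpy as np

def ablate_rows(X: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Row-wise ablation: remove from each X[i] its projection onto C[i]."""
    coeff = np.sum(X * C, axis=1) / np.sum(C * C, axis=1)
    return X - coeff[:, None] * C

def recall_at_k(queries: np.ndarray, candidates: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose match (same row index) is among the k most similar candidates."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = q @ c.T
    true_sim = np.diag(sims)
    # Number of candidates scoring strictly higher than the true match.
    n_better = (sims > true_sim[:, None]).sum(axis=1)
    return float((n_better < k).mean())

# Toy data (assumption): original and paraphrase share a meaning vector plus noise.
rng = np.random.default_rng(0)
n, dim = 100, 64
meanings = rng.normal(size=(n, dim))
originals = meanings + 0.3 * rng.normal(size=(n, dim))
paraphrases = meanings + 0.3 * rng.normal(size=(n, dim))

recall_before = recall_at_k(originals, paraphrases)
recall_after = recall_at_k(
    ablate_rows(originals, meanings),
    ablate_rows(paraphrases, meanings),
)
```

In this toy setting, recall is high before ablation and falls toward chance afterward, because once the shared meaning direction is removed only independent noise remains. Real hidden states would not separate this cleanly.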

Significance and Claims

The paper claims three primary contributions to the fields of LLM interpretability and computational linguistics:

Identification of a Semantic Core: The results confirm that a "semantic core" exists in LLM processing, concentrated specifically in the inner layers of the network, distinct from the more stable, layer-spanning syntactic processing.
Evidence for Linear Superposition: The study provides further evidence that simple linear superposition is a fundamental mechanism for encoding abstract linguistic features (syntax and meaning) in deep networks.
Emergent Autonomy of Syntax: The observation of an imperfect but clear separation between syntax and semantics in models trained without explicit linguistic priors suggests that the autonomy of syntax may be an inherent, optimal property of linguistic representations. This finding bridges the gap between generative linguistic theories (autonomous syntax) and functionalist views, implying that this distinction might emerge universally in cognitive systems, from human brains to LLMs.

The authors maintain a modest stance, acknowledging that their linear approach captures only partial aspects of these complex features and that a significant portion of the representation norm remains unexplained by these centroids. They suggest future work should explore non-linear feature extraction and the temporal dynamics of these encodings.
