Symbol-Equivariant Recurrent Reasoning Models

The paper introduces Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs), which enforce permutation equivariance at the architectural level to significantly improve the robustness, scalability, and generalization of neural reasoning on tasks like Sudoku and ARC-AGI compared to prior models.

Richard Freinschlag, Timo Bertram, Erich Kobler, Andreas Mayr, Günter Klambauer

Published 2026-03-03

Imagine you are trying to teach a robot how to solve a Sudoku puzzle.

The puzzle has a simple rule: every row, column, and 3x3 box must contain the numbers 1 through 9 exactly once. But here's the catch: the actual numbers don't matter. If you replaced every "1" with a "Red Apple," every "2" with a "Blue Banana," and every "3" with a "Green Grape," the logic of the puzzle would remain exactly the same. The robot should be able to solve it regardless of what symbols you use.

The Problem: The Robot is "Color-Blind" to Logic

Previous AI models, called Recurrent Reasoning Models (RRMs), were like students who memorized the specific answers to a practice test. If you gave them a test with the numbers 1–9, they did great. But if you gave them a test with the numbers 1–16 (a bigger, 16x16 Sudoku) or swapped the numbers for colors, they got confused.

To fix this, researchers used a "brute force" method: they showed the robot thousands of puzzles where the numbers were randomly swapped (data augmentation). It was like forcing the student to memorize every possible variation of the test. It worked, but it was slow, expensive, and the robot still couldn't handle puzzles it had never seen before.
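The "brute force" augmentation described above amounts to randomly relabelling the symbols of each training puzzle. As a rough sketch (not the paper's actual pipeline; the function name and encoding, with 0 marking an empty cell, are my own assumptions), it could look like this:

```python
import numpy as np

def permute_symbols(grid, rng):
    """Randomly relabel the symbols of a puzzle grid.

    Hypothetical augmentation sketch: 0 marks an empty cell and is
    left untouched; symbols 1..n are passed through a random
    bijection. The puzzle's logic is unchanged, only the labels move.
    """
    n = grid.max()
    perm = rng.permutation(n) + 1          # random bijection on {1..n}
    mapping = np.concatenate(([0], perm))  # keep 0 (empty) fixed
    return mapping[grid]

rng = np.random.default_rng(0)
grid = np.array([[1, 2, 0],
                 [0, 3, 1],
                 [2, 0, 3]])
augmented = permute_symbols(grid, rng)
# Empty cells stay empty, and cells that shared a symbol still do.
```

Training on thousands of such relabelled copies is what made the old approach slow and expensive: the model must memorize the variations rather than the underlying rule.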

The Solution: The "Universal Translator" (SE-RRM)

The authors of this paper introduced a new model called SE-RRM (Symbol-Equivariant Recurrent Reasoning Model).

Think of the old models as a chef who only knows how to cook with specific ingredients (e.g., "I only know how to use this specific brand of tomato"). If you give them a different tomato, they panic.

The new SE-RRM is like a chef who understands the concept of a tomato. They know that a tomato is a "red, round, acidic fruit" regardless of the brand.

  • The Magic Trick: Instead of memorizing that "Red = 1" and "Blue = 2," the SE-RRM is built with a special architectural rule: "If you swap the labels, the logic stays the same."
  • It treats the symbols (numbers, colors, shapes) as interchangeable tokens. It doesn't care if the puzzle uses numbers 1–9, colors, or emojis. It just cares about the relationships between them.
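The architectural rule "if you swap the labels, the logic stays the same" is called permutation equivariance, and it can be checked numerically. Here is a minimal toy layer in the DeepSets style (my own illustrative construction, not the paper's architecture): because every symbol channel is processed with the same weights plus a symmetric sum, relabelling the symbols of the input just relabels the output the same way.

```python
import numpy as np

def equivariant_layer(x, a=0.7, b=0.1):
    """A toy symbol-equivariant layer.

    x has shape (cells, symbols). Each symbol channel gets a weighted
    copy of itself plus a weighted sum over all channels; since no
    channel has its own weights, permuting channels commutes with
    the layer.
    """
    return a * x + b * x.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
x = rng.random((4, 9))        # 4 cells, 9 possible symbols
perm = rng.permutation(9)     # a random relabelling of the symbols

out_then_perm = equivariant_layer(x)[:, perm]   # apply layer, then relabel
perm_then_out = equivariant_layer(x[:, perm])   # relabel, then apply layer
# The two orders agree: the layer never "reads" the labels themselves.
```

A model built entirely from such symmetric operations gets the symbol-swap robustness for free, without ever seeing augmented data.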

How It Works: The "3D Puzzle"

In the old models, the AI looked at the puzzle as a flat 2D grid (rows and columns).
In the new SE-RRM, the AI adds a third dimension. Imagine the puzzle isn't just a flat sheet of paper, but a stack of transparent sheets.

  • One sheet per symbol: the sheet for symbol k lights up at exactly the positions where k appears.
  • Two of the axes encode position (rows and columns); the third axis encodes which symbol it is.
  • The AI looks at the whole stack in 3D, allowing it to see that "Position A" and "Position B" are related, and that "Symbol X" and "Symbol Y" are related, without getting confused by what the symbols actually are.
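The stack-of-sheets picture corresponds to a one-hot encoding: the flat 2D grid of symbol IDs becomes a 3D tensor with axes (row, column, symbol). A small sketch, assuming an encoding where 0 means an empty cell (the function name and details are illustrative, not from the paper):

```python
import numpy as np

def to_one_hot(grid, n_symbols):
    """Lift a 2D grid of symbol IDs into a 3D tensor with axes
    (row, column, symbol). A cell (r, c) holding symbol s becomes a
    1 at [r, c, s-1]; empty cells (0) stay all-zero."""
    rows, cols = grid.shape
    cube = np.zeros((rows, cols, n_symbols))
    r, c = np.nonzero(grid)                # filled cells only
    cube[r, c, grid[r, c] - 1] = 1.0       # light up the right sheet
    return cube

grid = np.array([[1, 0],
                 [2, 1]])
cube = to_one_hot(grid, n_symbols=2)
# Relabelling symbols is now just a permutation along the third axis,
# while positions live untouched on the first two axes.
```

In this representation, swapping two symbols means swapping two sheets, so an architecture that treats the symbol axis symmetrically is automatically indifferent to the labels.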

The Results: Why This Matters

The researchers tested this new model on three types of challenges:

  1. Sudoku (The Logic Test):

    • Old Models: Could solve standard 9x9 puzzles but failed miserably when asked to solve a 4x4 (smaller) or 16x16 (bigger) puzzle. They couldn't "extrapolate" (guess the rules for new sizes).
    • SE-RRM: Solved the 9x9 puzzles better than anyone else. More impressively, it successfully solved 4x4 puzzles it had never seen before and made decent guesses on 16x16 and 25x25 puzzles. It learned the rules, not just the answers.
  2. ARC-AGI (The "Human Intelligence" Test):

    • These are puzzles that test if a machine can think like a human (e.g., "If I move this shape, what happens?").
    • The Win: The SE-RRM achieved top-tier results using only 8 variations of the puzzle for training. The old models needed 1,000 variations to get similar results. It's the difference between learning a language by reading one book vs. reading a thousand different books.
  3. Mazes (The Planning Test):

    • Even when the "symbol swap" trick wasn't strictly necessary (a wall is never interchangeable with an exit), the new model still performed better, suggesting the architectural changes improve reasoning in general, not just symbol robustness.

The Big Picture

This paper is a breakthrough because it stops AI from being a "parrot" that just repeats what it memorized. Instead, it builds a "reasoner" that understands the underlying structure of a problem.

  • Efficiency: It needs way less data to learn.
  • Scalability: It can handle bigger, stranger problems without needing to be retrained.
  • Robustness: It doesn't break when you change the "colors" of the problem.

In short, the authors built a robot that doesn't just memorize the map; it understands the concept of navigation.
