Imagine you are teaching a robot to navigate a city. You show it how to turn left, then how to turn right. You expect that if you ask it to "turn left, then right," it will simply combine those two learned skills perfectly.
But here's the problem: standard AI models (like the ones powering chatbots) are terrible at this. They can memorize specific routes, but they fail miserably when asked to combine them in new ways. It's like a student who can answer the exact addition problems they have drilled on, but gets confused by a new combination of numbers, because they are memorizing answers rather than understanding the rule of addition.
This paper, "Functorial Neural Architectures from Higher Inductive Types," proposes a radical new way to build AI. Instead of trying to teach the AI the rules through trial and error, the authors say: "Let's build the rules into the robot's skeleton."
Here is the breakdown using simple analogies:
1. The Problem: The "Mixer" vs. The "Assembler"
Think of a standard AI (like a Transformer) as a smoothie blender.
- When you put ingredients (words or steps) into the blender, it mixes them all together.
- If you ask for "Left then Right," the blender smashes the "Left" and "Right" together.
- The Flaw: If you change the order to "Right then Left," the blender makes a different smoothie. But in math and logic, sometimes "Left then Right" is actually the same as "Right then Left" (like walking in a circle). The blender can't tell the difference because it's looking at the order of the ingredients, not the meaning of the combination. It's too messy to be a perfect rule-follower.
The authors propose a new architecture called a Transport Decoder, which acts more like a Lego assembler.
- Instead of blending, it builds the answer piece by piece.
- It has a specific "Left" Lego brick and a specific "Right" Lego brick.
- To make "Left then Right," it just snaps the two bricks together.
- The Magic: Because the bricks are snapped together structurally, the AI cannot make a mistake about how they fit. It is "compositional by construction."
2. The Secret Sauce: "Higher Inductive Types" (The Blueprint)
How do we tell the AI which bricks to use? The authors use a branch of advanced math called Topology (the study of shapes and spaces).
Imagine the task is navigating a specific shape, like a Torus (a donut shape).
- On a donut, you can walk around the hole (Loop A) or go through the hole (Loop B).
- In math, there is a rule: Walking around the hole and then through it is the same as going through and then around. They are "homotopic" (they can be stretched into each other).
- The authors use a Higher Inductive Type (HIT) as a blueprint. This blueprint lists the "generators" (the basic moves) and the "relations" (the rules that say which moves are actually the same).
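To make the "blueprint" idea concrete, here is a toy sketch (my own illustration, not the paper's formalism): the torus blueprint has two generators, the loops a and b, and one relation saying a·b is the same path as b·a. Because of that relation, a path's class is fully captured by its net winding counts around each loop.

```python
# Toy "blueprint" for the torus: generators a, b; relation a*b ~ b*a.
# Capital letters stand for inverse loops (A = a^-1). Illustrative only.

def torus_class(path):
    """Reduce a path like "abA" to its homotopy class on the torus.

    Since a*b ~ b*a, the class is just the pair of net winding counts:
    how many times we circle each loop, in either direction.
    """
    winds = {"a": 0, "b": 0}
    for step in path:
        winds[step.lower()] += 1 if step.islower() else -1
    return (winds["a"], winds["b"])

# "Around then through" equals "through then around" on the torus:
assert torus_class("ab") == torus_class("ba")
# A loop followed by its inverse is the trivial path:
assert torus_class("aA") == (0, 0)
```

This tiny normalizer is exactly the kind of rule the blueprint hands to the compiler: which moves exist, and which sequences of moves count as equal.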
The Compilation Process:
The authors created a "compiler" that takes this mathematical blueprint and automatically builds the AI architecture.
- Generators become small, independent neural networks (the Lego bricks).
- Relations become special "glue" (called 2-cells) that teaches the AI how to stretch one path into another.
- The result is an AI that is mathematically guaranteed to follow the rules of the shape it is navigating.
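The "one brick per generator, snapped together in sequence" idea can be sketched as follows. This is a stand-in, not the paper's architecture: the real system uses learned neural modules, while here each brick is a fixed rotation matrix so the composition guarantee is easy to see.

```python
import numpy as np

def rotation(theta):
    """A 2x2 rotation matrix, playing the role of one generator's module."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# One "Lego brick" per generator and per inverse (A = a^-1).
bricks = {"a": rotation(np.pi / 2), "A": rotation(-np.pi / 2)}

def transport_decode(path, state):
    """Apply each generator's module in order.

    Composition happens by construction, not by learning, so a step
    followed by its inverse provably returns to the starting state.
    """
    for step in path:
        state = bricks[step] @ state
    return state

start = np.array([1.0, 0.0])
assert np.allclose(transport_decode("aA", start), start)  # bricks cancel exactly
```

The point of the sketch: the decoder never "blends" the steps, it composes them, so the algebra of the moves is enforced by the wiring rather than hoped for from training data.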
3. The Experiments: The Donut, The Figure-8, and The Klein Bottle
The team tested their new "Lego AI" against the old "Blender AI" on three different shapes:
The Torus (The Donut):
- The Test: Can the AI combine loops correctly?
- Result: The Lego AI was 2 to 3 times better than the Blender AI. Even though the Blender AI had more parameters (more "brain power"), it couldn't figure out the structural rules.
The Figure-8 (Two Circles joined at a point):
- The Test: Here, order matters! Going around Circle A then Circle B is different from B then A.
- Result: The Blender AI completely collapsed. It got confused and started drawing random circles. The Lego AI was 5 to 10 times better. It perfectly understood that order changes the shape.
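Why is the figure-8 harder? Its paths form a free group: nothing commutes, and the only simplification allowed is cancelling a step against its immediate inverse. A short sketch (my own illustration) makes the contrast with the torus visible:

```python
def free_reduce(path):
    """Reduce a figure-8 path (capital letters = inverse loops).

    In a free group, only adjacent inverse pairs like a*a^-1 cancel;
    a*b and b*a remain genuinely different paths.
    """
    out = []
    for step in path:
        if out and out[-1].swapcase() == step:
            out.pop()          # a step followed by its inverse cancels
        else:
            out.append(step)
    return "".join(out)

assert free_reduce("ab") != free_reduce("ba")  # order matters on the figure-8
assert free_reduce("aAb") == "b"               # but inverses still cancel
```

A blender-style model that only tracks which ingredients appear, and not their structural order, cannot represent this distinction; an assembler that composes bricks in sequence gets it for free.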
The Klein Bottle (A twisted, non-orientable surface):
- The Test: This is the hardest level. It has a weird rule: if you go around one loop, the direction of the other loop flips.
- Result: The Lego AI included a special "glue" (the learned 2-cell) that handled this flip. It reduced errors by 46% compared to the standard Lego AI that didn't have this glue. This proved that the AI could learn complex mathematical proofs inside its architecture.
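The Klein bottle's "weird rule" can also be written down as a rewrite rule. In one common presentation, sliding loop b past loop a flips b's direction (a·b becomes b⁻¹·a). A hedged sketch of the resulting normal form, with my own names:

```python
def klein_class(path):
    """Normal form (m, n) meaning b^m * a^n on the Klein bottle.

    The relation a*b ~ b^-1*a means every b that slides leftward past
    an a flips direction. Capital letters are inverse loops.
    """
    m = n = 0
    for step in path:
        sign = 1 if step.islower() else -1
        if step.lower() == "a":
            n += sign
        else:
            # Moving this b to the front passes n copies of a, flipping
            # its direction once per pass.
            m += sign * (1 if n % 2 == 0 else -1)
    return (m, n)

assert klein_class("ab") == klein_class("Ba")  # the flip rule in action
assert klein_class("ab") != klein_class("ba")  # unlike the torus, order matters
```

The learned 2-cell in the paper plays the role of this flip rule: it is the "glue" that witnesses why two superficially different paths are the same.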
4. Why This Matters
The paper argues a hard truth: you cannot teach a standard AI to be perfectly logical just by giving it more data. The "blender" architecture (attention mechanisms) is fundamentally ill-suited to tasks that require strict logical composition.
The Solution:
Stop trying to teach the AI the rules. Instead, build the rules into the AI's DNA.
- If you want an AI to navigate obstacles, build it with a structure that respects the geometry of obstacles.
- If you want an AI to write code, build it with a structure that respects the syntax of programming.
The Takeaway
This paper is like saying, "We've been trying to teach a dog to do calculus by giving it more treats. Instead, let's just build a calculator."
By using advanced math to design the AI's skeleton, the authors created a system that cannot fail at the specific logical rules of the task. It's not just a smarter AI; it's a safer, more reliable AI that guarantees it will do the right thing, no matter how complex the combination of inputs gets.