Original authors: Nicholas J. Cooper, François G. Meyer, Michael L. Roberts, Carlos Zapata-Carratalá, Lijun Chen, Danna Gurari

Published 2026-05-07. Author reviewed.




Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine that building a Deep Neural Network (DNN) is like constructing a massive, complex factory. For the last 40 years, engineers have been building these factories by stacking standard Lego bricks (layers) in different ways. We know these factories work incredibly well, but we've never really had a blueprint that explains exactly how the bricks fit together at the most fundamental level. We've been looking at the factory from the outside, guessing how the gears turn inside.

This paper introduces a new, ultra-detailed blueprint called a Hierarchical Combinatorial Framework. It doesn't just look at the factory; it disassembles it down to the molecular level of how data is moved and mixed.

Here is the breakdown of their discovery using simple analogies:

1. The New Blueprint: From "Black Boxes" to "Transparent Gears"

Most previous theories treated neural network layers like "black boxes." They said, "This box takes an image and gives you a label," without explaining the internal machinery.

The authors propose a new way to see these networks using Hierarchical Combinatorial Complexes (HCCs). Think of this as a set of Russian nesting dolls:

The Elements (The Bricks): The raw data (numbers).
The Slices (The Piles): Grouping those numbers into rows or columns.
The Modes (The Shelves): Organizing those piles into specific dimensions (like height, width, color).
The Tensors (The Boxes): The multi-dimensional containers holding the data.
The Operations (The Mixers): The machines that combine these boxes (like Matrix Multiplication).
The Architecture (The Factory Floor): How all the mixers and boxes are connected.

The key innovation here is that they explicitly model the "Tensor Operations" (the mixers). Previous theories ignored the specific shape and structure of these mixers. This paper says, "Let's count exactly how many gears are in the mixer and how they interlock."

2. The History Lesson: Why New Architectures Work

The authors used their new blueprint to look back at 40 years of neural network history. They measured the "complexity" of famous architectures (like fully connected networks, CNNs, ResNets, and Transformers) by counting their operations, their tensors, how many inputs each operation combines, and how many inputs share a dimension.

The Analogy: Imagine measuring the complexity of a car.

1986 (FCNN): A bicycle. Simple, one gear.
1998 (CNN): A car with a transmission. It has more gears (higher order operations) to handle different terrains.
2016 (ResNet): A car with a turbocharger and a bypass valve (skip connections). It adds more parts to the engine to make it run smoother.
2017 (Transformer): A jet engine. It uses a completely different, more complex type of combustion (a 3-way mixer instead of a 2-way one).

The Finding: Every time a "groundbreaking" architecture was invented, it wasn't just a tweak; it was a jump to a higher level of complexity. The paper found that the most successful models were the first to introduce a new "gear" or a new way of mixing data that hadn't been used before.

3. The Discovery: A Universe of Unbuilt Factories

Here is the most exciting part. The authors realized that while we have been building with 2-way mixers (binary operations) and 3-way mixers, there is a whole universe of 4-way, 5-way, and even higher mixers that we have largely ignored.

They asked: "What if we built a factory using these super-complex mixers?"

Using their framework, they systematically generated 3,028 new factory designs from these higher-complexity mixers. Rather than leaving the designs as theory, they built them and tested them.

The Result:
They found that some of these "weird," high-complexity designs were shockingly efficient.

The Analogy: Imagine a standard delivery truck (MobileNetV2) that is famous for being small and efficient. The authors built a new vehicle using their complex mixers. This new vehicle was smaller (using less than 10% of the parts) but could carry more cargo (it achieved higher accuracy) than the famous truck.
Specifically, one of their new 5-layer models beat a famous 30-layer model while using a fraction of the parameters.

4. The "Red Star" Architecture

They highlighted one specific design (the "Red Star") that was a champion.

It used a "skip connection" (sending data around a mixer) but combined it with a very complex 4-way mixer.
It reused parts (weights) in clever ways, like a mechanic reusing a bolt from one engine part to fix another.
It proved that you don't need a massive, deep network to get great results; you just need the right kind of complex mixing.

Summary

This paper is like giving engineers a new set of tools to understand and build neural networks.

The Tool: A precise mathematical language to describe exactly how data is mixed, not just how it flows.
The Insight: History shows that breakthroughs happen when we invent new types of "mixers."
The Experiment: They built thousands of new designs using these unexplored, complex mixers.
The Surprise: Some of these new designs are remarkably efficient, outperforming a widely used lightweight model (MobileNetV2) with far fewer parameters.

The paper concludes that the future of neural networks might not be about making them deeper or wider, but about making them structurally more complex in ways we haven't tried yet. They have released their 3,000+ new designs for anyone to study and use.

Technical Summary: On the Architectural Complexity of Neural Networks

Problem Statement

Deep neural networks (DNNs) have achieved significant empirical success through the proliferation of diverse and complex architectures. However, existing unified theoretical frameworks (e.g., Geometric Deep Learning, Categorical Deep Learning) rely on high-level abstractions of tensor operations, often treating them as black-box parameterized functions or abstract linear transformations. This abstraction obscures the intricate hierarchical structure of tensor operations—specifically the lower-level information regarding how tensors are coupled, sliced, and transformed. Consequently, there is a gap in the theoretical understanding of how architectural complexity evolves over time and a lack of systematic methods to construct novel architectures based on new types of tensor operations. Furthermore, Neural Architecture Search (NAS) is currently limited to varying connections between fixed sets of existing operations, failing to explore the space of architectures built from fundamentally new tensor operations.

Methodology

The authors introduce a unified hierarchical combinatorial framework based on Hierarchical Combinatorial Complexes (HCCs). This framework explicitly models the structure of tensor operations rather than abstracting them away. The framework constructs a rank-5 HCC to represent neural networks, organized as follows:

Rank 0 — Elements: A base set of real-valued variables.
Rank 1 — Slices: Ordered sets derived from the elements.
Rank 2 — Modes: Partitions of slices, representing the dimensions of a tensor.
Rank 3 — Tensors: Generalized tensors defined as 3-cells. Unlike standard multidimensional arrays, these can represent "jagged" tensors (incomplete arrays) and "hyper-tensors" (mapping multi-indices to multiple elements) by utilizing partitions of ordered sets and strict weak orders.
Rank 4 — Operations: This level is divided into two types:
- Mode Maps: Functions between tensors that preserve slice space structures (e.g., flattening, unfolding, patch-ifying).
- Tensor Operations: Mechanisms for combining multiple tensors (e.g., matrix multiplication, Hadamard product, multi-head projection). These are defined via Tensor Operation Matrices (TOMs), which encode the incidence relationships between input tensors and the modes of the output tensor, including contractions (summations).
Rank 5 — Neural Networks: Composed of mode maps and tensor operations, represented by Tensor Equation Matrices (TEMs) that describe the relational structure between operations and tensors.
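To make the idea of an incidence relationship between tensors and modes more concrete, here is a minimal sketch. It is not the paper's TOM definition: the einsum-style string spec, the function name `incidence_table`, and the row/column layout are all assumptions chosen for illustration. Each row is an input tensor, each column is a mode, and modes missing from the output are treated as contracted (summed over).

```python
def incidence_table(spec):
    """Build a tensor-by-mode incidence table from an einsum-style spec.

    The spec format (e.g. "ij,jk->ik") is hypothetical shorthand used only
    for this illustration; it is not the paper's notation.
    """
    inputs_part, output = spec.split("->")
    inputs = inputs_part.split(",")

    # Collect every mode label in first-seen order.
    modes = []
    for operand in inputs:
        for mode in operand:
            if mode not in modes:
                modes.append(mode)

    # rows[t][m] is 1 when input tensor t carries mode m, else 0.
    rows = [[1 if mode in operand else 0 for mode in modes] for operand in inputs]

    # Modes that do not survive into the output are summed over.
    contracted = [mode for mode in modes if mode not in output]
    return modes, rows, contracted


# Matrix multiplication: C[i,k] = sum_j A[i,j] * B[j,k]
modes, rows, contracted = incidence_table("ij,jk->ik")
print(modes)       # ['i', 'j', 'k']
print(rows)        # [[1, 1, 0], [0, 1, 1]]
print(contracted)  # ['j']

# Hadamard product: both inputs carry both modes, nothing is contracted.
print(incidence_table("ij,ij->ij"))  # (['i', 'j'], [[1, 1], [1, 1]], [])
```

In this toy encoding, matrix multiplication and the Hadamard product have different tables even though both take two inputs, which is the kind of structural difference a black-box view of an operation would hide.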

The framework introduces specific metrics to quantify Architectural Complexity:

Operation Complexity ($C_{op}$): Number of operations.
Tensor Complexity ($C_T$): Number of tensors.
Arity Complexity ($C_\alpha$): Maximum number of operands in a single operation.
Order Complexity ($C_O$): Maximum number of modes in an operation.
Coupling-Arity Complexity ($C_A$): Maximum size of a coupling (shared modes between inputs).
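The five metrics can be read as simple counts over a list of operations. The sketch below is one possible reading, not the paper's formal definitions. It assumes a hypothetical encoding in which each tensor is a `(name, modes)` pair with single-letter mode labels, takes "modes in an operation" to mean the distinct modes across its inputs and output, and takes the size of a coupling to mean the number of input tensors that share one mode.

```python
def complexity_signature(operations):
    """Return a toy complexity signature for a list of operations.

    Each operation is (inputs, output); each tensor is (name, modes), where
    modes is a string of single-letter mode labels. This encoding is a
    hypothetical simplification, not the paper's HCC construction.
    """
    tensor_names = set()
    max_arity = max_order = max_coupling = 0

    for inputs, output in operations:
        tensor_names.add(output[0])
        tensor_names.update(name for name, _ in inputs)

        # Arity: how many operands this operation combines.
        max_arity = max(max_arity, len(inputs))

        # Order (assumed reading): distinct modes touched by the operation.
        modes = set(output[1]).union(*(set(m) for _, m in inputs))
        max_order = max(max_order, len(modes))

        # Coupling arity (assumed reading): inputs sharing a single mode.
        for mode in modes:
            sharing = sum(1 for _, m in inputs if mode in m)
            max_coupling = max(max_coupling, sharing)

    return {
        "C_op": len(operations),
        "C_T": len(tensor_names),
        "C_alpha": max_arity,
        "C_O": max_order,
        "C_A": max_coupling,
    }


toy_network = [
    # A dense layer: h[i] = sum_j W[i,j] * x[j]
    ([("W", "ij"), ("x", "j")], ("h", "i")),
    # A hypothetical three-operand contraction: Y[i,b] = sum_{a,j} A[i,a] * B[j,a] * C[j,b]
    ([("A", "ia"), ("B", "ja"), ("C", "jb")], ("Y", "ib")),
]

print(complexity_signature(toy_network))
# {'C_op': 2, 'C_T': 7, 'C_alpha': 3, 'C_O': 4, 'C_A': 2}
```

Adding the three-operand operation raises $C_\alpha$ from 2 to 3 while $C_A$ stays at 2, because no single mode is shared by more than two inputs.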

The authors leverage this framework to perform two main tasks: a retrospective analysis of 40 years of DNN evolution and a systematic generation of novel architectures.

Key Contributions

Hierarchical Combinatorial Framework: The paper constructs the first framework that explicitly models the structure of tensor operations, parameterizing a broad space of architectures and formalizing concepts like architecture diagrams as incidence relationships.
Retrospective Complexity Analysis: The authors apply the framework to analyze eight foundational architectures (FCNN, CNN, ResNet, Transformer, Poly-Net, MO-Net, ViM, TT-Net). They define a "complexity signature" for each and trace the evolution of these signatures over the last four decades.
Systematic Generation of Novel Architectures: Moving beyond the boundary of known architectures, the authors systematically generate a dataset of 3,028 novel higher-complexity architectures. These are constructed by sampling new Tensor Operation Matrices (TOMs) and Tensor Equation Matrices (TEMs) with higher arity ($C_\alpha$) and coupling arity ($C_A$) than previously explored.
Theoretical Decomposition: The paper provides theoretical proofs (e.g., Theorem A.35) demonstrating that under specific conditions (associativity and distributivity of base operations), higher-arity tensor operations can be decomposed into sequences of binary operations, and conversely, sequences of binary operations can be equivalent to higher-arity operations.
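The simplest instance of this equivalence can be checked by hand. The sketch below is an illustration under ordinary addition and multiplication, which are associative and distributive; it is not the paper's theorem or proof. The function names and the small integer matrices are made up for the example. It compares a single three-operand contraction with the same result computed as two binary matrix multiplications.

```python
def matmul(a, b):
    """Binary operation: out[i][k] = sum_j a[i][j] * b[j][k]."""
    return [
        [sum(a[i][j] * b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
        for i in range(len(a))
    ]


def ternary_contract(a, b, c):
    """Single three-operand operation:
    out[i][l] = sum_{j,k} a[i][j] * b[j][k] * c[k][l]."""
    return [
        [
            sum(
                a[i][j] * b[j][k] * c[k][l]
                for j in range(len(b))
                for k in range(len(c))
            )
            for l in range(len(c[0]))
        ]
        for i in range(len(a))
    ]


# Small integer matrices so the comparison is exact.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

one_step = ternary_contract(A, B, C)
two_steps = matmul(matmul(A, B), C)

print(one_step)               # [[5, 3], [11, 9]]
print(one_step == two_steps)  # True
```

The two computations agree here because the sum over `k` can be pulled outside the sum over `j`. Operations whose base operations lack these properties would not be covered by this sketch.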

Results

Evolution of Architectural Complexity

The analysis of historical architectures reveals a clear trend: groundbreaking architectural shifts correspond to increases in specific types of complexity.

FCNNs represent the baseline with low complexity.
CNNs introduced higher order complexity ($C_O$) via convolution.
ResNets increased tensor and operation complexity ($C_T$, $C_{op}$) via skip connections.
Transformers marked the first significant increase in Arity Complexity ($C_\alpha$), utilizing ternary operations for self-attention.
Post-Transformer architectures (Poly-Net, MO-Net, ViM, TT-Net) further increased complexity, with some exploring higher coupling arity ($C_A > 2$) and higher arity ($C_\alpha > 3$).
The study notes that many high-complexity architectures were discovered accidentally or described using lower-complexity encodings; the framework reveals their true, higher-complexity signatures.

Novel Architecture Performance

The dataset of 3,028 sampled architectures was evaluated on image classification tasks (CIFAR-10, CIFAR-100, Tiny ImageNet).

Parameter Efficiency: Many sampled architectures demonstrated remarkable parameter and depth efficiency.
Specific Achievement: The "red star" architecture (sample $\star$), with only 5 layers and approximately 198,000 parameters (152,000 from the base stage, 46,342 from the novel block), achieved 65.52% accuracy on CIFAR-100.
Comparison: This performance surpassed MobileNetV2 (64.29% accuracy), a widely used lightweight architecture with 2.5 million parameters, using less than 10% of the parameters.
Efficiency: The results suggest that higher-complexity tensor operations can yield models that are significantly more parameter-efficient than widely used lightweight models such as MobileNetV2.
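The "less than 10%" figure follows directly from the numbers quoted above. The short calculation below uses only those figures; the variable names are illustrative, and the base-stage count of 152,000 is treated as exact even though the summary gives the total only approximately.

```python
# Figures quoted in the summary above (CIFAR-100).
base_stage_params = 152_000        # given as a round figure; treated as exact here
novel_block_params = 46_342
mobilenetv2_params = 2_500_000     # "2.5 million parameters"

red_star_params = base_stage_params + novel_block_params
param_ratio = red_star_params / mobilenetv2_params
accuracy_gap = 65.52 - 64.29       # percentage points

print(red_star_params)                       # 198342
print(f"{param_ratio:.1%}")                  # 7.9%
print(f"{accuracy_gap:.2f} points higher")   # 1.23 points higher
```

On these figures the sampled model uses roughly 8% of MobileNetV2's parameters while scoring about 1.2 percentage points higher.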

Significance and Claims

The paper claims to provide the first unified language for rigorously analyzing and constructing neural networks based on the explicit structure of tensor operations. Its significance lies in:

Uncovering Hidden Complexity: It reveals that the evolution of deep learning is driven by increases in specific complexity metrics (particularly arity and coupling arity) that were previously obscured by high-level abstractions.
Defining Boundaries: It identifies the boundary of known architectural complexity classes, highlighting that large classes of higher-complexity architectures (e.g., $C_A > 2$) have remained largely unexplored.
Systematic Construction: It moves beyond trial-and-error or connection-based search (NAS) to a systematic construction of architectures from novel tensor operations.
Resource Efficiency: The empirical results demonstrate that exploring these higher-complexity spaces can lead to architectures that are not only novel but also significantly more parameter-efficient than existing models, challenging the assumption that performance requires massive parameter counts.

The authors conclude that their framework enables the exploration of new spaces of architectures built from higher complexity tensor operations, offering a path toward next-generation, highly efficient neural network designs. The dataset and code are publicly released to facilitate further research in this domain.

On the Architectural Complexity of Neural Networks