Imagine you are building a house out of LEGO bricks. In a standard AI neural network, every brick is a distinct, individual "neuron." If you want to make the house bigger or smaller, you have to carefully move specific bricks around. If you pull one out, the whole structure might collapse because that brick was holding up a specific corner. This is why changing the size of an AI model usually breaks it or requires a massive amount of retraining.
This paper, "On De-Individuated Neurons," proposes a radical new way to build these digital houses. Instead of using distinct, individual bricks, the author suggests building with liquid clay or malleable metal.
Here is the breakdown of the paper's ideas using simple analogies:
1. The Problem: The "Individual Brick" Trap
In current AI, we treat every neuron like a unique person with a specific job.
- The Issue: If you want to fire a neuron (remove it) or hire a new one (add it), it's like firing a specific employee and hiring a replacement. The new person doesn't know the job, and the team has to relearn how to work together.
- The Result: You can't easily change the size of the network without losing what it has learned.
2. The Solution: "Isotropic" Neurons (The Liquid Clay)
The author introduces a new type of mathematical building block called an "Isotropic Activation Function."
- The Analogy: Imagine instead of individual bricks, you have a block of playdough.
- How it works: In this playdough model, there are no "individual" neurons. The whole layer is just one big, continuous shape. Because the shape is perfectly symmetrical (isotropic), it doesn't matter which part of the playdough you call "Neuron A" or "Neuron B." They are all the same.
- The Magic: Because there are no distinct individuals, you can stretch the playdough to make the layer wider (add neurons) or squish it to make it narrower (remove neurons) without changing the shape of the house. The function remains exactly the same.
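The summary doesn't give the paper's exact activation function, but one simple example of an "isotropic" nonlinearity is a norm-based one: it looks only at the overall length of the layer's activity, never at which coordinate is which. The toy sketch below (the function name and the choice of `tanh` are illustrative, not the paper's) shows the playdough property in action: stretching the layer wider with empty units leaves the function exactly the same.

```python
import numpy as np

def isotropic_act(x):
    # Norm-based nonlinearity: depends only on ||x||, not on which
    # coordinate is which, so it treats every direction identically
    # (symmetric under rotations and permutations).
    n = np.linalg.norm(x)
    return np.tanh(n) / n * x if n > 0 else x

x = np.array([1.0, -2.0, 0.5])
y = isotropic_act(x)

# Widen the layer by appending "empty" units (zeros): the norm is
# unchanged, so the original coordinates come out exactly the same.
x_wide = np.concatenate([x, np.zeros(2)])
y_wide = isotropic_act(x_wide)
print(np.allclose(y_wide[:3], y))
```

Because no individual coordinate is special, "Neuron A" and "Neuron B" really are interchangeable, and adding or removing them is just reshaping the same lump of clay.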
3. The Trick: The "Diagonal" View
How do you actually cut or add to this playdough without messing it up? The paper uses a mathematical trick called Diagonalization.
- The Analogy: Imagine looking at a tangled ball of yarn. It looks messy and impossible to untangle. But if you shine a light from a specific angle (a "basis change"), the shadows of the yarn line up perfectly in straight, parallel rows.
- The Process: The author shows how to rotate the network's view so that every connection lines up perfectly one-to-one.
- Neurodegeneration (Pruning): Once the connections are lined up, you can see which threads are very thin (weak). You can simply snip those thin threads. Because the system is symmetrical, snipping a weak thread doesn't break the whole tapestry; it just removes a tiny bit of weight.
- Neurogenesis (Growth): Conversely, you can add new, empty threads (scaffold neurons) that are currently doing nothing. Because the system is flexible, these new threads can be "trained" to start working without disrupting the existing pattern.
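The paper's own construction isn't reproduced in this summary, but the standard linear-algebra tool matching this description is the singular value decomposition (SVD): rotate in, scale one-to-one along a diagonal, rotate out. The hedged sketch below uses SVD to show the two moves in miniature: snipping the thinnest "thread" costs exactly that thread's weight, nothing more.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # a dense, "tangled" weight matrix

# The basis change: SVD rewrites W as rotate-in, scale, rotate-out.
# The middle factor S is purely diagonal -- every input direction
# connects to exactly one output direction.
U, S, Vt = np.linalg.svd(W)

# Neurodegeneration: snip the thinnest thread (smallest singular value).
S_pruned = S.copy()
S_pruned[-1] = 0.0
W_pruned = U @ np.diag(S_pruned) @ Vt

# The damage is exactly the size of the snipped thread, nothing more.
err = np.linalg.norm(W - W_pruned, 2)

# Neurogenesis would be the reverse: append a zero singular value as an
# idle "scaffold" direction -- the function is unchanged until training
# starts filling it in.
```

Here `err` equals the pruned singular value exactly: in the diagonal view, removing a weak connection is a local, bounded change rather than a structural collapse.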
4. The "Intrinsic Length" (The Safety Net)
When you cut a thread, sometimes a tiny bit of "fray" (bias) is left behind that could ruin the pattern.
- The Analogy: Imagine a balloon. If you cut a piece off, the air rushes out and the whole thing deflates. The network needs something to absorb that loss, so the author introduces a new parameter called "Intrinsic Length."
- Function: Think of this as a hidden, invisible spring inside the playdough. When you cut a neuron, this spring absorbs the leftover "fray" or bias, ensuring the house stays perfectly stable even as you shrink it.
5. The Biological Connection: Growing and Shrinking
Nature does this all the time. A baby's brain has way too many neurons. As they learn, the brain prunes the useless connections and strengthens the useful ones.
- The Paper's Discovery: The author tested this on a computer vision task (recognizing cats and dogs). They started with a network that had too many neurons, let it grow, and then let it shrink.
- The Result: The network that started big and shrank down performed better than a network that stayed the same size. It mimicked the biological advantage of "over-abundance followed by pruning."
6. The "50% Sparsity" Bonus
Because the network can be rearranged into these perfect, straight lines (diagonalized), the author discovered something amazing:
- The Analogy: You can rearrange a messy room so that half the furniture is stacked perfectly against the wall, leaving the other half of the room empty.
- The Result: You can theoretically remove 50% of the connections in a dense network and still have it work exactly the same way. It's like having a super-efficient version of the AI that uses half the memory but does the exact same job.
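How the paper reaches the 50% figure isn't spelled out here, but one way a rotation can zero out roughly half of a dense weight matrix, for free, is the QR decomposition: an n-by-n triangular matrix has n(n-1)/2 exact zeros, just under half its entries. The toy sketch below (all names illustrative, using the same norm-based activation as before) shows that this rearrangement leaves a two-layer network's output bit-for-bit unchanged, because a rotation commutes with an isotropic nonlinearity and can be folded into the next layer.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(6, 6))
W2 = rng.normal(size=(6, 6))

def radial(z):
    # Isotropic (norm-based) activation; assumes z is nonzero.
    n = np.linalg.norm(z)
    return np.tanh(n) / n * z

def net(x, A, B):
    return B @ radial(A @ x)

# QR: W1 = Q @ R with R upper-triangular, so R has n(n-1)/2 exact
# zeros -- just under half the connections are gone.
Q, R = np.linalg.qr(W1)

# Rotating the pre-activations commutes with a radial activation,
# so folding Q into the next layer preserves the function exactly.
x = rng.normal(size=6)
same = np.allclose(net(x, W1, W2), net(x, R, W2 @ Q))
print(same)
```

The dense layer and the half-empty triangular one compute the identical function; the sparsity was hiding in the choice of basis all along.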
Summary
This paper suggests we stop thinking of AI neurons as distinct, fragile individuals. Instead, we should view them as a fluid, symmetrical system. By doing this, we gain the superpower to:
- Grow and shrink the AI in real-time as it learns.
- Prune the weak parts without breaking the brain.
- Save space by cutting the network in half without losing performance.
It's a shift from building with rigid LEGOs to sculpting with intelligent, self-healing clay.