This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.
Imagine you are trying to teach a computer to understand a long story, like a movie script or a medical report.
For a long time, computers had two main ways to do this, and both had a major flaw:
- The "Super Fast" Way (Transformers/SSMs): These models read the whole story at once, like a super-fast scanner, and they run efficiently on many processors simultaneously. But the information inside them flows like a note passed down a chain: Person A hands it to Person B, who hands it to Person C. Nobody can talk sideways to a neighbor or back to the person who just spoke. This strictly "one-way" flow limits how complex their understanding can be.
- The "Biologically Real" Way (Spiking Neural Networks): These models mimic the human brain. Neurons fire, send signals, and talk to their neighbors sideways and backwards. They are great at understanding complex patterns. But, they are slow. They have to wait for one neuron to finish before the next one can start, like a single person reading a book page by page. This makes them too slow for modern, massive datasets.
The New Solution: The "Parallelized Hierarchical Connectome" (PHC)
The authors of this paper built a new framework called PHC (and a specific version called PHCSSM) that combines the best of both worlds. Think of it as building a high-speed subway system for a giant, complex city.
Here is how it works, using simple analogies:
1. The "City Map" vs. The "Train Line"
- Old Models: Imagine a train line where the train stops at Station A, then Station B, then Station C. The train can't go from A to C directly, and it can't have a conversation with Station B while moving. It's a straight line.
- The PHC Model: Imagine a city with a central train line (the Neuron Layer) and a complex web of roads connecting all the neighborhoods (the Synapse Layer).
- The Train Line handles the time aspect (reading the story from start to finish). It moves super fast because it's parallel (many trains running at once).
- The Roads handle the space aspect (neighbors talking to neighbors). This is where the "lateral connections" happen.
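The two-layer split above can be sketched in a few lines of NumPy. This is a minimal toy, not the paper's actual code: `phc_step`, the decay vector `a`, and the lateral matrix `W_lat` are all hypothetical names, assuming the "train line" is a simple per-neuron decaying recurrence and the "roads" are one round of lateral mixing.

```python
import numpy as np

def phc_step(h, x, a, W_lat):
    """One toy PHC-style time step: a fast per-neuron temporal update
    (the 'train line') followed by lateral mixing (the 'roads').
    All names and shapes are illustrative, not the paper's API."""
    h = a * h + x            # temporal update; this form admits a parallel scan
    h = np.tanh(W_lat @ h)   # lateral 'roads': every neuron hears its neighbors
    return h

rng = np.random.default_rng(0)
n = 8
h = np.zeros(n)                          # hidden state: one value per 'station'
a = np.full(n, 0.9)                      # how much each neuron remembers
W_lat = 0.1 * rng.standard_normal((n, n))  # the web of roads between neighborhoods
for t in range(5):                       # read five 'words' of the story
    h = phc_step(h, rng.standard_normal(n), a, W_lat)
```

The key design point the analogy captures: the first line of `phc_step` depends only on each neuron's own past (so it parallelizes across time), while the second line moves information sideways within a single moment.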
2. The "Multi-Transmission Loop" (The Magic Trick)
This is the paper's biggest innovation. Usually, if you want a train to stop and talk to the city, the whole system has to pause.
- The PHC Trick: The system runs a "loop" inside a single moment of time.
- Step 1: The train drops off a passenger (a signal).
- Step 2: The city roads instantly carry that passenger to different neighborhoods, who chat and pass the message around.
- Step 3: The message comes back to the train.
- Step 4: The train checks: "Did everyone agree on the message?" If yes, the train moves to the next station. If not, the city loops around one more time to refine the message.
This allows the computer to do deep, complex "thinking" (like a human brain) without slowing down the train. It does all the "chatting" in parallel, not one by one.
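The loop described in Steps 1–4 behaves like a fixed-point iteration: lateral message passing repeats inside one time step until the activity stops changing. Here is a minimal sketch under that assumption; `transmission_loop`, the tolerance, and the iteration cap are all illustrative choices, not the paper's implementation.

```python
import numpy as np

def transmission_loop(h, W_lat, tol=1e-4, max_iters=50):
    """Toy inner loop: lateral 'chatting' repeats within a single time step
    until the message settles. Each pass is one parallel matrix product,
    not a neuron-by-neuron crawl."""
    for _ in range(max_iters):
        h_new = np.tanh(W_lat @ h)            # one round of neighborhood chat
        if np.max(np.abs(h_new - h)) < tol:   # 'did everyone agree?'
            return h_new                      # yes: the train moves on
        h = h_new                             # no: loop around once more
    return h

rng = np.random.default_rng(1)
n = 6
W = 0.1 * rng.standard_normal((n, n))   # small weights keep the loop stable
h = transmission_loop(rng.standard_normal(n), W)
```

Keeping the lateral weights small makes each pass a contraction, so the "conversation" provably settles; the convergence check is what lets the model think deeply at one station without stalling the whole line.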
3. The "Biological Rules" (The Strict City Planner)
The authors didn't just build a fast system; they forced it to follow strict rules that real brains follow. This might sound like it would make things slower, but the paper proves it actually makes the system smarter and more stable.
- Dale's Law (The "Good Guy/Bad Guy" Rule): In the brain, a neuron is either an "Exciter" (it nudges its neighbors toward firing) or an "Inhibitor" (it holds them back). It can't be both. The PHC model enforces this. It prevents the system from getting confused or chaotic, acting like a built-in safety guard.
- Short-Term Plasticity (The "Memory Fatigue" Rule): If you shout at someone repeatedly, they eventually stop listening (or get more excited). Real synapses get tired or get stronger based on recent activity. The model includes this "fatigue" mechanism, allowing it to adapt to how fast information is coming in, rather than just treating every signal the same.
- Reward-Modulated Learning (The "Good Job" Rule): Instead of just calculating math errors, the model gets a "reward signal" (like a teacher saying "Good job!") when it gets a classification right. It uses this to tweak its connections instantly, learning from its mistakes in a very human-like way.
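Two of the rules above are simple enough to sketch directly. In this toy version (variable names and constants are illustrative, not the paper's code), Dale's law is enforced by tagging each neuron with a fixed sign and clamping its outgoing weights, and short-term plasticity is a "resource" that each spike depletes and that slowly recovers.

```python
import numpy as np

# Dale's law: tag each neuron excitatory (+1) or inhibitory (-1), and build
# its outgoing weights so they can never flip sign.
rng = np.random.default_rng(2)
n = 6
sign = np.array([1, 1, 1, 1, -1, -1])            # 4 exciters, 2 inhibitors
W = np.abs(rng.standard_normal((n, n))) * sign[None, :]
# Column j holds neuron j's outgoing weights: all positive for an exciter,
# all negative for an inhibitor, by construction.

# Short-term plasticity: a synaptic 'resource' r in (0, 1] that each spike
# depletes (the 'fatigue' rule) and that recovers toward 1 between spikes.
def stp_update(r, spiked, recover=0.1, use=0.3):
    r = r + (1.0 - r) * recover   # gradual recovery toward full strength
    if spiked:
        r = r * (1.0 - use)       # each use tires the synapse a little
    return r

r = 1.0
for _ in range(5):                # a rapid burst of spikes...
    r = stp_update(r, True)
# ...leaves the synapse 'tired': r ends well below 1, so the model reacts
# less to a message it has just heard five times in a row.
```

This is why the fatigue rule helps: a synapse's effective strength becomes `W * r`, so the same signal means less when it arrives in a rapid burst than when it arrives fresh.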
Why Does This Matter?
- It's Cheaper: Because the model reuses the same "roads" and "stations" over and over (instead of building a new layer for every step), it uses 10 to 100 times fewer parameters (memory) than current top models. It's like building a small, efficient city instead of a massive, sprawling metropolis.
- It's Fast: It keeps the speed of the "Super Fast" models. You can train it on long sequences (like long videos or years of medical data) without it taking forever.
- It's Smarter: By following the "rules of the brain," it handles complex, messy data better. In tests, it beat the current state-of-the-art models on several medical and physiological benchmarks, even though it was much smaller.
The Bottom Line
The authors took the "speed" of modern AI and the "intelligence" of the human brain and merged them. They created a system that can think deeply and sideways (like a brain) but run at the speed of light (like a supercomputer), all while using a fraction of the energy and memory. It's a new way to build AI that is not just fast, but also biologically grounded and efficient.