Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator

This paper demonstrates that a minimal, hippocampus-inspired sequence generator paired with an actor-critic agent enables robust spatial navigation and the emergence of place-like representations, particularly outperforming standard LSTMs when processing sparse egocentric visual inputs.

Xiao-Xiong Lin, Yuk-Hoi Yiu, Christian Leibold

Published 2026-03-03

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: A "Mental Time Machine" for Robots

Imagine you are trying to navigate a huge, foggy maze. You can only see a few feet in front of you, and the walls all look exactly the same. If you just look at what's right in front of your nose, you'll get lost immediately. You need a way to remember where you've been and predict where you're going.

This paper introduces a new way to build a robot (or AI agent) that can solve this problem. The researchers took inspiration from the hippocampus, the part of the human brain responsible for memory and navigation. They built a robot brain that mimics how our brain creates "mental movies" of the future, even when the sensory input is very blurry or sparse.

The Problem: The "Blind" Robot

Most AI navigation systems work like a person with perfect vision. They see the whole map clearly and calculate the best path. But in the real world (and in this experiment), the robot's "vision" is terrible.

  • The Input: The robot sees a low-resolution, black-and-white image.
  • The Sparsity: To make it even harder, the researchers made the robot's brain "sparse." Imagine the robot only gets a tiny spark of information once in a while, like a lighthouse beam flashing in a thick fog. Most of the time, it sees nothing.
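To make the "lighthouse beam" idea concrete, here is a minimal sketch of what a sparse observation could look like: keep only the few strongest pixel responses of a low-resolution grayscale frame and zero out the rest. The function name and the top-k scheme are illustrative assumptions, not the paper's exact input pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_observation(frame: np.ndarray, k: int = 4) -> np.ndarray:
    """Keep only the k strongest pixel responses of a low-resolution
    grayscale frame and zero out everything else (an illustrative
    stand-in for sparse egocentric input)."""
    flat = frame.ravel()
    keep = np.argsort(flat)[-k:]   # indices of the k largest values
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(frame.shape)

frame = rng.random((8, 8))         # toy 8x8 grayscale view
obs = sparse_observation(frame, k=4)
print(np.count_nonzero(obs))       # only 4 of 64 pixels survive
```

Most of the time, most of the "retina" is dark; the agent must make do with a handful of bright clues per frame.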

If you give a standard AI (like an LSTM, a common type of recurrent memory network) this sparse, foggy data, it gets confused and fails. It's like trying to drive a car using only a single, flickering candle for light.

The Solution: The "Hippocampus" Module

The researchers built a special module for their robot called the CA3 Sequence Generator. Think of this as a mental time machine or a conveyor belt of memories.

Here is how it works, using an analogy:

  1. The Dentate Gyrus (The Gatekeeper): Imagine a bouncer at a club. The robot sees a lot of visual data, but the bouncer (the DG) only lets a few "VIPs" (very specific, important landmarks) inside. This creates a sparse, high-quality list of clues.
  2. The CA3 (The Conveyor Belt): Once a clue gets in, it doesn't just sit there. It jumps onto a conveyor belt that keeps moving.
    • Even if the robot stops seeing new clues for a while, the old clues keep traveling down the belt.
    • This creates a "trail" of past locations.
    • Crucially, the belt also has a preview function. As a clue moves down the belt, it triggers a "ghost" of the next clue. This is the robot's way of "planning ahead" or "replaying" the path it just took, even before it physically gets there.
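The two steps above can be sketched in a few lines of Python. This is a loose caricature, not the paper's actual network: the "bouncer" is modeled as a k-winners-take-all gate, and the "conveyor belt with preview" as a shift register whose front slot carries a faded echo of the most recent clue (standing in for the learned predictive drive). All names here are mine.

```python
import numpy as np

def dg_gate(features: np.ndarray, k: int = 2) -> np.ndarray:
    """Dentate-gyrus-style 'bouncer': only the k strongest features
    get past the door (k-winners-take-all sparsification)."""
    out = np.zeros_like(features)
    winners = np.argsort(features)[-k:]
    out[winners] = features[winners]
    return out

class ConveyorBelt:
    """CA3-style sequence buffer: each step, stored clues shift one
    slot along the belt (the oldest falls off the end), and the front
    slot adds a faded 'ghost' of the most recent clue."""
    def __init__(self, n_slots: int, dim: int, ghost: float = 0.5):
        self.belt = np.zeros((n_slots, dim))
        self.ghost = ghost

    def step(self, clue: np.ndarray) -> np.ndarray:
        self.belt = np.roll(self.belt, 1, axis=0)        # shift old clues along
        self.belt[0] = clue + self.ghost * self.belt[1]  # new clue + ghost of the last
        return self.belt.ravel()                         # state fed to the actor-critic

belt = ConveyorBelt(n_slots=4, dim=2)
belt.step(np.array([1.0, 0.0]))   # a clue enters the belt
belt.step(np.zeros(2))            # no new clue, yet the old one persists
print(belt.belt[:, 0])            # fading echo up front, clue shifted down the belt
```

The key property is that feeding zeros for several steps does not erase the belt: the old clues keep traveling along it, giving the actor-critic a trail of recent locations to work with.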

The Magic: Why It Works Better

The paper found something surprising: this "mental time machine" only shows its advantage when the input is sparse.

  • In the Fog (Sparse Input): When the robot has very little information, the conveyor belt is a lifesaver. It holds onto the few clues it has and stretches them out over time, allowing the robot to build a map of the maze. It outperforms standard AI models by a huge margin.
  • In the Sunlight (Dense Input): When the robot has perfect, clear vision (lots of data), the conveyor belt actually gets in the way. Standard AI models (LSTMs) are better at processing a flood of clear data. The "mental time machine" is too slow and rigid for a flood of information.

The Analogy:

  • Standard AI (LSTM): Like a person with a high-definition map. They can handle a lot of detail but get lost if the map is torn or foggy.
  • This New Model (CA3): Like a person with a compass and a few key landmarks. They can't see the whole map, but they can remember, "I passed the big oak tree 10 seconds ago, so the exit must be to the left." They thrive in the fog.

What Did the Robot Learn?

As the robot learned to navigate, its internal "brain cells" started behaving much like real animal brain cells:

  1. Place Fields: Specific neurons started firing only when the robot was in a specific spot (like a "Home" button).
  2. Remapping: When the researchers moved the "goal" (the reward) to a new spot, the robot's internal map rearranged itself to find the new path, mirroring the "remapping" seen in real hippocampal place cells.
  3. Orthogonalization: The robot learned to make its memories distinct. It stopped confusing one similar-looking wall with another, creating a clean, organized map in its mind.
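A "place field" is something you can measure: bin the arena into a grid and average one unit's activity in each bin. A place-like unit lights up in a single hot bin; a non-spatial unit looks flat. Here is a small illustrative analysis in that spirit (the function and the toy unit are my assumptions, not the paper's exact pipeline):

```python
import numpy as np

def place_field_map(positions, activity, grid=5):
    """Average one unit's activity in each spatial bin of the arena.
    positions: (T, 2) agent coordinates in [0, 1); activity: (T,) the
    unit's firing over time. (Illustrative analysis only.)"""
    positions = np.asarray(positions)
    activity = np.asarray(activity)
    total = np.zeros((grid, grid))
    visits = np.zeros((grid, grid))
    for (x, y), a in zip(positions, activity):
        i, j = int(x * grid), int(y * grid)
        total[i, j] += a
        visits[i, j] += 1
    return total / np.maximum(visits, 1)  # mean activity per visited bin

# Toy unit that fires only near the center of the arena, (0.5, 0.5):
rng = np.random.default_rng(1)
pos = rng.random((2000, 2))
act = np.exp(-np.sum((pos - 0.5) ** 2, axis=1) / 0.02)
field = place_field_map(pos, act)
print(np.unravel_index(field.argmax(), field.shape))  # hottest bin is the center
```

Running the same analysis on every unit in the agent's network, and repeating it after the goal moves, is how place fields and remapping show up in practice.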

The Takeaway

This paper shows that simplicity and constraints can be powerful. By forcing the AI to work with very little information (sparse input) and giving it a specific structure to "remember" that information over time (the sequence generator), the AI naturally developed a sophisticated spatial map.

It suggests that the reason our own brains use complex, rhythmic firing patterns (theta sequences) isn't just to store memories, but to predict the future when our senses are unreliable. It's a biological hack that turns a "blind" moment into a clear path forward.

In short: If you want an AI to navigate a confusing, foggy world, don't give it a supercomputer with perfect vision. Give it a simple "mental conveyor belt" that helps it remember where it's been and guess where it's going next.
