Emergence of Internal State-Modulated Swarming in Multi-Agent Patch Foraging System

The Big Idea: How Hungry Robots Learn to Huddle

Imagine a vast, empty field scattered with patches of delicious, glowing berries. Now, imagine hundreds of little robots (let's call them "Foragers") are dropped into this field. Their only goal is to eat as many berries as possible to survive. They can't talk to each other, they can't see the whole map, and they can't see anything hidden behind another object. They only have a few "flashlights" (rays) they can shine around them to see what's nearby.

The researchers wanted to answer a simple question: If these robots are just trying to eat, will they naturally start sticking together in a group (swarming), and does their hunger level change how tightly they huddle?

The Setup: The "Blind" Foragers

Think of these foragers like moths in a dark room.

The Sensors: Each robot has a few flashlights. If a flashlight hits a berry patch, it sees the berries. If it hits another robot, it sees a robot. But if a robot is standing behind another robot, or if the flashlight beam misses the gap between two objects, that robot is "invisible." This is called partial observability.
The Brain: Each robot has a tiny, simple brain (a neural network) that decides how fast to move and which way to turn based on what its flashlights see.
The Hunger Meter: Inside each robot is a "battery" representing its energy or food stores. If the battery is low, the robot is hungry. If it's high, the robot is full.

The Experiment: Evolution in a Video Game

The researchers didn't program the robots to swarm. Instead, they used an evolutionary method that works like selective breeding: the best performers pass on their traits (like breeding the fastest horses).

They created 300 robots with slightly different "brains."
They let them loose in the berry field.
The robots that ate the most berries got to "reproduce" (their brain settings were copied and slightly tweaked for the next generation).
The robots that starved were discarded.
They repeated this for thousands of generations.

The Results: The Magic of Huddling

1. The "Sherlock Holmes" Effect

At first, the robots just wandered aimlessly. But after thousands of generations, they became experts. They learned to find the berry patches and stay there.
Here is the cool part: They started swarming.
Even though they weren't programmed to hold hands, they started clustering together. Why?

The Logic: If I see another robot standing still near a spot, it's a huge clue that there are berries there! Even if I can't see the berries myself (maybe they are hidden behind a hill), seeing a friend suggests, "Hey, food is over there."
The Metaphor: It's like walking into a dark room and seeing a group of people staring intently at a corner. You don't need to see the object they are looking at; you just know, "There must be something interesting over there," so you walk toward them.

2. The Hunger Factor (Risk-Sensitive Foraging)

The researchers then tested what happens when they remove the berries entirely. The robots still swarmed! But the tightness of the swarm changed based on how hungry they were.

The Full Robot: If a robot is full (high battery), it acts like a rich, cautious person. It doesn't need to crowd. It can afford to wander alone because it has a safety net. It keeps its distance from others.
The Hungry Robot: If a robot is starving (low battery), it acts like a desperate person. It needs to find food now. It clings tightly to the group. It thinks, "If everyone else is here, there must be food, and I can't afford to be left behind."
The Finding: The hungrier the robot, the tighter the swarm. The fuller the robot, the more space it keeps.

3. The "Urgency" Switch

To prove this wasn't just a coincidence, the researchers did a "brain surgery" on the simulation. They looked inside the robots' brains and found specific "neurons" (tiny switches) that tracked how hungry the robot was.

The Experiment: They took a robot that was actually full, but they forced its brain to think it was starving.
The Result: The robot immediately started rushing toward the other robots, acting desperate.
The Metaphor: It's like a person whose fully charged phone suddenly shows a fake "low battery" warning. They start running around looking for an outlet, even though the battery is actually fine. This showed that the "hunger signal" in the brain directly controls the urge to swarm.

Why Does This Matter?

This paper is a bridge between biology and robotics.

In Nature: It explains why animals (like fish or birds) might huddle together not just to protect themselves from predators, but because being near others is a smart way to find food when you can't see it all.
In Robotics: It shows that you don't need to program complex rules for a swarm of drones to work together. If you give them simple rules to "eat" and "survive," they will naturally figure out how to cooperate and huddle based on their own internal needs.

The Takeaway

The paper shows that swarming isn't just about following orders; it's about sharing information. When you are hungry, you stick close to the group because their presence is a signal that food is near. When you are full, you can afford to be independent. The researchers showed that this complex social behavior can emerge from simple, local rules without any central boss telling everyone what to do.

1. Problem Statement

The paper addresses two fundamental questions in multi-agent systems and active matter physics:

Emergence of Swarming without Explicit Coupling: Can coordinated swarming behavior emerge in a competitive foraging environment where agents only possess local passive sensing (no explicit communication or pre-programmed interaction rules like alignment or repulsion)?
Internal State Modulation: If swarming emerges, is its strength modulated by the agent's internal state (specifically, stored resource levels)? This relates to the biological concept of risk-sensitive foraging, where agents with low resources (high risk of starvation) might tolerate crowding to reduce search time, while well-fed agents might avoid aggregation.

The authors aim to demonstrate that swarming can arise purely from the optimization of a foraging objective under partial observability, and that the resulting behavior is dynamically regulated by the agent's internal energy reserves.

2. Methodology

A. Simulation Environment

Domain: A continuous 2D Cartesian plane containing $N$ foragers and $M$ stationary resource patches.
Foragers: Modeled as active particles (circular disks) that consume energy to move. They have no hard-body collisions; they can pass through each other.
Resource Patches: Circular disks with resources that regenerate at a constant rate. Resources are shared among overlapping foragers (scramble competition).
Sensing (Partial Observability):
- Foragers emit $R$ equiangular rays.
- Occlusion: Rays are blocked by other foragers or patches.
- Ray Data: Each ray detects the distance to the first object, the resource content of that object (if it's a patch or another forager), and whether the object is a forager or a patch.
- Internal State Input: The controller also receives the forager's velocity, angular velocity, current resource level, change in resource, and the count of overlapping objects.
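The ray-based sensing described above can be illustrated with a minimal sketch. It assumes every object (forager or patch) is a circle given as a hypothetical `(x, y, radius)` tuple and that each ray reports only the nearest intersection, which is what produces occlusion. The function names, the `max_range` cutoff, and the default ray count are illustrative choices, not values taken from the paper.

```python
import math

def cast_ray(origin, angle, circles, max_range):
    """Return (distance, index) of the first circle hit, or (max_range, None) on a miss."""
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    best_dist, best_idx = max_range, None
    for idx, (cx, cy, radius) in enumerate(circles):
        # Solve |origin + t * direction - center|^2 = radius^2 for t.
        fx, fy = ox - cx, oy - cy
        b = fx * dx + fy * dy
        c = fx * fx + fy * fy - radius * radius
        disc = b * b - c
        if disc < 0.0:
            continue  # the ray's line never touches this circle
        t = -b - math.sqrt(disc)  # nearer of the two intersection points
        # t < 0 means the circle is behind the ray or the origin is inside it.
        if 0.0 <= t < best_dist:
            best_dist, best_idx = t, idx  # closer hit occludes anything farther
    return best_dist, best_idx

def sense(origin, heading, circles, num_rays=8, max_range=10.0):
    """Cast num_rays equiangular rays, starting at the agent's heading."""
    angles = [heading + 2.0 * math.pi * k / num_rays for k in range(num_rays)]
    return [cast_ray(origin, a, circles, max_range) for a in angles]
```

In this sketch, a second circle directly behind the first along a ray is never reported, which mirrors the partial observability the summary describes. A full observation would additionally attach the hit object's type and resource content to each reading.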

B. Controller Architecture

Model: A Continuous-Time Recurrent Neural Network (CTRNN).
Function: Maps observations (ray data + internal state) to translational speed ( $s$ ) and rotational speed ( $u$ ).
Dynamics: The hidden state evolves via a differential equation resembling neuronal membrane potentials, allowing for smooth dynamics and evidence accumulation mechanisms.
Output: The network outputs control signals that drive the forager's kinematics.
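As a rough illustration of the controller dynamics, the sketch below integrates a CTRNN with forward Euler. The summary does not give the exact equation, so this assumes the common textbook form tau_i * dh_i/dt = -h_i + sum_j W_rec[i][j] * tanh(h_j) + sum_k W_in[i][k] * x_k + bias_i. All parameter names and the `tanh` readout are assumptions made for the example.

```python
import math

def ctrnn_step(h, x, W_rec, W_in, bias, tau, dt):
    """One forward-Euler step of the assumed CTRNN equation."""
    act = [math.tanh(v) for v in h]  # firing rates of the hidden units
    new_h = []
    for i in range(len(h)):
        drive = bias[i]
        drive += sum(W_rec[i][j] * act[j] for j in range(len(h)))
        drive += sum(W_in[i][k] * x[k] for k in range(len(x)))
        # Leaky integration: the state relaxes toward `drive` with time constant tau[i].
        new_h.append(h[i] + dt / tau[i] * (-h[i] + drive))
    return new_h

def readout(h, W_out):
    """Map hidden states to bounded control outputs, e.g. [speed, turn rate]."""
    rates = [math.tanh(v) for v in h]
    return [math.tanh(sum(row[i] * rates[i] for i in range(len(h)))) for row in W_out]
```

Because each hidden unit integrates its input over time rather than responding instantly, a constant input produces a gradual rise toward a steady value. That slow integration is the property that allows evidence accumulation.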

C. Learning Algorithm

Objective: Maximize the terminal resource amount ( $e_T$ ) of a forager over a fixed horizon $T$ .
Optimizer: Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES).
Concurrent Evaluation (Key Innovation): Instead of evaluating one policy per agent sequentially, the authors assign a unique policy sample to each forager within the same simulation rollout.
- This captures inter-agent interaction effects during the selection process.
- It eliminates the need for hand-crafted curricula to teach interaction.
- It significantly accelerates training by parallelizing the evaluation of the population.
Implementation: Built using ABMax (an agent-based modeling framework) and JAX for vectorized, GPU-accelerated simulation.
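The idea of concurrent evaluation can be sketched with a toy example. To stay short, the sketch replaces CMA-ES with a plain elite-averaging evolution strategy using a fixed isotropic Gaussian, and replaces the simulation with a hypothetical one-dimensional `shared_rollout` in which each agent's single parameter is the position where it forages. The point it illustrates is only that one rollout scores every sample in the population at once, so each fitness value already reflects competition with the other samples.

```python
import math
import random

PATCH = 0.7  # hypothetical location of a single resource patch on a line

def shared_rollout(thetas):
    """Toy stand-in for one simulation in which all agents act simultaneously."""
    fitness = []
    for i, th in enumerate(thetas):
        rivals = sum(1 for j, other in enumerate(thetas)
                     if j != i and abs(other - th) < 0.05)
        intake = math.exp(-((th - PATCH) ** 2) / 0.02)
        fitness.append(intake / (1 + rivals))  # scramble competition: intake is split
    return fitness

def evolve(generations=30, pop_size=32, sigma=0.3, seed=0):
    """Simplified elite-averaging ES (not CMA-ES): one sample per agent per rollout."""
    rng = random.Random(seed)
    mean = 0.0
    n_elite = pop_size // 4
    for _ in range(generations):
        thetas = [mean + sigma * rng.gauss(0.0, 1.0) for _ in range(pop_size)]
        fitness = shared_rollout(thetas)  # a single rollout scores all samples
        ranked = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
        mean = sum(thetas[i] for i in ranked[:n_elite]) / n_elite
    return mean
```

A sequential alternative would need one rollout per sample and a separate decision about which policies the other agents run. Placing the whole population in one rollout avoids both.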

3. Key Contributions

Emergence of Swarming from Passive Sensing: The study demonstrates that swarming (aggregation) emerges spontaneously in a competitive foraging task without explicit alignment rules or communication, driven solely by the optimization of resource acquisition under partial observability.
Internal State-Dependent Modulation: The paper shows that the strength of the swarming behavior is inversely related to the forager's internal resource level. Agents with low resources aggregate more tightly, while well-fed agents remain dispersed.
Causal Link via Neural Analysis: Through targeted ablation and "clamping" experiments on the CTRNN hidden states, the authors establish a causal link between specific neural representations of internal resource levels and the onset of aggregation behavior.
Efficient Multi-Agent Training: The proposed concurrent evaluation strategy within CMA-ES offers a scalable method for evolving policies in multi-agent environments where interactions are critical.

4. Results

A. Adaptive Foraging

Training Convergence: The CMA-ES fitness curve showed rapid convergence.
Foraging Modes: The learned policy exhibited a spectrum of behaviors:
- Wait-and-Harvest: Orbiting a patch to let it regenerate.
- Opportunistic Traveler: Moving between patches.
- Agents dynamically switched between these modes based on local context (patch occupancy, resource intake).

B. Swarming Emergence

Validation: When resource patches were removed, agents still aggregated (swarmed).
Ablation: When inter-agent sensing (ray data regarding other foragers) was disabled, aggregation disappeared, and agents remained dispersed. This confirmed swarming relies on sensing neighbors as proxies for resource availability.

C. Internal State Modulation

Resource vs. Aggregation: The Mean Nearest Neighbor (MNN) distance was measured against clamped resource levels ( $\bar{e}$ ).
- Low Resource ( $\bar{e}$ low): Agents aggregated tightly (low MNN).
- High Resource ( $\bar{e}$ high): Agents remained dispersed (high MNN).
Interpretation: This aligns with the Asset Protection Principle: well-provisioned agents are risk-averse and avoid the variance/cost of crowding, while starving agents take the risk of crowding to minimize search time.
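For reference, the MNN metric can be computed as in the sketch below. It assumes agent positions are `(x, y)` tuples in a plane without periodic boundaries, since the summary does not state the paper's boundary conditions. The function name is illustrative.

```python
import math

def mean_nearest_neighbor(positions):
    """Average, over all agents, of the distance to each agent's closest neighbor."""
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two agents")
    total = 0.0
    for i, (xi, yi) in enumerate(positions):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(positions) if j != i)
    return total / n
```

A lower value indicates tighter aggregation, so under the reported result the MNN would be expected to rise as the clamped resource level increases.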

D. CTRNN Hidden State Analysis

State Identification: In single-agent tests, specific hidden states (e.g., states 30 and 34) were found to monotonically track the internal resource level ( $\bar{e}$ ), while others remained saturated/indifferent.
Causal Perturbation: In a two-agent test (one free, one fixed), the hidden states of the free agent were clamped to values representing lower resource levels than their actual state.
- Result: The free agent approached the fixed agent significantly faster when the "low resource" states were artificially induced.
- Conclusion: This confirms that the neural representation of internal state directly drives the urgency of aggregation, acting as an "urgency-gating" mechanism.
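The clamping intervention can be sketched as a wrapper around any hidden-state update. Here `step_fn` is a hypothetical function `(h, x) -> new_h` standing in for the controller update, and `clamp` maps unit indices to the values they are pinned to. The paper's actual procedure may differ in detail.

```python
def clamp_hidden(h, clamp):
    """Return a copy of hidden state h with the units listed in `clamp` overwritten."""
    out = list(h)
    for idx, value in clamp.items():
        out[idx] = value
    return out

def run_with_clamp(step_fn, h0, inputs, clamp):
    """Roll the controller forward while holding selected hidden units fixed."""
    h = clamp_hidden(h0, clamp)
    trajectory = []
    for x in inputs:
        # Re-apply the clamp after every update so the pinned units never drift.
        h = clamp_hidden(step_fn(h, x), clamp)
        trajectory.append(h)
    return trajectory
```

In the setting described above, `clamp` would target the resource-tracking units (the summary names states 30 and 34), pinned to values recorded at a lower resource level. The free agent's approach time would then be compared with an unclamped run.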

5. Significance

Theoretical: The work bridges active matter physics, evolutionary robotics, and neuroscience. It provides a mechanistic explanation for how complex population-level behaviors (swarming) can emerge from simple local rules and internal state monitoring, without requiring complex global coordination.
Biological Plausibility: The findings support the hypothesis that animal aggregation (e.g., fish schools, insect swarms) may be driven by individual risk assessment based on internal energy reserves, rather than just external geometric rules.
Technical: The concurrent evaluation method for CMA-ES offers a robust template for training multi-agent policies in complex, interactive environments, reducing the computational cost of capturing emergent social dynamics.
Future Directions: The authors suggest extending this to online adaptation (learning-to-learn) and exploring role differentiation (predator/prey dynamics) within the same controller framework.

In summary, the paper successfully demonstrates that internal state-modulated swarming is an emergent property of optimal foraging under partial observability, driven by a neural controller that uses internal resource levels to gate the urgency of aggregation.