Scalable Construction of Spiking Neural Networks using up to thousands of GPUs

This paper presents a novel MPI-based method for constructing and simulating large-scale spiking neural networks on multi-GPU clusters and exascale supercomputers, demonstrating efficient scaling for complex cortical models through optimized local connectivity and spike exchange strategies.

Published 2026-05-18

Original authors: Bruno Golosio, Gianmarco Tiddia, José Villamar, Luca Pontisso, Luca Sergi, Francesco Simula, Pooja Babu, Elena Pastorelli, Abigail Morrison, Markus Diesmann, Alessandro Lonardo, Pier Stanislao Paolucci, Johanna Senk

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine trying to simulate the human brain on a computer. The brain is a massive city of about 86 billion neurons, where each neuron is a house sending tiny electrical "text messages" (called spikes) to thousands of other houses every second. To simulate this, you need a supercomputer with thousands of graphics cards (GPUs) working together.

The problem is that these GPUs are like islands. They are fast, but they don't talk to each other easily. If one island wants to send a message to another, the "mailman" (the communication system) has to run back and forth, which slows everything down.

This paper introduces a new, much faster way to build the map of these connections before the simulation starts, so the GPUs can run the simulation without getting stuck in traffic.

Here is how they did it, explained simply:

1. The Old Way: Building the Map on the Mainland

Previously, when scientists wanted to simulate a brain network, they built the "connection map" on the slow, central computer (the CPU) first. Then, they had to copy this massive map over to the fast GPUs.

  • The Analogy: Imagine you are organizing a massive party. In the old method, you wrote down every single guest's name and who they know on a piece of paper in the kitchen (CPU), then ran to every single room (GPU) to hand them a copy of the list. This took a long time just to get ready.

2. The New Way: Building the Map Inside the Rooms

The authors developed a new method where each GPU builds its own part of the connection map directly inside its own memory, without waiting for the central computer.

  • The Analogy: Now, instead of writing the list in the kitchen, every room has its own notepad. Before the party starts, each room writes down its own guests and who they know, right there. No running back and forth to the kitchen is needed.
  • The Result: This "onboard" construction is more than 10 times faster than the old way. In one test, it took 55 seconds to build the network instead of nearly 12 minutes.
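
The idea of every GPU building its own share of the map can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the ranks are simulated inside one process, and the function name `build_local_connections`, the round-robin placement of neurons, the fixed number of random inputs per neuron, and the per-rank seeding scheme are all hypothetical choices made for the example.

```python
import random

def build_local_connections(rank, n_ranks, n_neurons, indegree, seed=1234):
    """Build only the connections whose TARGET neuron lives on this rank.

    Assumed toy rule: neuron i is hosted on rank i % n_ranks, and every
    neuron draws `indegree` random source neurons from the whole network.
    """
    # Each rank gets its own reproducible random stream, so no rank has to
    # wait for a central process to hand it a connection list.
    rng = random.Random(seed * 100_003 + rank)
    connections = []
    for target in range(rank, n_neurons, n_ranks):
        for _ in range(indegree):
            source = rng.randrange(n_neurons)
            connections.append((source, target))
    return connections

# Simulate 4 ranks; in a real run each would be a separate process with a GPU.
n_ranks, n_neurons, indegree = 4, 20, 3
per_rank = [build_local_connections(r, n_ranks, n_neurons, indegree)
            for r in range(n_ranks)]
total_connections = sum(len(c) for c in per_rank)
print(total_connections)  # 20 neurons * 3 inputs each = 60
```

Because each rank only needs its own rank number and the shared network parameters, the work is split across ranks and no complete map ever has to be assembled in one place and copied out.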

3. Two Ways to Send Messages

Once the map is built, the GPUs need to exchange the "text messages" (spikes) during the simulation. The paper tested two different strategies for this, depending on how the network is organized:

  • Strategy A: The Direct Phone Call (Point-to-Point)

    • How it works: If a neuron in GPU #1 needs to talk to a specific neuron in GPU #2, it calls that specific GPU directly.
    • Best for: Networks where connections are uneven or specific (like a real brain where some areas talk a lot to each other, but not to everyone).
    • The Paper's Claim: They used this for a model of the monkey's visual cortex (32 different areas). The model ran as intended, showing that the new map-building method is compatible with complex, biologically structured networks.
  • Strategy B: The Group Chat (Collective Communication)

    • How it works: Instead of calling individuals, a GPU shouts its messages to a whole group of GPUs at once. Everyone in the group hears the shout and checks if the message is for them.
    • Best for: Huge, random networks where the neurons on each GPU are likely to have targets on every other GPU (like a well-mixed crowd).
    • The Paper's Claim: They tested this on a massive "balanced network" scaling up to 1,024 GPUs. They showed that even with this many cards working together, performance continues to scale well.
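
The difference between the two strategies can be made concrete with a small simulation. This is a minimal sketch, not the paper's code: the ranks are plain dictionary keys rather than MPI processes, and the names `point_to_point`, `collective`, `spikes`, and `remote_targets` are invented for the example. In real MPI programs these patterns correspond loosely to calls such as `MPI_Send`/`MPI_Recv` and `MPI_Allgather`, but the sketch makes no claim about which calls the authors use.

```python
def point_to_point(spikes, remote_targets):
    """Send each spike only to the ranks that host one of its targets."""
    inbox = {rank: [] for rank in spikes}
    sent = 0  # number of spike entries put on the network
    for sender, fired in spikes.items():
        for neuron in fired:
            for receiver in remote_targets.get(neuron, ()):
                inbox[receiver].append(neuron)
                sent += 1
    return inbox, sent

def collective(spikes, remote_targets):
    """Share every spike with all other ranks; receivers keep what they need."""
    inbox = {rank: [] for rank in spikes}
    sent = 0
    for sender, fired in spikes.items():
        for receiver in spikes:
            if receiver == sender:
                continue
            sent += len(fired)  # the whole spike list goes to every rank
            for neuron in fired:
                if receiver in remote_targets.get(neuron, ()):
                    inbox[receiver].append(neuron)  # receiver-side filter
    return inbox, sent

# rank -> ids of the neurons that fired there during this time step
spikes = {0: [10, 11], 1: [20], 2: []}
# neuron id -> remote ranks hosting at least one of its targets
remote_targets = {10: {1}, 11: {1, 2}, 20: {0}}

p2p_inbox, p2p_sent = point_to_point(spikes, remote_targets)
coll_inbox, coll_sent = collective(spikes, remote_targets)
same_delivery = all(sorted(p2p_inbox[r]) == sorted(coll_inbox[r]) for r in spikes)
print(same_delivery, p2p_sent, coll_sent)  # True 4 6
```

Both strategies deliver the same spikes. In this toy case the direct calls move fewer entries, but the count ignores per-message overhead, which is one reason a single group exchange can be the better choice when nearly every rank needs to hear from every other rank.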

4. The "Memory Levels" Trick

GPUs have a lot of memory, but it is not unlimited. Storing the connection maps for billions of neurons takes up a lot of space.

  • The Analogy: Imagine you have a small desk (GPU memory) and a huge warehouse (CPU memory).
  • The Solution: The authors created four "levels" of organization.
    • Level 0: Keep the maps in the warehouse (CPU) and only bring what you need to the desk. This saves desk space but is slower to fetch.
    • Level 3: Fill the desk with everything. This is the fastest but requires a bigger desk.
  • The Paper's Claim: They showed that by choosing the right level, they could run simulations on the Leonardo Booster supercomputer (which has 4,096 GPUs), and they predict that the upcoming JUPITER supercomputer could simulate a network with 230 million neurons and 2.5 trillion synapses. That works out to roughly 11,000 synapses per neuron, which is in the range typically estimated for cortical neurons, although the human cortex as a whole contains billions of neurons.
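
A quick back-of-the-envelope calculation shows why memory placement matters at this scale. The neuron and synapse counts below are the ones quoted above; the figure of 8 bytes per synapse is purely an assumption for illustration and does not come from the paper, and the helper names are hypothetical.

```python
def synapses_per_neuron(n_neurons, n_synapses):
    """Average number of synapses per neuron."""
    return n_synapses / n_neurons

def storage_tib(n_synapses, bytes_per_synapse):
    """Total storage for all synapses, in tebibytes (2**40 bytes)."""
    return n_synapses * bytes_per_synapse / 2**40

n_neurons = 230e6    # 230 million neurons (from the text)
n_synapses = 2.5e12  # 2.5 trillion synapses (from the text)
ASSUMED_BYTES_PER_SYNAPSE = 8  # assumption for illustration, not from the paper

avg = synapses_per_neuron(n_neurons, n_synapses)
total = storage_tib(n_synapses, ASSUMED_BYTES_PER_SYNAPSE)
print(round(avg))       # 10870
print(round(total, 1))  # 18.2
```

Under that assumption the synapses alone would occupy about 18 TiB, far more than any single GPU holds, so the data has to be spread over many GPUs and every extra lookup table kept on the "desk" competes with the synapses for space.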

Summary of What They Achieved

  • Speed: They made the "setup" phase of brain simulations more than 10 times faster by building the network map directly on the graphics cards.
  • Scale: They demonstrated that this works on up to 1,024 GPUs simultaneously.
  • Flexibility: They showed two different ways to handle communication (direct calls vs. group chats) so scientists can choose the best method for their specific brain model.
  • Future-proof: Their methods are designed with the next generation of "exascale" supercomputers in mind, machines that could bring simulations at the scale of the human brain, with individual synapses, closer to reach.

In short, they didn't just make the simulation run faster; they built a better "road system" for the data so the supercomputer doesn't get stuck in traffic before the race even begins.
