Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster

This paper introduces the Domain Translation algorithm, which overcomes the limitations of traditional domain decomposition on exascale systems. On a 64-node Cerebras CS-3 cluster, it simulates planetary-scale tsunamis at 112 PFLOP/s, reaching 88% of peak with perfect weak scaling.

Original authors: Tomas Oppelstrup, Nicholas Giamblanco, Delyan Z. Kalchev, Ilya Sharapov, Mark Taylor, Dirk Van Essendelft, Sivasankaran Rajamanickam, Michael James

Published 2026-02-24

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Problem: The "Traffic Jam" in Supercomputing

Imagine you are trying to simulate a massive event, like a tsunami hitting a planet, or the weather changing over a year. To do this, supercomputers break the world into a giant grid of tiny squares (like a chessboard). Each square needs to talk to its neighbors to figure out what happens next.
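In code, that neighbor-to-neighbor chatter is a stencil update. Here is a minimal sketch of a toy 2-D diffusion-style step, illustrative only and not the scheme from the paper; the coefficient `c` and the grid size are arbitrary choices for the example:

```python
import numpy as np

def stencil_step(h, c=0.1):
    """One explicit time step of a toy 2-D update.

    Each interior cell is updated from its four immediate neighbors,
    which is why every grid square must 'talk' to its neighbors on
    every single step of the simulation.
    """
    new = h.copy()
    new[1:-1, 1:-1] = h[1:-1, 1:-1] + c * (
        h[:-2, 1:-1] + h[2:, 1:-1]      # neighbors above and below
        + h[1:-1, :-2] + h[1:-1, 2:]    # neighbors left and right
        - 4 * h[1:-1, 1:-1]
    )
    return new

grid = np.zeros((8, 8))
grid[4, 4] = 1.0          # a single disturbance in the middle
grid = stencil_step(grid)  # the disturbance begins to spread
```

When the grid is split across many machines, the cells along each border need values owned by a different machine, and that is exactly where the waiting begins.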

The Old Way (Von Neumann Architecture):
Think of traditional supercomputers like a giant factory with a single, massive central warehouse (memory). All the workers (processors) have to run back and forth to this warehouse to get their instructions and data.

  • The Bottleneck: As the factory gets bigger (more processors), the workers spend more time running to the warehouse and less time working. This is called the "Memory Wall."
  • The Result: Even with super-fast computers, they spend a lot of time waiting. When you try to simulate a global event, the computers get stuck in traffic jams waiting for data to arrive from other parts of the cluster. They are fast, but they are inefficient.

The New Hardware: The "Wafer-Scale Engine"

The researchers used a special computer made by Cerebras Systems. Instead of a factory with a central warehouse, imagine a giant, flat city where every house (processor) has its own tiny pantry (memory) right in the kitchen.

  • No Running: The workers never leave their houses. They just pass ingredients to their immediate neighbors.
  • The Scale: This city is built on a single, massive silicon wafer (the size of a dinner plate), containing hundreds of thousands of these "houses."

The New Software: "Domain Translation" (The Moving Sidewalk)

Even with this amazing hardware, there was still a problem when connecting many of these cities together. If City A needs to send a message to City B, there is a small delay (latency) while the message travels over the network linking the two machines.

In traditional computing, if you divide a simulation between City A and City B, the workers at the border have to stop and wait for the message to arrive before they can take their next step. This slows everything down.

The Solution: The Moving Sidewalk
The authors invented a clever trick called Domain Translation.

Imagine a long, moving sidewalk (like at an airport) that carries people from one side of a room to the other.

  1. The Old Way: You stand still, and the world moves around you. If you need to talk to someone on the other side, you wait for them to walk over to you.
  2. The New Way (Domain Translation): Instead of the data staying still and waiting, the data moves.
    • Imagine the "grid" of the simulation is printed on a giant conveyor belt.
    • As the simulation runs, the entire grid shifts one step to the right every second.
    • The workers (processors) stay in their fixed spots.
    • Because the grid is moving, a worker never has to reach out for distant data: at each step, the piece of the grid it needs next is carried to it by the belt, always arriving from the same direction.
    • The Magic: The "message" (data) is always moving in the same direction as the conveyor belt. It never has to go "backwards" against the flow.
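The conveyor-belt picture can be sketched as a toy model. This is purely an illustration of the one-directional-flow idea, not the authors' implementation; `run_ring`, the ring size, and the blending factor `c` are all invented for the example:

```python
def run_ring(cells, steps, c=0.25):
    """Toy model of domain translation on a ring of fixed workers.

    Each step the whole grid shifts one position to the right, so every
    worker always *receives* from its left neighbor and *sends* to its
    right neighbor. No message ever travels against the flow, and no
    worker waits on data arriving from two directions at once.
    """
    n = len(cells)
    for _ in range(steps):
        # the belt moves: every worker forwards its cell one position right
        incoming = [cells[(i - 1) % n] for i in range(n)]
        # each worker blends the value it just received with the value it
        # is about to pass along (a toy relaxation; total mass is conserved)
        cells = [(1 - c) * incoming[i] + c * cells[i] for i in range(n)]
    return cells

state = [0.0] * 8
state[0] = 1.0                      # a single pulse on the belt
out = run_ring(state, steps=8)      # the pulse spreads while drifting right
```

The point of the sketch is the communication pattern: every worker's send and receive partners are fixed, and all traffic moves with the belt.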

Why this is genius:
In a traditional setup, a worker at the edge of a computer chip has to wait for a message to travel all the way from the other chip (a delay of 10 microseconds).
With Domain Translation, the worker does 1,000 steps of work while the message is traveling. By the time the message finally arrives, the worker has already finished a huge chunk of work and is ready to use it immediately. The waiting time is completely hidden.
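The latency-hiding argument is just pipeline arithmetic. The sketch below compares the two schedules; the 10-microsecond latency matches the figure quoted above, while the per-step compute time is an assumption made for the example:

```python
def time_without_overlap(steps, t_compute, t_latency):
    """Worker stalls on every message: compute and latency costs add up."""
    return steps * (t_compute + t_latency)

def time_with_translation(steps, t_compute, t_latency):
    """Data moves with the computation, so a message launched at step k
    arrives while later steps are still being computed. After one
    pipeline-fill delay, only compute time remains on the critical path."""
    return t_latency + steps * t_compute

# illustrative numbers: 0.625 us of compute per step, 10 us link latency
t_c, t_l = 0.625e-6, 10e-6
naive = time_without_overlap(1_000_000, t_c, t_l)       # stalls dominate
overlapped = time_with_translation(1_000_000, t_c, t_l) # latency paid once
```

With these (assumed) numbers, a million steps take about 10.6 s when every step stalls on a message, versus about 0.63 s when the latency is overlapped, which is the sense in which the waiting time disappears.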

The Results: Breaking the Speed Limit

The researchers tested this on a cluster of 64 of these massive computer chips.

  1. Speed: They simulated a tsunami caused by an asteroid hitting the ocean, achieving 1.6 million time steps per second. To put that in perspective: if each time step represented one minute of real time, a full simulated year (about 526,000 steps) would complete in roughly a third of a second.
  2. Efficiency: They reached 88% of the computer's maximum theoretical speed. Most supercomputers only reach 1-5% of their max speed for these kinds of tasks because they are stuck waiting for data.
  3. Power: They did this while using very little electricity compared to other supercomputers. It's like driving a car that gets 100 miles per gallon while going 200 mph.
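A few derived figures follow directly from the numbers reported above; this is simple arithmetic using only values stated in this summary:

```python
sustained_pflops = 112.0   # cluster-wide sustained rate (stated above)
efficiency = 0.88          # fraction of theoretical peak (stated above)
nodes = 64                 # CS-3 systems in the cluster
steps_per_second = 1.6e6   # simulation time steps per second (stated above)

implied_peak = sustained_pflops / efficiency   # ~127 PFLOP/s cluster peak
per_node = sustained_pflops / nodes            # 1.75 PFLOP/s sustained per CS-3
seconds_per_step = 1 / steps_per_second        # 625 ns of wall clock per step
```

So each global time step of the simulation, including all communication, completes in well under a microsecond of wall-clock time.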

The Real-World Impact: The Asteroid Tsunami

To prove it worked, they simulated a terrifying scenario: A massive asteroid hitting the ocean.

  • They modeled the wave spreading across the entire planet.
  • They could see the wave hit San Francisco Bay in their simulation.
  • Because the computer was so fast, they could run these simulations in real-time or faster, which is crucial for predicting disasters or understanding climate change.

Summary Analogy

  • Old Supercomputers: A relay race where runners have to stop at a central post office to pick up the baton. The post office is far away, so the race is slow.
  • Cerebras + Domain Translation: A relay race where the baton is a ball rolling down a long, moving conveyor belt. The runners are standing on the belt. They just grab the ball as it passes them, do their job, and pass it to the next person. The ball never stops, and the runners never wait.

The Bottom Line: This paper shows that by changing how we move data (making the data move with the calculation) and using a new type of computer chip, we can finally unlock the true speed of supercomputers. We can now simulate complex physical events (like tsunamis and weather) with unprecedented speed and efficiency.
