ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation

This paper presents a replay-driven validation methodology that uses deterministic waveform capture and replay across simulation and emulation to validate a complex, chiplet-based ODIN CPU-GPU architecture. The authors report that it significantly accelerated debug cycles and enabled successful end-to-end system integration within a single quarter.

Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester

Published 2026-03-18

Imagine you are building a massive, high-tech city called ODIN. This city has two main districts:

  1. The Control Tower (CPU): This is the brain. It's great at making decisions, managing traffic lights, and handling urgent, one-off tasks.
  2. The Factory District (GPU): This is the muscle. It has thousands of workers who can do the exact same task simultaneously (like painting a million bricks at once).

The goal of this paper is to get these two districts to work together perfectly before they are actually built in real life (a process called "pre-silicon validation").

The Problem: The "Black Box" Chaos

When you try to connect the Control Tower and the Factory, things get messy.

  • Too Many Rules: They have to follow incredibly complex, secret handshake protocols to talk to each other.
  • The "What If" Nightmare: If you try to simulate their conversation on a computer, it's so slow that you can't watch a whole day of work happen. If you try to speed it up on special hardware (emulation), you can't see inside the conversation to find out why a mistake happened.
  • The "Blind" Test: Traditionally, engineers would write a script to tell the Factory what to do. But if the script is wrong, or if the Factory behaves unexpectedly, the whole city stops, and nobody knows who to blame. Was it the Control Tower? The Factory? Or the road between them?

The Solution: The "Replay Engine" (The Magic Tape Recorder)

The authors of this paper rely on a technique they call replay-driven simulation and emulation.

Here is the analogy:
Imagine you are a director filming a movie. You have a very difficult scene where the Control Tower and the Factory have a perfect, complex conversation.

  • Old Way: Every time you want to test a new road layout, you have to hire the actors, write a new script, and hope they remember their lines perfectly. If they mess up, you have to guess why.
  • The New Way (Replay): You record the perfect conversation once on a high-quality tape recorder. Now, whenever you want to test a new road layout, you just play the tape.

The tape contains everything:

  1. What the Control Tower said.
  2. What the Factory said back.
  3. Exactly when they said it (the timing).
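
As a rough illustration of what one entry on such a tape might hold, here is a minimal Python sketch. The field names (cycle, source, signal, value) and the signal names are hypothetical and chosen only for this example; the paper's actual waveform format is not described in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEvent:
    """One entry on the tape (hypothetical layout, for illustration only)."""
    cycle: int    # when it was said: the clock cycle of the signal change
    source: str   # who said it: "CPU" or "GPU"
    signal: str   # which wire carried it (made-up signal name)
    value: int    # what was said: the value driven on that cycle


def sort_trace(events: List[TraceEvent]) -> List[TraceEvent]:
    """Order events by cycle so that playback follows the recorded timing."""
    return sorted(events, key=lambda e: e.cycle)


# Two events recorded out of order, then put back into time order.
tape = sort_trace([
    TraceEvent(cycle=12, source="CPU", signal="rsp_data", value=0xBEEF),
    TraceEvent(cycle=10, source="GPU", signal="req_addr", value=0x1000),
])

for event in tape:
    print(event)
```

Keeping the cycle number on every entry is what lets a playback preserve not just the content of the conversation but also its timing.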

How It Works in the Paper

  1. The Recording (Capture): First, they take the GPU (Factory) and test it alone. They record every single signal it sends and receives during a complex task. This is the "Golden Tape."
  2. The Playback (Replay): When they connect the GPU to the CPU and the rest of the city (the SoC), they don't ask the GPU to "think" and generate new traffic. Instead, they plug in the "Golden Tape."
    • The tape plays the GPU's requests to the CPU.
    • The tape also plays the CPU's responses back to the GPU.
    • The Magic: The system acts exactly as if the real GPU and CPU were talking, but it's actually just a recording being played back.
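
The capture and playback steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the authors' tooling: toy_gpu, toy_responder, the one-cycle response delay, and the "req"/"rsp" labels are all invented for the example.

```python
from typing import Callable, List, Optional, Tuple

# (cycle, direction, value); "req" is GPU -> CPU, "rsp" is CPU -> GPU.
Event = Tuple[int, str, int]


def toy_gpu(cycle: int) -> Optional[int]:
    """Hypothetical GPU model: issues a request address every 4th cycle."""
    return 0x1000 + cycle if cycle % 4 == 0 else None


def toy_responder(addr: int) -> int:
    """Hypothetical stand-in for the other side of the interface."""
    return addr ^ 0xFF


def capture(n_cycles: int) -> List[Event]:
    """Run the GPU model alone and record both directions with their timing."""
    tape: List[Event] = []
    for cycle in range(n_cycles):
        req = toy_gpu(cycle)
        if req is not None:
            tape.append((cycle, "req", req))
            # Assumption: the response arrives one cycle after the request.
            tape.append((cycle + 1, "rsp", toy_responder(req)))
    return tape


def replay(tape: List[Event], drive: Callable[[int, str, int], None]) -> None:
    """Play the tape back: no GPU model runs, events are simply re-driven."""
    for cycle, direction, value in tape:
        drive(cycle, direction, value)


tape = capture(n_cycles=8)

# Stand-in for the SoC interface: just log what gets driven onto it.
seen: List[Event] = []
replay(tape, lambda cycle, direction, value: seen.append((cycle, direction, value)))

print(len(tape), "events captured;", "replay matches" if seen == tape else "replay differs")
```

Note that replay never calls toy_gpu: during playback the recording alone supplies the traffic, which is the point of the approach.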

Why This is a Game-Changer

  • One Script for Two Stages: Usually, you need one script for slow computer simulations and a totally different one for fast hardware emulation. This method uses the same recording for both. It's like using the same movie reel for both a slow, detailed review and a fast-paced screening.
  • Instant Debugging: If the city crashes, they don't have to guess. They can rewind the tape to the exact second the crash happened and see exactly what signal caused it. It turns a "Where did the fire start?" mystery into a "Here is the spark" fact.
  • Speed: Because they don't have to write complex new scripts for every test, the team reports completing end-to-end system integration within a single quarter (three months).
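
To make the "rewind to the exact spot" idea concrete, here is a small Python sketch that compares a golden tape against what a run actually produced and reports the first point where they differ. The tuple layout and the first_divergence helper are assumptions made for this example; in principle the same comparison could be applied to a simulation run or an emulation run, since both would be checked against the same recording.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

# (cycle, direction, value), a hypothetical event layout.
Event = Tuple[int, str, int]


def first_divergence(
    golden: List[Event], observed: List[Event]
) -> Optional[Tuple[int, Optional[Event], Optional[Event]]]:
    """Return (index, golden_event, observed_event) at the first mismatch.

    Returns None when the two tapes are identical. A missing event on
    either side (one tape shorter than the other) shows up as None.
    """
    for index, (g, o) in enumerate(zip_longest(golden, observed)):
        if g != o:
            return index, g, o
    return None


golden = [(0, "req", 1), (1, "rsp", 2), (4, "req", 3)]
observed = [(0, "req", 1), (1, "rsp", 9), (4, "req", 3)]

mismatch = first_divergence(golden, observed)
print(mismatch)
```

Because the golden tape is fixed, the first mismatching entry points directly at the cycle and signal to inspect, rather than leaving the engineer to search the whole run.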

The Catch (The Fine Print)

Like any good magic trick, there are a few rules:

  • No Improvisation: The actors (the chips) can't make things up on the spot. They have to follow the tape exactly. This means they had to turn off some "randomness" features that are usually used to test for rare glitches.
  • Fixed Speed: The tape plays at a fixed speed. You can't speed it up or slow it down dynamically during the test.
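
The "no improvisation" rule comes down to reproducibility: a recording is only useful if the run it came from can be repeated exactly. The Python sketch below illustrates that idea with a pinned random seed. Pinning a seed is an assumption chosen for illustration; this summary says only that some randomness features were turned off, not how.

```python
import random
from typing import List


def generate_stimulus(seed: int, n: int) -> List[int]:
    """Produce n pseudo-random byte values from a private, seeded generator."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return [rng.randrange(0, 256) for _ in range(n)]


# Same seed: the two runs are identical, so a tape from one fits the other.
run_a = generate_stimulus(seed=1234, n=8)
run_b = generate_stimulus(seed=1234, n=8)

# Different seed: the stimulus changes, and an old tape would no longer apply.
run_c = generate_stimulus(seed=9999, n=8)

print("same seed reproducible:", run_a == run_b)
print("different seed matches:", run_a == run_c)
```

The trade-off is the one described above: fixing the behavior makes replay possible, at the cost of the varied stimulus that randomized testing would otherwise provide.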

The Bottom Line

This paper describes a method where engineers stopped trying to predict how a complex computer chip would behave and started recording how it behaves when it works right. Then, they used that recording to test the rest of the system.

It's like building a new car engine. Instead of trying to guess how the pistons will move, you record a perfect engine run, and then use that recording to test the new transmission, the new brakes, and the new tires. It saves time, reduces confusion, and ensures that when the car is finally built, the engine and the transmission know exactly how to dance together.
