GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning

GCL-Sampler is a novel GPU workload sampling framework that utilizes Relational Graph Convolutional Networks and contrastive learning to automatically extract high-dimensional kernel similarities from trace graphs, achieving significantly higher speedups (258.94x) with minimal error (0.37%) compared to existing state-of-the-art methods.

Jiaqi Wang, Jingwei Sun, Jiyu Luo, Han Li, Guangzhong Sun

Published 2026-03-03

Imagine you are an architect trying to design a new, super-fast car engine. To make sure your design works, you need to run thousands of simulations. But here's the problem: running a full simulation of the engine is like trying to drive the car at full speed on a test track that is one million miles long. It would take you weeks or even months to finish just one test. You'd never get any new designs built!

This is exactly the problem computer scientists face with GPU simulators. GPUs (the chips in your graphics cards that power AI and video games) are incredibly complex. Simulating them perfectly is so slow that researchers can't test new ideas fast enough.

The Old Way: Guessing and Checking

To speed things up, researchers used to try "sampling." Instead of driving the whole million-mile track, they'd pick a few short segments to test and assume the rest of the track is similar.

But the old methods were like a clumsy detective:

  1. The "Name-Tag" Detective: Some methods only looked at the name of the task. If two tasks had different names, they assumed they were totally different, even if they drove the engine exactly the same way. This meant they had to test almost everything, so they didn't save much time.
  2. The "Counting" Detective: Others just counted how many instructions a task had. But two tasks can have the same number of instructions but behave completely differently (like two people walking the same number of steps but one is sprinting and the other is dancing). This led to bad guesses and wrong results.

The New Solution: GCL-Sampler (The "Super-Intuitive" Detective)

The authors of this paper, Jiaqi Wang and his team, built a new tool called GCL-Sampler. Think of it as a detective with a superpower: Pattern Recognition.

Instead of looking at names or simple counts, GCL-Sampler looks at the entire story of how the GPU works.

1. Turning Code into a Map (The Graph)

Imagine every instruction the GPU runs is a city, and the data it moves between instructions are the roads connecting them.

  • Old methods just looked at the city names.
  • GCL-Sampler builds a rich, detailed map (a "Graph") showing every road, every traffic light, and every turn. It captures the structure and the meaning of the code, not just the surface details.
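To make the analogy concrete, here is a minimal sketch (in Python, with made-up instruction names and relation types — not the paper's actual trace format) of a kernel trace represented as a typed graph. Nodes are instructions and typed edges capture different relations, which is the kind of relational structure a Relational Graph Convolutional Network consumes:

```python
# Illustrative only: a tiny kernel trace as a graph. Nodes are
# instructions; each edge carries a relation type (data dependence
# vs. program order), since R-GCNs aggregate messages per relation.

trace_graph = {
    "nodes": {
        0: {"opcode": "LDG"},   # load from global memory
        1: {"opcode": "FMA"},   # fused multiply-add
        2: {"opcode": "STG"},   # store to global memory
    },
    "edges": [
        # (src, dst, relation)
        (0, 1, "data"),     # the FMA consumes the loaded value
        (1, 2, "data"),     # the store writes the FMA result
        (0, 1, "control"),  # program order
        (1, 2, "control"),
    ],
}

def edges_by_relation(graph, relation):
    """Return the edges of one relation type."""
    return [(s, d) for s, d, r in graph["edges"] if r == relation]
```

Separating edges by relation type is what lets the network treat "these two instructions share data" differently from "these two instructions merely run in sequence."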

2. The "Contrastive Learning" Gym

Now, how does the computer learn to recognize similar maps?
Think of it as training in a gym. You show the computer two slightly different photos of the same city (maybe one has a tree missing, or the lighting is different).

  • The computer learns: "Hey, even though these photos look slightly different, they are the same city!"
  • Then, you show it a photo of a totally different city (like a desert vs. a jungle).
  • The computer learns: "These are totally different."

This is called Contrastive Learning. The computer trains itself to ignore the small, unimportant details (noise) and focus on the deep, structural similarities. It learns to say, "These two GPU tasks are twins, even if they have different names!"
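The training idea above can be sketched with a simplified InfoNCE-style loss, a common contrastive objective. Everything here — vector sizes, temperature, the noise model — is illustrative, not the paper's implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Simplified InfoNCE: pull two 'photos' of the same kernel
    together, push embeddings of different kernels apart."""
    pos = np.exp(cosine(anchor, positive) / temperature)
    neg = sum(np.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -np.log(pos / (pos + neg))

# Two lightly perturbed views of the same kernel embedding should
# yield a lower loss than pairing it with an unrelated kernel.
rng = np.random.default_rng(0)
z = rng.normal(size=8)
view_a = z + 0.01 * rng.normal(size=8)
view_b = z + 0.01 * rng.normal(size=8)
other = rng.normal(size=8)

loss_similar = contrastive_loss(view_a, view_b, [other])
loss_dissimilar = contrastive_loss(view_a, other, [view_b])
```

Minimizing this loss is exactly the "same city, different photo" training: the model is rewarded for mapping augmented views of one kernel close together and unrelated kernels far apart.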

3. The Result: The Perfect Shortcut

Once the computer has learned this, it groups thousands of GPU tasks into "families" based on how they actually behave.

  • Instead of testing 10,000 different tasks, it picks one representative from each family.
  • It simulates just that one, and then mathematically scales the result to represent the whole family.
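The pick-one-and-scale step boils down to simple weighted arithmetic. A hedged sketch (all numbers and helper names are hypothetical, chosen purely to show the scaling):

```python
def sampled_estimate(kernel_times, labels, representatives):
    """Estimate total runtime by simulating one representative per
    'family' (cluster) and scaling its time by the family size."""
    total = 0.0
    for family in set(labels):
        size = sum(1 for l in labels if l == family)
        total += kernel_times[representatives[family]] * size
    return total

# Hypothetical example: 6 kernels falling into 2 behavioural families.
times = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]   # per-kernel simulated times
labels = [0, 0, 0, 1, 1, 1]              # family assignment from clustering
reps = {0: 0, 1: 3}                      # one representative per family

estimate = sampled_estimate(times, labels, reps)   # 3*1.0 + 3*5.0 = 18.0
full = sum(times)                                  # 18.0
```

In this toy case only 2 of the 6 kernels are simulated, yet the scaled estimate matches the full total — the quality of the estimate depends entirely on how behaviorally uniform each family really is, which is what the learned embeddings are for.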

The Magic Numbers

The paper shows that this new "Super-Intuitive Detective" is a game-changer:

  • Speed: It makes the simulation about 259 times faster. (Imagine finishing a year-long project in under two days!)
  • Accuracy: It is 99.6% accurate (just 0.37% error). The old methods were either fast but inaccurate (around 20% error) or accurate but slow. GCL-Sampler gets the best of both worlds.
  • Real World: They tested it on everything from scientific math problems to massive AI models (like the ones powering chatbots), and it worked great on different generations of computer chips.

The Bottom Line

GCL-Sampler is like having a time machine for computer architects. By using advanced AI to understand the "soul" of the code rather than just its "clothes" (names or counts), it allows researchers to skip the boring, repetitive parts of testing and focus on the important stuff. It turns a process that used to take weeks into something that takes minutes, without losing any accuracy.
