Challenges and Design Considerations for Finding CUDA Bugs Through GPU-Native Fuzzing

This paper argues that current CPU-based testing methods fail to ensure memory safety in heterogeneous systems because translating GPU code to run on the CPU is unfaithful to how the GPU actually behaves. As a more reliable alternative, it proposes a GPU-native fuzzing pipeline for CUDA programs, aimed at the growing number of exploitable bugs in modern AI and scientific workloads.

Mingkai Li, Joseph Devietti, Suman Jana, Tanvir Ahmed Khan

Published Mon, 09 Ma

Here is an explanation of the paper using simple language and creative analogies.

The Big Picture: The "Two-Engine" Car Problem

Imagine modern computers are like high-performance race cars. For a long time, these cars had just one engine: the CPU (the brain that does general thinking). Over the last few decades, engineers have built incredibly strong safety features into this engine. They have airbags, seatbelts, and automatic brakes. If you try to crash the car, the safety systems catch you.

But recently, to make these cars go faster, engineers added a second, super-powerful engine: the GPU (the muscle that handles graphics and heavy math, like training AI).

The Problem: While the CPU engine is safe and sound, the GPU engine is still under construction. It's like driving a car where the driver's seat has airbags, but the passenger seat has none. Because the GPU is so new and complex, it has many "bugs" (glitches) that can cause the car to crash, leak your private data, or let hackers take the wheel.

The Current Mistake: Testing the Engine in the Wrong Garage

Right now, when security experts try to find bugs in the GPU engine, they do something strange. They take the GPU code and try to translate it to run on the CPU engine just to test it.

The Analogy: Imagine you are a mechanic trying to test a jet engine. Instead of testing it on a plane, you take it apart, build a tiny model, and test it on a bicycle.

  • Why this fails: A jet engine and a bicycle work completely differently. Testing a scaled-down model on a bike can never reveal that the real jet engine will explode at high speed.
  • The Paper's Point: The authors say, "Stop testing the GPU on the CPU!" The differences between them are too big. If you don't test the GPU on the GPU, you will miss the dangerous bugs.

The Solution: A "Native" Fuzzing Pipeline

The authors propose a new way to test these systems called GPU-Native Fuzzing.

What is "Fuzzing"?
Think of fuzzing as a "stress test." Instead of driving the car normally, you throw random, crazy things at it to see what breaks. You might try to steer with a banana, drive at 200 mph, or hit a pothole made of jelly. If the car breaks, you found a bug!

The Challenge:
Doing this on a GPU is hard because:

  1. No Safety Net: The GPU doesn't have the "airbags" (sanitizers) that the CPU has to catch crashes.
  2. Confusing Inputs: You can't just throw random garbage at the GPU. It needs very specific instructions to start working. If you don't set it up right, it just ignores you.
  3. Blind Spots: It's hard to see which parts of the code the GPU actually used during the test.

How They Fixed It (The Design)

The team designed a new toolkit to test the GPU directly, using four main strategies:

1. The "Shadow Cop" (Address Sanitization)

They built a tool that runs inside the GPU. Imagine a shadow cop riding along with every piece of data. If the data tries to go where it shouldn't (like a buffer overflow), the cop immediately pulls the emergency brake. This happens directly on the GPU, so it catches bugs that other tools miss.
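In spirit, this works like AddressSanitizer on the CPU: keep "shadow" metadata recording which memory ranges are valid, and check every access against it before the access happens. Here is a toy Python sketch of that concept (the class and method names are illustrative, not the paper's actual tool):

```python
class ShadowHeap:
    """Toy shadow-memory model: track valid [base, base + size) ranges
    and flag any access outside them, like an in-kernel sanitizer."""

    def __init__(self):
        self.regions = {}  # base address -> size in bytes

    def malloc(self, base, size):
        self.regions[base] = size  # record the allocation in shadow state

    def free(self, base):
        del self.regions[base]     # accesses to this range now fail

    def check_access(self, addr):
        # The "shadow cop": validate the address before the access happens.
        for base, size in self.regions.items():
            if base <= addr < base + size:
                return True
        raise MemoryError(f"out-of-bounds access at {addr:#x}")


heap = ShadowHeap()
heap.malloc(0x1000, 64)
heap.check_access(0x1000 + 63)   # last valid byte: allowed
# heap.check_access(0x1000 + 64) # one byte past the end -> MemoryError
```

The key design point is that the check lives next to the data, inside the GPU kernel itself, instead of in a CPU-side re-simulation of it.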

2. The "Context-Sensitive" Setup

You can't just walk up to a complex machine and start pressing buttons. You have to turn it on, load the fuel, and warm up the engine first.

  • The Fix: They created a system that sets up the GPU perfectly (the "context") before starting the stress test. They use open-source examples to learn exactly how to start these machines, so they don't waste time on tests that fail because the machine wasn't ready.
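In harness form, the idea is simple: run every fuzz iteration inside a freshly prepared context, so a failure means a real bug rather than a machine that was never turned on. A minimal Python sketch of that pattern (all names here are hypothetical stand-ins, not the paper's API):

```python
def fuzz_one(api_call, inputs, setup, teardown):
    """Run one fuzz iteration inside a properly prepared context,
    so failures reflect real bugs rather than missing setup."""
    ctx = setup()              # e.g., create a library handle, allocate buffers
    try:
        return api_call(ctx, *inputs)
    finally:
        teardown(ctx)          # release resources so iterations stay independent


# Toy usage: the "context" is just a dict standing in for a library handle.
result = fuzz_one(
    api_call=lambda ctx, a, b: a + b if ctx["ready"] else None,
    inputs=(2, 3),
    setup=lambda: {"ready": True},
    teardown=lambda ctx: ctx.clear(),
)
```

Mining open-source example programs tells the harness what `setup` must do for each target API, which is exactly the "learn how to start the machine" step described above.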

3. The "Smart Jester" (Type-Aware Mutations)

A normal "jester" (fuzzer) throws random things at the machine. But if the machine expects a number, and you throw a word, it just says "No."

  • The Fix: Their "Smart Jester" knows the rules.
    • If the machine expects a number, the jester tries the biggest number, the smallest number, or zero.
    • If it expects a list of items, the jester tries a list that is too long, too short, or empty.
    • This ensures the GPU actually tries to do the work, making it much more likely to find a hidden crash.
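The Smart Jester's trick can be sketched in a few lines: mutate each input according to its type, favoring the boundary values most likely to break things. A toy Python sketch (the specific boundary lists are illustrative choices, not the paper's exact set):

```python
import random

# Boundary values a type-aware mutator favors for integers (illustrative).
INT_BOUNDARIES = [0, 1, -1, 2**31 - 1, -2**31, 2**63 - 1]


def mutate(value):
    """Mutate an input according to its type, so the result is still
    well-formed enough for the target to actually process it."""
    if isinstance(value, bool):          # check bool before int: bool is an int subtype
        return not value
    if isinstance(value, int):
        return random.choice(INT_BOUNDARIES)
    if isinstance(value, float):
        return random.choice([0.0, -0.0, float("inf"), float("nan"), 1e308])
    if isinstance(value, list):
        # Too long, too short, or empty -- the three cases from the text.
        return random.choice([[], value * 2, value[: len(value) // 2]])
    return value                          # unknown types pass through unchanged
```

Because every mutated input still has the right shape, the GPU accepts it and runs real work, instead of rejecting the input at the front door.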

4. The "Flashlight" (Coverage Tracking)

When you throw a banana at the car, did it hit the engine or just the bumper? You need to know.

  • The Fix: They built a flashlight that shines on the code as it runs. It tells them, "Hey, we tested this part of the engine, but we never touched that rusty bolt over there." This helps them focus their stress tests on the parts they haven't checked yet.
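Putting the flashlight together with the jester gives the classic coverage-guided loop: keep any input that lights up code nobody has touched yet, and mutate from there. A toy Python sketch of the loop (not the paper's implementation; here "coverage" is just a set of abstract code-block IDs):

```python
import random


def coverage_guided_fuzz(target, seeds, mutate, rounds=1000):
    """Keep only inputs that light up new code: the 'flashlight' loop.
    `target` returns the set of code blocks an input exercised."""
    corpus = list(seeds)
    seen = set()
    for inp in corpus:
        seen |= target(inp)
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        covered = target(candidate)
        if covered - seen:                # touched a "rusty bolt" nobody hit before
            seen |= covered
            corpus.append(candidate)      # keep it as a seed for further mutation
    return corpus, seen


# Toy demo: "coverage" is which bucket of five the input lands in.
corpus, seen = coverage_guided_fuzz(
    target=lambda x: {x // 5},
    seeds=[0],
    mutate=lambda x: x + random.randint(1, 10),
)
```

The loop steers testing toward the dark corners automatically: inputs that only re-cover old ground are thrown away, so effort concentrates on the unchecked parts.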

The Results: Finding the Rusty Bolts

They tested their new system on 11 APIs from NVIDIA's widely used linear-algebra library, cuBLAS.

  • The Old Way: When they used standard tests, they only explored about 26% of the code. It was like looking at a dark room with a tiny flashlight; most of the room was still in the dark.
  • The New Way: Their system is designed to light up the whole room, finding the rusty bolts (bugs) that were hiding in the dark corners.

Why This Matters (The Ethical Part)

The authors argue that this isn't just a technical problem; it's an ethical one.

  • We are putting the world's most important AI and scientific work on these vulnerable GPUs.
  • If we don't test them properly, we risk leaking private data or letting hackers control our systems.
  • It is the responsibility of the designers to test the GPU on the GPU, not to pretend it's something else.

In short: The paper says, "Stop testing the GPU engine on a bicycle. Build a better testing track for the jet engine itself, and we can make the future of computing safe."