Challenges and Design Considerations for Finding CUDA Bugs Through GPU-Native Fuzzing

This paper argues that current CPU-based testing methods fail to ensure memory safety in heterogeneous systems because translating GPU code to run on the CPU is unfaithful to how the GPU actually behaves. As a more reliable alternative, it proposes a GPU-native fuzzing pipeline for CUDA programs, aimed at the growing number of exploitable bugs in modern AI and scientific workloads.

Mingkai Li, Joseph Devietti, Suman Jana, Tanvir Ahmed Khan

Published Mon, 09 Ma

Here is an explanation of the paper using simple language and creative analogies.

The Big Picture: The "Two-Engine" Car Problem

Imagine modern computers are like high-performance race cars. For a long time, these cars had just one engine: the CPU (the brain that does general thinking). Over the last few decades, engineers have built incredibly strong safety features into this engine. They have airbags, seatbelts, and automatic brakes. If you try to crash the car, the safety systems catch you.

But recently, to make these cars go faster, engineers added a second, super-powerful engine: the GPU (the muscle that handles graphics and heavy math, like training AI).

The Problem: While the CPU engine is safe and sound, the GPU engine is still under construction. It's like driving a car where the driver's seat has airbags, but the passenger seat has none. Because the GPU is so new and complex, it has many "bugs" (glitches) that can cause the car to crash, leak your private data, or let hackers take the wheel.

The Current Mistake: Testing the Engine in the Wrong Garage

Right now, when security experts try to find bugs in the GPU engine, they do something strange. They take the GPU code and try to translate it to run on the CPU engine just to test it.

The Analogy: Imagine you are a mechanic trying to test a jet engine. Instead of testing it on a plane, you take it apart, build a tiny model, and test it on a bicycle.

  • Why this fails: A jet engine and a bicycle work completely differently. Testing a scaled-down model on a bike can never reveal that the real jet engine will explode at high speed.
  • The Paper's Point: The authors say, "Stop testing the GPU on the CPU!" The differences between them are too big. If you don't test the GPU on the GPU, you will miss the dangerous bugs.

The Solution: A "Native" Fuzzing Pipeline

The authors propose a new way to test these systems called GPU-Native Fuzzing.

What is "Fuzzing"?
Think of fuzzing as a "stress test." Instead of driving the car normally, you throw random, crazy things at it to see what breaks. You might try to steer with a banana, drive at 200 mph, or hit a pothole made of jelly. If the car breaks, you found a bug!

The Challenge:
Doing this on a GPU is hard because:

  1. No Safety Net: The GPU doesn't have the "airbags" (sanitizers) that the CPU has to catch crashes.
  2. Confusing Inputs: You can't just throw random garbage at the GPU. It needs very specific instructions to start working. If you don't set it up right, it just ignores you.
  3. Blind Spots: It's hard to see which parts of the code the GPU actually used during the test.

How They Fixed It (The Design)

The team designed a new toolkit to test the GPU directly, using four main strategies:

1. The "Shadow Cop" (Address Sanitization)

They built a tool that runs inside the GPU. Imagine a shadow cop riding along with every piece of data. If the data tries to go where it shouldn't (like a buffer overflow), the cop immediately pulls the emergency brake. This happens directly on the GPU, so it catches bugs that other tools miss.
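In spirit, this works like AddressSanitizer on the CPU: keep "shadow" metadata recording which memory ranges are valid, and check every access against it before the access happens. Here is a toy Python sketch of that concept (the class and method names are illustrative, not the paper's actual tool):

```python
class ShadowHeap:
    """Toy shadow-memory model: track valid [base, base + size) ranges
    and flag any access outside them, like an in-kernel sanitizer."""

    def __init__(self):
        self.regions = {}  # base address -> size in bytes

    def malloc(self, base, size):
        self.regions[base] = size  # record the allocation in shadow state

    def free(self, base):
        del self.regions[base]     # accesses to this range now fail

    def check_access(self, addr):
        # The "shadow cop": validate the address before the access happens.
        for base, size in self.regions.items():
            if base <= addr < base + size:
                return True
        raise MemoryError(f"out-of-bounds access at {addr:#x}")


heap = ShadowHeap()
heap.malloc(0x1000, 64)
heap.check_access(0x1000 + 63)   # last valid byte: allowed
# heap.check_access(0x1000 + 64) # one byte past the end -> MemoryError
```

The key design point is that the check lives next to the data, inside the GPU kernel itself, instead of in a CPU-side re-simulation of it.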

2. The "Context-Sensitive" Setup

You can't just walk up to a complex machine and start pressing buttons. You have to turn it on, load the fuel, and warm up the engine first.

  • The Fix: They created a system that sets up the GPU perfectly (the "context") before starting the stress test. They use open-source examples to learn exactly how to start these machines, so they don't waste time on tests that fail because the machine wasn't ready.
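In harness form, the idea is simple: run every fuzz iteration inside a freshly prepared context, so a failure means a real bug rather than a machine that was never turned on. A minimal Python sketch of that pattern (all names here are hypothetical stand-ins, not the paper's API):

```python
def fuzz_one(api_call, inputs, setup, teardown):
    """Run one fuzz iteration inside a properly prepared context,
    so failures reflect real bugs rather than missing setup."""
    ctx = setup()              # e.g., create a library handle, allocate buffers
    try:
        return api_call(ctx, *inputs)
    finally:
        teardown(ctx)          # release resources so iterations stay independent


# Toy usage: the "context" is just a dict standing in for a library handle.
result = fuzz_one(
    api_call=lambda ctx, a, b: a + b if ctx["ready"] else None,
    inputs=(2, 3),
    setup=lambda: {"ready": True},
    teardown=lambda ctx: ctx.clear(),
)
```

Mining open-source example programs tells the harness what `setup` must do for each target API, which is exactly the "learn how to start the machine" step described above.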

3. The "Smart Jester" (Type-Aware Mutations)

A normal "jester" (fuzzer) throws random things at the machine. But if the machine expects a number, and you throw a word, it just says "No."

  • The Fix: Their "Smart Jester" knows the rules.
    • If the machine expects a number, the jester tries the biggest number, the smallest number, or zero.
    • If it expects a list of items, the jester tries a list that is too long, too short, or empty.
    • This ensures the GPU actually tries to do the work, making it much more likely to find a hidden crash.
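The Smart Jester's trick can be sketched in a few lines: mutate each input according to its type, favoring the boundary values most likely to break things. A toy Python sketch (the specific boundary lists are illustrative choices, not the paper's exact set):

```python
import random

# Boundary values a type-aware mutator favors for integers (illustrative).
INT_BOUNDARIES = [0, 1, -1, 2**31 - 1, -2**31, 2**63 - 1]


def mutate(value):
    """Mutate an input according to its type, so the result is still
    well-formed enough for the target to actually process it."""
    if isinstance(value, bool):          # check bool before int: bool is an int subtype
        return not value
    if isinstance(value, int):
        return random.choice(INT_BOUNDARIES)
    if isinstance(value, float):
        return random.choice([0.0, -0.0, float("inf"), float("nan"), 1e308])
    if isinstance(value, list):
        # Too long, too short, or empty -- the three cases from the text.
        return random.choice([[], value * 2, value[: len(value) // 2]])
    return value                          # unknown types pass through unchanged
```

Because every mutated input still has the right shape, the GPU accepts it and runs real work, instead of rejecting the input at the front door.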

4. The "Flashlight" (Coverage Tracking)

When you throw a banana at the car, did it hit the engine or just the bumper? You need to know.

  • The Fix: They built a flashlight that shines on the code as it runs. It tells them, "Hey, we tested this part of the engine, but we never touched that rusty bolt over there." This helps them focus their stress tests on the parts they haven't checked yet.
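Putting the flashlight together with the jester gives the classic coverage-guided loop: keep any input that lights up code nobody has touched yet, and mutate from there. A toy Python sketch of the loop (not the paper's implementation; here "coverage" is just a set of abstract code-block IDs):

```python
import random


def coverage_guided_fuzz(target, seeds, mutate, rounds=1000):
    """Keep only inputs that light up new code: the 'flashlight' loop.
    `target` returns the set of code blocks an input exercised."""
    corpus = list(seeds)
    seen = set()
    for inp in corpus:
        seen |= target(inp)
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        covered = target(candidate)
        if covered - seen:                # touched a "rusty bolt" nobody hit before
            seen |= covered
            corpus.append(candidate)      # keep it as a seed for further mutation
    return corpus, seen


# Toy demo: "coverage" is which bucket of five the input lands in.
corpus, seen = coverage_guided_fuzz(
    target=lambda x: {x // 5},
    seeds=[0],
    mutate=lambda x: x + random.randint(1, 10),
)
```

The loop steers testing toward the dark corners automatically: inputs that only re-cover old ground are thrown away, so effort concentrates on the unchecked parts.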

The Results: Finding the Rusty Bolts

They tested their new system on 11 APIs from NVIDIA's widely used linear-algebra library, cuBLAS.

  • The Old Way: When they used standard tests, they only explored about 26% of the code. It was like looking at a dark room with a tiny flashlight; most of the room was still in the dark.
  • The New Way: Their system is designed to light up the whole room, finding the rusty bolts (bugs) that were hiding in the dark corners.

Why This Matters (The Ethical Part)

The authors argue that this isn't just a technical problem; it's an ethical one.

  • We are putting the world's most important AI and scientific work on these vulnerable GPUs.
  • If we don't test them properly, we risk leaking private data or letting hackers control our systems.
  • It is the responsibility of the designers to test the GPU on the GPU, not to pretend it's something else.

In short: The paper says, "Stop testing the GPU engine on a bicycle. Build a better testing track for the jet engine itself, and we can make the future of computing safe."