Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels

Model2Kernel is the first practical system to automatically verify the memory safety of CUDA kernels used in LLM inference. By combining model-aware dynamic analysis with specialized symbolic execution, it detects bugs caused by variable tensor layouts and user-controlled inputs, identifying 353 previously unknown vulnerabilities with minimal false positives.

Mengting He, Shihao Xia, Haomin Jia, Wenfei Wu, Linhai Song

Published 2026-03-27

Imagine you have built a massive, high-speed factory (a Large Language Model) that produces answers to your questions. This factory runs on a super-fast assembly line made of specialized machines called CUDA kernels. These machines are the "muscle" that does the heavy lifting, moving data around at lightning speed on your computer's graphics card (GPU).

However, because these machines are so fast and complex, they are prone to a specific kind of glitch: Memory Safety Bugs.

Think of a memory bug like a forklift driver in a warehouse who gets confused about where the boxes are.

  • Buffer Overflow: The driver tries to put a box in a shelf that doesn't exist, knocking over everything nearby.
  • Integer Overflow: The driver's clipboard runs out of numbers, so instead of writing "1,000,000," it loops back to "0," causing them to drive into a wall.
  • Data Race: Two drivers try to grab the same box at the exact same time, dropping it or smashing it.
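
To make the first two bug classes concrete, here is a small, dependency-free Python sketch (not CUDA, and with invented tensor sizes): CUDA kernels often compute flat indices with 32-bit integers, so a product like `row * stride` can silently wrap around and land outside the buffer.

```python
# Toy illustration of integer overflow and out-of-bounds indexing,
# using made-up tensor sizes (not taken from any real kernel).

def to_int32(x):
    """Simulate C's 32-bit two's-complement wraparound."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

rows, cols = 400_000, 7_168          # hypothetical batch size * hidden size
flat = rows * cols                   # true value: 2,867,200,000 (> INT32_MAX)

bad_index = to_int32(flat)           # what a 32-bit kernel index would hold
print(bad_index)                     # negative -> an out-of-bounds access

buffer_len = 1024 * 7_168            # hypothetical buffer sized for 1,024 rows
print(flat - 1 >= buffer_len)        # True: the last index reaches past the end
```

The fix in real kernels is usually to widen the index arithmetic to 64 bits before multiplying, rather than after.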

In the world of AI, these glitches don't just crash the factory; they can corrupt the AI's "brain" (its weights), shut down the service, or even let hackers take control of the machine.

The Problem: Why is this hard to fix?

Traditionally, checking these machines for bugs has been like trying to inspect a moving train while it's speeding by.

  1. Dynamic Shapes: The factory changes its layout depending on how many questions you ask. Sometimes the warehouse is small; sometimes it's huge. Old tools can't handle this flexibility.
  2. Thousands of Workers: These machines use thousands of workers (threads) simultaneously. Checking every single worker's path is impossible for humans and too slow for computers.
  3. The "Black Box" of Models: The factory manager (the AI model) tells the machines what to do, but the rules change based on the specific model being used. Old tools didn't understand the manager's instructions.

The Solution: Model2Kernel

The authors of this paper built a new system called Model2Kernel. Think of it as a super-intelligent safety inspector that doesn't just watch the factory run; it simulates every possible scenario the factory could ever face, without actually needing the real factory to run.

It has two main parts, working like a perfect team:

1. HFProbe: The "Model Detective"

Imagine you want to test a new car, but you don't know how the driver will behave. HFProbe is the detective that watches the driver (the AI model) in a simulation.

  • It runs the model on a "fake" GPU (no real hardware needed).
  • It figures out exactly which buttons the driver presses and which parts of the factory they visit.
  • Crucially, it tells the inspector: "Hey, the driver always keeps the warehouse width at 7,168 boxes, but the number of people entering changes every time."
  • This helps the inspector know what is fixed (safe) and what is variable (risky).
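
The core idea can be sketched in a few lines of plain Python. Everything below is invented for illustration (the real HFProbe instruments Hugging Face models on a fake GPU device): run the "model" with different batch sizes, record every output shape, and mark each dimension as fixed or variable.

```python
# Minimal sketch of fixed-vs-variable dimension classification.
# HIDDEN and run_model are hypothetical stand-ins, not HFProbe's API.

HIDDEN = 7_168  # made-up fixed hidden size, echoing the 7,168 above

def run_model(batch):
    """Stand-in for a traced model: returns the shapes its kernels see."""
    return [(batch, HIDDEN), (batch, 4 * HIDDEN), (batch, HIDDEN)]

def classify_dims(runs):
    """Label each (op, dim) 'fixed' if it never changes across runs."""
    labels = {}
    for op_idx, shapes in enumerate(zip(*runs)):
        for dim_idx in range(len(shapes[0])):
            values = {shape[dim_idx] for shape in shapes}
            labels[(op_idx, dim_idx)] = "fixed" if len(values) == 1 else "variable"
    return labels

runs = [run_model(b) for b in (8, 1024)]   # two shape-only "dry runs"
labels = classify_dims(runs)
print(labels[(0, 0)], labels[(0, 1)])      # variable fixed
```

The "fixed" dimensions can then be treated as safe constants during verification, while the "variable" ones become the risky symbolic inputs.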

2. cuKLEE: The "Parallel Time-Traveler"

Once the detective gives the clues, cuKLEE takes over. This is the symbolic execution engine.

  • Instead of testing the factory with just one set of numbers (e.g., 100 people), cuKLEE uses symbolic variables: placeholders that stand for every possible value at once. It asks, "What if there are 100 people? What if there are 10,000? What if there are 1 million?"
  • It simulates all these scenarios at once.
  • It treats the thousands of workers (threads) as a single group, checking if any of them would crash the system under any condition.
  • If it finds a path where a forklift hits a wall, it stops and says, "Here is the exact combination of inputs that will break the machine."
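
A toy stand-in for this idea (the real cuKLEE reasons with a constraint solver over all threads at once; this sketch only does simple interval reasoning with invented names): keep `num_tokens` symbolic as a range and ask whether ANY value in that range breaks the kernel's index math.

```python
# Interval-style sketch of "checking all inputs at once".
# check_kernel and its parameters are hypothetical, for illustration only.

INT32_MAX = 2**31 - 1

def check_kernel(hidden, num_tokens_range, buffer_elems):
    lo, hi = num_tokens_range
    worst_index = hi * hidden - 1            # largest index over the whole range
    findings = []
    if worst_index > INT32_MAX:
        # Smallest num_tokens whose index math no longer fits in int32:
        witness = INT32_MAX // hidden + 1
        findings.append(("int32-overflow", witness))
    if worst_index >= buffer_elems:
        # Smallest num_tokens that reaches past the end of the buffer:
        witness = buffer_elems // hidden + 1
        findings.append(("out-of-bounds", witness))
    return findings

# Buffer sized for 4,096 rows, but num_tokens may be anywhere in [1, 10^6]:
print(check_kernel(7_168, (1, 1_000_000), 4_096 * 7_168))
```

Like the real tool, the sketch reports a concrete witness input for each finding, "the exact combination of inputs that will break the machine."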

The Results: A Safety Revolution

The team tested Model2Kernel on real-world AI factories (like vLLM and Hugging Face models).

  • The Findings: They discovered 353 previously unknown bugs. Most were "integer overflows" (the clipboard running out of numbers) and "out-of-bounds" errors (reaching for a non-existent shelf).
  • The Accuracy: They only had 9 "false alarms" (mistakenly thinking a safe spot was dangerous).
  • The Comparison: When they tried to use older tools to find these same bugs, the older tools found almost nothing. It was like searching for a needle in a haystack by hand, while Model2Kernel used a metal detector.

Why Does This Matter?

As AI becomes part of our daily lives (chatbots, search engines, self-driving cars), the "factories" running them are getting bigger and more complex.

  • Safety: Model2Kernel helps ensure these factories won't crash when you ask them a tricky question.
  • Security: It stops hackers from exploiting these memory glitches to steal data or hijack the AI.
  • Efficiency: It does this automatically, without needing expensive hardware or slowing down the development process.

In short, Model2Kernel is the ultimate quality control system for the engines powering the AI revolution, ensuring that as these models get smarter, they also get safer.