FP-Predictor - False Positive Prediction for Static Analysis Reports

This paper presents FP-Predictor, a Graph Convolutional Network model that leverages Code Property Graphs to effectively predict false positives in Static Application Security Testing reports, achieving high accuracy on benchmarks while demonstrating security-aware reasoning despite limitations in interprocedural control-flow representation.

Tom Ohlmer, Michael Schlichtig, Eric Bodden

Published Thu, 12 Ma

Imagine you are a detective trying to solve crimes in a massive city (the software code). You hire a very fast, very strict robot assistant (a Static Analysis Tool) to scan the city and flag every suspicious activity.

The problem? This robot is paranoid. It sees a person carrying a pocket knife and screams "MURDERER!" It sees someone wearing a trench coat and yells "SPY!" It sees a locked door and shouts "BURGLARY IN PROGRESS!"

In the real world, these are just people with knives for camping, coats for warmth, and locked doors for privacy. But because the robot flags so many innocent things as crimes, the human detectives (the developers) get exhausted. They spend all day chasing ghosts, ignoring the real criminals, and eventually, they stop trusting the robot altogether.

This is the problem of False Positives in software security.

The Solution: The "Smart Filter" (FP-Predictor)

The authors of this paper, Tom, Michael, and Eric, built a new tool called FP-Predictor. Think of it as a Senior Detective who reviews the Robot's list of suspects before the human detectives go out to investigate.

Here is how it works, using simple analogies:

1. The Map (Code Property Graphs)

To understand a crime, you can't just look at one sentence of a report. You need to see the whole neighborhood.

  • The Old Way: The robot just looks at a single line of code and says, "This looks bad."
  • The New Way: FP-Predictor builds a 3D Map of the code. It connects the dots between:
    • What the code says (the syntax).
    • How the code moves (the control flow: if you press this button, does it go here or there?).
    • How data flows (did this secret password get passed to a stranger?).
  • The Name: This map is called a Code Property Graph (CPG). It's like a subway map showing every possible route a criminal (or a bug) could take.
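
To make the map less abstract, here is a minimal sketch of a CPG as a data structure: one set of nodes, with three kinds of edges (syntax, control flow, data flow) layered on top. The class name, the node labels, and the two-statement snippet are all made up for illustration; real CPG tools, and the one used in the paper, produce far richer graphs.

```python
from collections import defaultdict

# The three edge kinds a Code Property Graph merges into one graph.
AST, CFG, DFG = "AST", "CFG", "DFG"


class CodePropertyGraph:
    """Toy CPG (hypothetical, for illustration): labelled nodes, typed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> human-readable label
        self.edges = defaultdict(set)   # edge kind -> {(source, target)}

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, kind, src, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[kind].add((src, dst))

    def successors(self, node_id, kind):
        """Nodes reachable from node_id in one step along edges of one kind."""
        return sorted(dst for src, dst in self.edges[kind] if src == node_id)


# Hypothetical snippet being mapped:
#   key = read_key()
#   encrypt(key)
cpg = CodePropertyGraph()
cpg.add_node(1, "assign key")
cpg.add_node(2, "call read_key")
cpg.add_node(3, "call encrypt")
cpg.add_node(4, "arg key")

cpg.add_edge(AST, 1, 2)  # syntax: the assignment contains the call
cpg.add_edge(AST, 3, 4)  # syntax: the call contains its argument
cpg.add_edge(CFG, 1, 3)  # control flow: statement 1 runs before statement 3
cpg.add_edge(DFG, 1, 4)  # data flow: the value assigned to key reaches the argument
```

Asking `cpg.successors(1, DFG)` follows only the data-flow "subway line" out of the assignment, which is the kind of question a security check cares about.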

2. The Brain (Graph Convolutional Network)

FP-Predictor uses a special type of AI brain called a Graph Convolutional Network (GCN).

  • Imagine the AI is a student who has studied thousands of "True Crimes" (real vulnerabilities) and "False Alarms" (harmless code).
  • When the robot flags a new suspect, the AI looks at the 3D Map of that code.
  • It asks: "Does this look like the patterns I've seen in real crimes, or does it look like the innocent people I've seen before?"
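
For the curious, here is a rough sketch of what one "thinking step" of a GCN looks like. It uses the textbook graph-convolution rule (each node averages its neighbours' features, then applies learned weights), written in plain Python. The layer sizes, the mean-pool readout, and the hand-picked weights are assumptions for illustration only; the paper's actual architecture and trained weights are not reproduced here.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 * H * W).

    Assumes a symmetric adjacency matrix with non-negative entries.
    """
    n = len(adjacency)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalisation by node degree.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, apply the weights, then ReLU.
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, x) for x in row] for row in transformed]


def predict_false_positive(adjacency, features, weights):
    """Assumed readout: mean-pool the node embeddings, squash into (0, 1)."""
    hidden = gcn_layer(adjacency, features, weights)
    pooled = sum(row[0] for row in hidden) / len(hidden)
    return 1.0 / (1.0 + math.exp(-pooled))


# Two connected nodes, one-hot features, hand-picked (untrained) weights.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0], [1.0]]

hidden = gcn_layer(adjacency, features, weights)
score = predict_false_positive(adjacency, features, weights)
```

In a trained model the weights come from studying labelled examples, and several such layers are stacked so that information can travel more than one hop across the map.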

3. The Results: A Surprise Twist

The team tested their "Senior Detective" on a new set of cases.

  • The Initial Score: At first, the results looked bad. The AI seemed to miss many of the "False Alarms" the robot made. It only caught about 4% of them.
  • The Deep Dive: The researchers then manually checked the cases the AI got "wrong." They discovered something fascinating: The AI was actually right, and the test data was wrong.
    • The test data (the "Ground Truth") said some code was safe.
    • But the AI looked at the code and said, "No, this is actually dangerous because it uses a weak lock, even if the test said it was fine."
    • The AI was being a better security expert than the test itself! It was spotting "bad habits" that the test data had missed.

Once the researchers re-scored those cases to reflect that the AI was being more careful and security-conscious than the test, the success rate jumped from 3.7% to over 96%.

Why This Matters

  • Saving Time: Instead of developers chasing every alert, most of them ghosts, the AI narrows the list down to the handful that look like real threats.
  • Building Trust: Developers can finally trust the security tools again because the "noise" is gone.
  • Better Security: The AI is so good at spotting subtle dangers that it sometimes catches risks that human testers missed.
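
As a sketch of how such a filter might sit in a developer's workflow, the snippet below splits a scanner's findings into "review" and "suppressed" piles based on a predicted false-positive probability. The `Finding` class, the rule names, the file locations, the scores, and the 0.5 threshold are all hypothetical; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str              # hypothetical scanner rule name
    location: str          # hypothetical file and line
    fp_probability: float  # assumed model output: chance this is a false alarm


def triage(findings, threshold=0.5):
    """Split findings into those worth reviewing and likely false alarms."""
    to_review = [f for f in findings if f.fp_probability < threshold]
    suppressed = [f for f in findings if f.fp_probability >= threshold]
    return to_review, suppressed


# Made-up findings with made-up scores, for illustration only.
findings = [
    Finding("weak-cipher", "Crypto.java:12", 0.08),
    Finding("static-iv", "Util.java:40", 0.93),
    Finding("hardcoded-key", "Login.java:7", 0.12),
]

to_review, suppressed = triage(findings)
```

Keeping the suppressed pile around, rather than deleting it, lets a team spot-check the filter and tune the threshold to their own tolerance for missed issues.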

The Catch (Limitations)

The AI isn't perfect yet.

  • It's a bit narrow: It's great at looking at one room in the house, but it sometimes misses clues that connect two different rooms (inter-procedural connections).
  • It needs more training: It learned from a specific set of examples. If it sees a totally new type of crime it hasn't seen before, it might get confused.
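
The "two rooms" problem can be illustrated with a toy example. The sketch below uses Python's standard `ast` module to list the string literals visible inside each function of a made-up snippet. The risky choice ("MD5") sits in one function, while the function that actually uses it shows nothing suspicious on its own. This only illustrates the general idea of a per-function view; it is not how FP-Predictor builds or inspects its graphs.

```python
import ast

# Hypothetical snippet: the weak algorithm is picked in one function
# and consumed in another. `digest` is a made-up helper name.
SOURCE = '''
def pick_algorithm():
    return "MD5"

def fingerprint(data):
    return digest(pick_algorithm(), data)
'''


def literals_per_function(source):
    """Map each function name to the string literals inside its own body."""
    tree = ast.parse(source)
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            result[node.name] = sorted(
                child.value
                for child in ast.walk(node)
                if isinstance(child, ast.Constant) and isinstance(child.value, str)
            )
    return result


view = literals_per_function(SOURCE)
```

Looking only at `fingerprint`, the list of literals is empty: the clue lives in the other room, and connecting the two requires following the call between them.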

The Bottom Line

FP-Predictor is like a highly trained, security-conscious senior detective who reviews the paranoid robot's report. It doesn't just say "Yes" or "No"; it looks at the whole picture, understands the context, and helps developers focus only on the real dangers, saving them hours of wasted effort and making software safer for everyone.