Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

This paper introduces FSTab, a framework showing that LLM-generated software exhibits predictable, recurring vulnerabilities. By exploiting these patterns, FSTab enables black-box attacks based on frontend features alone and quantifies how consistently the same flaws reappear across different domains and model variations.

Tomer Kordonsky, Maayan Yamin, Noam Benzimra, Amit LeVi, Avi Mendelson

Published 2026-03-10

Here is an explanation of the paper "Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software," told in simple language with creative analogies.

The Big Idea: The "Bad Habit" of AI Coders

Imagine you hire a very talented, super-fast chef (the AI) to cook thousands of different meals for a restaurant. The chef is amazing at speed and creativity, but they have a strange quirk: they always use the same slightly dangerous knife technique whenever they chop onions, no matter what dish they are making.

If you only look at the finished plate (the front of the website), you see a beautiful salad. You don't see the knife. But if you know which chef made it, you can predict with high certainty that the salad was cut with that dangerous technique, even if you never saw the kitchen.

This paper introduces a tool called FSTab (Feature–Security Table) that does exactly this for software. It proves that when AI models write code, they don't just make random mistakes. They develop predictable bad habits that repeat over and over again.


The Problem: The "Black Box" Kitchen

Usually, when security experts check software, they need to look inside the code (the kitchen) to find bugs. This is like needing a master key to walk into the kitchen and check the knives.

But in the real world, many companies use AI to build software, and they often don't give you the source code (the recipe book). You only see the website or app (the finished meal). This is called a "Black Box."

The researchers asked: If we can't see the code, can we still guess where the security holes are just by looking at what the app does?

The Solution: The "Cheat Sheet" (FSTab)

The researchers built a "Cheat Sheet" called FSTab. Here is how it works, step-by-step:

1. The Training Phase (Learning the Chef's Habits)

First, the researchers asked an AI to write 1,000 different websites (like a bakery, a bank, a social media site). They then looked at the code to see where the AI messed up.

  • The Discovery: They found that whenever the AI was asked to build a "Login Page," it almost always forgot to put a lock on the door. Whenever it built a "File Upload," it almost always left a window open.
  • The Pattern: The AI wasn't making random mistakes. It was following a specific, flawed template for every specific feature.
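The training phase boils down to counting: for each frontend feature, how often did each security hole appear alongside it? Here is a minimal sketch of building such a feature-to-vulnerability table. The sample data and labels (`missing_rate_limit`, `weak_session`, etc.) are hypothetical illustrations, not the paper's actual dataset or FSTab's real implementation.

```python
from collections import defaultdict

def build_fstab(audited_samples):
    """Aggregate audited (features, vulnerabilities) observations into a
    feature -> vulnerability probability table (the 'cheat sheet')."""
    counts = defaultdict(lambda: defaultdict(int))   # feature -> vuln -> hits
    totals = defaultdict(int)                        # feature -> sample count
    for features, vulns in audited_samples:
        for feature in features:
            totals[feature] += 1
            for vuln in vulns:
                counts[feature][vuln] += 1
    # Convert raw counts into empirical probabilities.
    return {
        feature: {v: n / totals[feature] for v, n in vuln_counts.items()}
        for feature, vuln_counts in counts.items()
    }

# Toy audit of three generated sites (hypothetical labels, not the paper's data).
samples = [
    ({"login", "search"}, {"missing_rate_limit"}),
    ({"login"},           {"missing_rate_limit", "weak_session"}),
    ({"file_upload"},     {"unrestricted_upload"}),
]
fstab = build_fstab(samples)
print(fstab["login"]["missing_rate_limit"])  # 1.0: both login sites had it
```

With 1,000 audited sites instead of three, these probabilities become the per-feature "habit profile" the cheat sheet records.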

2. The Attack Phase (Using the Cheat Sheet)

Now, imagine a hacker wants to break into a new website. They don't have the source code.

  • Step 1: They look at the website's front page. They see a "Login" button and a "Search" bar.
  • Step 2: They check the Cheat Sheet (FSTab) for the specific AI model that built the site (e.g., "GPT-5.2").
  • Step 3: The Cheat Sheet says: "If you see a Login button on a GPT-5.2 site, there is a 90% chance the backend has a specific type of security hole."
  • Result: The hacker knows exactly where to strike without ever seeing the code.
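The attack phase is then just a table lookup: spot the features on the front page, fetch each feature's known flaws from the cheat sheet, and rank the guesses. A minimal sketch, with an illustrative table whose probabilities are made up for the example (not measured values from the paper):

```python
def predict_vulnerabilities(fstab, observed_features, threshold=0.5):
    """Given frontend features spotted on a black-box site, return the
    vulnerabilities the table predicts, ranked by probability."""
    predictions = {}
    for feature in observed_features:
        for vuln, prob in fstab.get(feature, {}).items():
            # Keep the strongest signal if several features imply the same flaw.
            predictions[vuln] = max(predictions.get(vuln, 0.0), prob)
    return sorted(
        ((v, p) for v, p in predictions.items() if p >= threshold),
        key=lambda item: item[1], reverse=True,
    )

# Hypothetical cheat sheet for one model; values are illustrative only.
fstab = {
    "login":  {"missing_rate_limit": 0.9, "weak_session": 0.4},
    "search": {"sql_injection": 0.7},
}
print(predict_vulnerabilities(fstab, {"login", "search"}))
# Ranked guesses: missing_rate_limit (0.9), then sql_injection (0.7)
```

The attacker never touched the backend code; the ranking comes entirely from habits learned on other sites built by the same model.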

The Four "Fingerprints" of AI Mistakes

The paper measures how stubborn these bad habits are using four creative concepts:

  1. Feature Recurrence (The "Same Song, Different Lyrics"):
    Does the AI make the same mistake every time it builds a "Login" feature, even if the rest of the code looks different? Yes. It's like a singer who always hits the same wrong note on the word "love," no matter what song they are singing.

  2. Rephrasing Persistence (The "Stubborn Chef"):
    If you ask the AI to "Build a login" vs. "Create a sign-in page" vs. "Make a user entry system," does it still make the same mistake? Yes. The AI is so stuck in its ways that changing the words you use doesn't change the bad code it writes.

  3. Domain Recurrence (The "Specialty Shop"):
    Does the AI make the same mistakes in a "Banking App" as it does in a "Blog"? Sometimes. It has specific bad habits for specific types of tasks (like handling money), but it might be safer when writing a blog.

  4. Cross-Domain Transfer (The "Universal Bad Habit"):
    This is the scariest part. The researchers found that if they learned the AI's bad habits from a "Blog," they could use that knowledge to hack a "Banking App" it built later. The bad habits are so deep in the AI's brain that they travel across completely different types of software.

The Results: The "Universality Gap"

The study tested top AI models (like GPT-5.2, Claude, and Gemini). They found something shocking:

  • The AI is more predictable than a human. A human programmer might make a mistake once and learn from it. The AI, however, seems to have a "hardwired" flaw.
  • High Success Rate: Using their Cheat Sheet, the researchers could predict hidden security holes with up to 94% accuracy, even when they had never seen that specific type of software before.

Why This Matters (The Takeaway)

Think of AI-generated software like a mass-produced toy. If a toy factory has a defect in its mold, every single toy coming off the line will have that same defect. You don't need to inspect every toy individually; you just need to know which mold was used.

The paper warns us:

  1. AI isn't just "randomly" bad. It has specific, repeatable security flaws.
  2. We can predict these flaws. Just by looking at the outside of an app, we can guess what's broken inside if we know which AI built it.
  3. We need new defenses. We can't just rely on checking the code after it's written. We need to fix the "molds" (the AI models) themselves so they stop stamping these dangerous patterns into every piece of software they create.

In short: The paper shows that AI coders have "muscle memory" for making mistakes, and we can now use that knowledge to find the weak spots in software without ever needing to see the code.