Can ChatGPT Generate Realistic Synthetic System Requirement Specifications? Results of a Case Study

This case study shows that ChatGPT, guided by iterative prompt engineering, can generate realistic synthetic system requirement specifications across multiple industries. However, the resulting artifacts still contain significant flaws, so thorough expert evaluation remains necessary; LLM-based quality assessments alone cannot be trusted.

Alex R. Mattukat, Florian M. Braun, Horst Lichter

Published Wed, 11 Ma

Imagine you are a chef trying to teach a new cooking class. You need to show your students how to make a "perfect lasagna." But there's a problem: you can't show them a real lasagna because the recipe is a top-secret family heirloom, and the real ingredients are locked in a vault.

So, you ask a very smart, but slightly hallucinating, robot assistant (let's call him "ChatGPT") to write a recipe for a fake lasagna that looks and tastes just like the real thing. The question is: Can this robot write a fake recipe that is so good, even a professional chef can't tell the difference?

This paper is the story of a team of researchers who tried exactly that, but instead of lasagna, they were writing System Requirement Specifications (SyRSs).

What is a "System Requirement Specification"?

Think of a SyRS as the blueprint for a building. Before you build a skyscraper, you need a massive document that says:

  • "The building must have 50 floors."
  • "It must hold 1,000 people."
  • "It must survive an earthquake."
  • "The elevators must be fast."

In the real world, these blueprints are written by humans for real companies. But researchers who want to study how to build better blueprints can't always see the real ones because companies keep them secret (confidentiality).

The Experiment: The Robot Architect

The researchers wanted to know if they could use a "Black-Box" AI (like ChatGPT) to generate Synthetic System Requirement Specifications (SSyRSs). These are fake blueprints that look real but are made up.

They set up a strict challenge:

  1. No Real Blueprints: The AI couldn't look at any real documents.
  2. No Human Experts: The AI couldn't ask a human architect for help.
  3. 10 Different Industries: They asked the AI to write blueprints for 10 different worlds: E-commerce, Healthcare, Logistics, Finance, etc.

How They Did It (The "Prompt" Game)

The researchers didn't just say, "Write a blueprint." That would be like asking a child to "draw a house" and expecting a skyscraper. Instead, they used a game of refinement:

  1. The Template: They gave the AI a strict form to fill out (like a fill-in-the-blank worksheet).
  2. The Persona: They told the AI, "Pretend you are a grumpy, experienced construction manager."
  3. The Critic: After the AI wrote a blueprint, the AI had to grade itself. It had to say, "Is this realistic? If not, why?"
  4. The Loop: They did this 10 times. Every time the AI made a mistake, the researchers tweaked the instructions (the "prompt") to make the next version better.
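Stripped of the metaphor, the four steps above form a generate–critique–refine cycle. Here is a minimal, runnable sketch of that cycle; `call_llm` is a stand-in for any chat-completion API (stubbed with canned replies so the sketch is self-contained), and the persona text, prompt wording, and the 8/10 acceptance threshold are illustrative assumptions, not the paper's actual prompts.

```python
# Generate–critique–refine loop, as described above.
# NOTE: call_llm is a stub standing in for a real chat-completion API
# (e.g. ChatGPT); prompts and threshold are illustrative, not the paper's.

PERSONA = "You are a senior requirements engineer with 20 years of experience."
TEMPLATE = "Fill in: purpose, scope, functional and non-functional requirements."

def call_llm(prompt: str) -> str:
    # Canned replies keep the sketch self-contained and runnable.
    if prompt.startswith("Rate the realism"):
        return "6"  # pretend the model scores its own draft 6/10
    domain = prompt.split("Domain: ")[-1].splitlines()[0]
    return f"Draft SyRS for domain: {domain}"

def generate_spec(domain: str, rounds: int = 3) -> str:
    # Steps 1 and 2: template + persona constrain what the model writes.
    prompt = f"{PERSONA}\n{TEMPLATE}\nDomain: {domain}"
    draft = call_llm(prompt)
    for _ in range(rounds):
        # Step 3: the critic — the model grades its own draft.
        score = int(call_llm(f"Rate the realism of this spec 1 to 10:\n{draft}"))
        if score >= 8:  # accept once the self-grade is high enough
            break
        # Step 4: the loop — tweak the prompt and try again.
        prompt += "\nThe previous draft was unrealistic; be more specific."
        draft = call_llm(prompt)
    return draft

print(generate_spec("Healthcare"))  # → Draft SyRS for domain: Healthcare
```

With a real API behind `call_llm`, the researchers' extra twist is that the *human* also tweaks the prompt between runs, not just the loop itself.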

The Results: The "Uncanny Valley" of Blueprints

After generating 300 fake blueprints, they invited 87 real human experts (actual architects and engineers) to take a look.

The Good News:

  • 62% of the experts said, "Hey, this looks pretty realistic! I could almost use this."
  • The structure was perfect. The AI knew exactly where to put the "Safety Requirements" and the "Budget Constraints."
  • It was great at sounding confident and professional.

The Bad News (The Catch):

  • The "Hallucination" Trap: While the blueprints looked real, they were often full of hidden nonsense.
    • Analogy: Imagine a blueprint that says on page 2, "All exterior walls must be glass," and on page 40, "The building must be built inside a volcano." Each requirement looks fine on its own; read together, they are impossible to satisfy.
  • The Robot's Confidence: The AI was overconfident. It would write a fake rule with 100% certainty, even if the rule made no sense.
  • The Robot's Self-Grading was Flawed: When the AI graded its own work, it was inconsistent. One time it gave a blueprint a 9/10, and the next time (with the same blueprint), it gave it a 4/10. It couldn't be trusted to check its own work.
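One simple way to expose that inconsistency is to grade the *same* document several times and measure how much the scores disagree. The sketch below illustrates the idea; `llm_grade` is a stub that simulates a noisy grader (a real check would call the actual model each time), and the score range, trial count, and seed are illustrative assumptions.

```python
import random
import statistics

def llm_grade(spec: str, rng: random.Random) -> int:
    # Stub: simulates re-asking the model "rate this spec 1-10".
    # The noise mimics the run-to-run inconsistency the study observed;
    # a real check would issue an actual API call here.
    return rng.randint(4, 9)

def grading_spread(spec: str, trials: int = 10, seed: int = 42) -> float:
    # Grade the identical spec repeatedly and measure how much
    # the scores disagree with each other.
    rng = random.Random(seed)
    scores = [llm_grade(spec, rng) for _ in range(trials)]
    return statistics.pstdev(scores)

# A large spread means the grader contradicts itself on identical input,
# so its scores cannot serve as a quality gate on their own.
spread = grading_spread("The building must have 50 floors...")
print(f"score std-dev over 10 gradings: {spread:.2f}")
```

A near-zero spread would suggest the grader is at least consistent (though not necessarily correct); the 9/10-then-4/10 behavior reported above corresponds to a large spread.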

The Big Takeaway

The researchers concluded that ChatGPT is a fantastic "draftsman," but a terrible "inspector."

  • Can it generate realistic-looking specs? Yes, to a certain extent. It can create a document that looks like a professional blueprint.
  • Can we trust it without a human? Absolutely not.

The paper warns us that if you use AI to write these important documents, you must have a human expert double-check them. The AI is like a student who has memorized the textbook perfectly but has never actually built a house. It can write a perfect essay about building a house, but if you ask it to actually build one, it might try to use glue instead of cement.

Summary in One Sentence

ChatGPT can write a fake blueprint that looks real enough to fool a quick glance, but it's full of hidden errors and lies, so a human expert must always check the work before you start building.