LLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineering

This paper proposes Structured Spec-Driven Engineering (SSDE), a paradigm that utilizes structured specifications to overcome the ambiguity and quality limitations of natural language prompts, thereby enabling high-quality, verifiable code generation at the repository level.

Original authors: Shuzhao Feng, Boqi Chen, Brett H Meyer, Gunter Mussbacher

Published 2026-05-06 (author reviewed)

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a very talented, but slightly scatterbrained, apprentice chef how to cook a massive, complex banquet for a whole city.

The Problem: The "Vague Order"
Right now, if you ask a top-tier AI (the apprentice) to write code for a whole software system, you usually just give it a long, natural language description like, "Make a website where people can book meetings." This is like telling the chef, "Make a delicious meal."

The paper argues that while the AI is great at chopping a single onion (writing a small function), it gets lost when asked to cook the whole banquet (a full software repository). Natural language is too fuzzy. The AI might guess wrong, forget a step, or create a dish that looks good but doesn't taste right. Worse, because the instructions were vague, it's hard to prove why the meal failed.

The Solution: The "Structured Recipe Book"
The authors propose a new way of working called Structured Spec-Driven Engineering (SSDE). Instead of a vague conversation, they suggest giving the AI a strict, structured "recipe book."

In this paper, they use two types of structured recipes:

  1. Gherkin Specifications: Think of these as "If-Then" test cases. Instead of saying "Make it work," you write: "IF a user clicks 'Book', THEN the room must be marked 'Occupied'." It's a checklist of exact behaviors.
  2. Domain Models: These are like architectural blueprints or a map of the ingredients. They show how different parts of the system (like "Users," "Rooms," and "Dates") connect to each other.
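
To make the two kinds of "recipe" concrete, here is a minimal Python sketch built around the meeting-room example above. Everything in it is an illustrative assumption rather than material from the paper: the Gherkin scenario text, the class names `User`, `Room`, and `Booking`, and the function `book_room` are all hypothetical. The dataclasses play the role of the domain model, and `check_scenario` walks through the Given/When/Then steps of the scenario.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical Gherkin scenario (illustrative only, not taken from the paper).
SCENARIO = """
Feature: Room booking
  Scenario: Booking a free room
    Given a room "A101" that is free on 2026-05-06
    When user "alice" books room "A101" for 2026-05-06
    Then room "A101" is marked occupied on 2026-05-06
"""


# Hypothetical domain model: the entities and how they connect.
@dataclass(frozen=True)
class User:
    name: str


@dataclass
class Room:
    room_id: str
    occupied_dates: set = field(default_factory=set)

    def is_occupied(self, day: date) -> bool:
        return day in self.occupied_dates


@dataclass
class Booking:
    user: User
    room: Room
    day: date


def book_room(user: User, room: Room, day: date) -> Booking:
    """Business logic constrained by the scenario: booking marks the room occupied."""
    if room.is_occupied(day):
        raise ValueError(f"Room {room.room_id} is already booked on {day}")
    room.occupied_dates.add(day)
    return Booking(user=user, room=room, day=day)


def check_scenario() -> bool:
    """Run the scenario's steps by hand and report whether the 'Then' holds."""
    day = date(2026, 5, 6)
    room = Room("A101")                  # Given a free room
    book_room(User("alice"), room, day)  # When alice books it
    return room.is_occupied(day)         # Then it is marked occupied
```

In a real project the scenario would usually live in a `.feature` file and be bound to step functions by a BDD tool; the hand-written `check_scenario` only stands in for that wiring.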

The Experiment: The Taste Test
The researchers set up a pilot study. They acted as the head chefs and gave five different AI models (the apprentices) the task of building the "business logic" (the cooking rules) for three different software systems.

They tested different combinations:

  • The Control Group: Just the vague natural language description.
  • The Test Groups: The vague description PLUS the structured "recipe book" (the blueprints and the "If-Then" checklists).
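
As a rough picture of how such conditions might be assembled, the sketch below builds one prompt for the control group and one for a test group. The function `build_prompt`, the section titles, and the sample texts are assumptions made for illustration; the paper's actual prompt wording and its full set of combinations are not reproduced here.

```python
from typing import Optional


def build_prompt(description: str,
                 domain_model: Optional[str] = None,
                 gherkin: Optional[str] = None) -> str:
    """Concatenate whichever specification artifacts a condition includes."""
    sections = [("Description", description)]
    if domain_model is not None:
        sections.append(("Domain model", domain_model))
    if gherkin is not None:
        sections.append(("Gherkin scenarios", gherkin))
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


description = "Make a website where people can book meetings."

# Control group: natural language only.
control = build_prompt(description)

# One possible test group: description plus both structured artifacts.
treatment = build_prompt(
    description,
    domain_model="User books Room on Date",
    gherkin="Given a free room\nWhen a user books it\nThen the room is occupied",
)
```

Keeping the description identical across conditions means any difference in the generated code can be attributed to the added structure rather than to a reworded request.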

The Results: Structure Wins
The findings were clear:

  • Better Accuracy: When the AI had the structured "recipe book" (the blueprints and checklists), it made far fewer mistakes than when it only had the vague description.
  • The "Blueprint" Boost: Giving the AI the specific code signatures (the exact list of ingredients and tools) along with the blueprints helped it the most. It was like giving the chef not just the recipe, but the exact brand of flour and the specific size of the pan to use.
  • Still Room to Grow: While the structured approach was much better, the AI still made some errors. However, the researchers found that over 70% of these errors were simple, statically detectable mistakes, such as referencing a variable that doesn't exist or making a Python syntax error. Catching these doesn't even require a test oracle (a source of expected outputs to compare a running program against): a standard compiler or linter would flag them.
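
To illustrate why such mistakes need no test oracle, here is a small sketch that inspects generated Python without ever running it, using only the standard-library `ast` module. The function name `static_errors` is hypothetical, and the undefined-name check is deliberately crude (it ignores scopes and ordering), so treat it as a stand-in for a real linter rather than a replacement for one. It is not the tooling used in the paper.

```python
import ast
import builtins


def static_errors(source: str) -> list:
    """Report syntax errors and (roughly) undefined names without executing code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error at line {exc.lineno}: {exc.msg}"]

    defined = set(dir(builtins))  # names that are always available
    loaded = []                   # (name, line) pairs that are read somewhere
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.alias):
            defined.add((node.asname or node.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.append((node.id, node.lineno))

    # Scope-insensitive approximation: a name counts as defined if it is
    # bound anywhere in the file.
    return [f"undefined name '{name}' at line {line}"
            for name, line in loaded if name not in defined]
```

For example, `static_errors("def f(x):\n    return y + 1\n")` flags `y` on line 2, while the same function with `return x + 1` comes back clean. No expected outputs were needed in either case.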

The Future Roadmap
The paper suggests that to make this work perfectly, we need to:

  1. Add a Feedback Loop: Instead of just asking the AI once, we should let it write the code, check it against the "recipe book," and fix its own mistakes automatically.
  2. Build Better Datasets: We need more examples of these structured recipe books to train the AI better.
  3. Handle Changes: Real software changes all the time. We need to teach the AI how to update just one part of the banquet (like swapping the dessert) without ruining the whole meal.
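
The feedback loop in the first item can be sketched as a generate, check, repair cycle. In the sketch below, the model is passed in as a plain callable, the check is just Python's built-in `compile`, and `make_stub_model` is a canned stand-in that fails once and then succeeds. All of these names are hypothetical, and a real loop would presumably also check the code against the Gherkin scenarios; this is an assumption-laden illustration, not the authors' design.

```python
from typing import Callable, Optional, Tuple


def syntax_feedback(source: str) -> Optional[str]:
    """Return a short error message, or None if the code compiles."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def generate_with_feedback(generate: Callable[[str], str],
                           spec: str,
                           max_rounds: int = 3) -> Tuple[str, int]:
    """Ask for code, check it, and feed any problem back until it passes."""
    prompt = spec
    for round_no in range(1, max_rounds + 1):
        source = generate(prompt)
        problem = syntax_feedback(source)
        if problem is None:
            return source, round_no
        # Repair prompt: the original spec plus the concrete failure.
        prompt = (f"{spec}\n\nYour previous attempt failed a check:\n"
                  f"{problem}\nPlease fix it.")
    raise RuntimeError(f"no passing code after {max_rounds} rounds")


def make_stub_model() -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: first reply is broken, second is fixed."""
    attempts = iter([
        "def total(xs)\n    return sum(xs)\n",   # missing colon
        "def total(xs):\n    return sum(xs)\n",
    ])
    return lambda prompt: next(attempts)


source, rounds = generate_with_feedback(make_stub_model(), "Write total(xs).")
```

With the stub above, the loop accepts the second attempt, so `rounds` is 2. Capping the number of rounds keeps a model that never converges from looping forever.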

The Bottom Line
The paper concludes that if we stop treating AI like a magic wand that works on vague wishes, and start treating it like a skilled worker following a strict, structured blueprint, we can get it to build entire software systems reliably. It turns the AI from a "creative guesser" into a "precise builder."
