Theory of Code Space: Do Code Agents Understand Software Architecture?

This paper introduces Theory of Code Space (ToCS), a benchmark demonstrating that AI code agents exhibit significant, model-dependent variability in their ability to maintain coherent architectural beliefs and utilize active exploration or self-scaffolding during multi-file software engineering tasks.

Grigory Sapunov

Published Mon, 09 Ma

Imagine you are trying to understand a massive, mysterious city. You don't have a map, and you can't see the whole thing at once. You have to walk around, peek into buildings, read street signs, and try to figure out how the subway lines connect to the power grid.

This is exactly what AI "code agents" (smart computer programs that write and fix software) are supposed to do. But there's a problem: while these AIs are great at solving small, isolated puzzles, they often get completely lost when asked to navigate a whole software project. They might fix one file but accidentally break three others because they don't understand how the pieces fit together.

This paper introduces a new test called ToCS (Theory of Code Space) to see if AI agents can actually build a mental map of a software city, or if they are just wandering around blindly.

Here is the breakdown of their findings using simple analogies:

1. The "Active vs. Passive" Paradox

The Analogy: Imagine two students trying to learn a new city.

  • Student A (Active): Walks around, opens doors, and asks locals questions.
  • Student B (Passive): Is handed a giant, 500-page atlas of the city all at once.

Usually, we think the student with the atlas (Passive) should win. But the paper found something weird: it depends on the student.

  • One AI model (GPT) actually did better when it walked around and explored step-by-step. When it was handed the whole atlas at once, it got overwhelmed and confused (like trying to drink from a firehose).
  • Another AI model (Gemini) did the opposite. It got lost when it had to walk around, but it understood the city perfectly when it was given the whole atlas at once.

The Lesson: "Active exploration" isn't just a default skill; it's a specific talent that some AIs have and others lack.
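To make the two strategies concrete, here is a toy sketch in Python. Everything in it is illustrative, not the paper's actual harness: the in-memory "repository", the function names, and the import-following heuristic are all assumptions used only to show the contrast between handing over the whole atlas at once and exploring it step by step.

```python
# A toy in-memory "repository": file name -> source text.
REPO = {
    "main.py": "import utils\nimport db\n",
    "utils.py": "import db\n",
    "db.py": "CONNECTION = 'sqlite'\n",
}

def passive_context(repo):
    """Passive: hand the agent the entire 'atlas' at once."""
    return "\n".join(f"# {name}\n{src}" for name, src in repo.items())

def active_context(repo, start="main.py"):
    """Active: walk the import graph one file at a time, like a
    student exploring the city street by street."""
    seen, queue, chunks = set(), [start], []
    while queue:
        name = queue.pop(0)
        if name in seen or name not in repo:
            continue
        seen.add(name)
        src = repo[name]
        chunks.append(f"# {name}\n{src}")
        # Follow "import X" lines to decide where to explore next.
        for line in src.splitlines():
            if line.startswith("import "):
                queue.append(line.split()[1] + ".py")
    return "\n".join(chunks)
```

Both functions end up covering the same three files here; the paper's point is that which *delivery order* works better depends on the model, not on the repository.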

2. The "Memory Blackout" (Belief Instability)

The Analogy: Imagine you are drawing a map of the city on a piece of paper. Every few steps, you have to stop and show your map to a teacher.

  • The Stable Student: Draws a little bit, shows the map, adds a little more, shows the map again. The map keeps getting better and never loses what was already drawn.
  • The Unstable Student: Draws a great map, shows it to the teacher, then erases half of it before drawing the next part. By the end, they've forgotten the streets they discovered in the first half of the tour.

The Lesson: The paper found that some AI models suffer from "catastrophic forgetting." A smaller model (Gemini 2.5 Flash) kept a perfect, stable map. A much larger, "smarter" sibling (Gemini 2.5 Pro) built a good map, then suddenly forgot everything it had learned in a single step. Size doesn't always equal stability.
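The "erased map" failure can be made precise. Assuming the agent's beliefs are serialized as JSON-like dictionaries (the article mentions a JSON map), one simple stability check, sketched here as an assumption rather than the paper's actual metric, is that no snapshot should lose keys discovered earlier:

```python
def is_stable(snapshots):
    """A belief trace is 'stable' if every snapshot keeps all the
    keys (discovered facts) from the snapshot before it."""
    for prev, curr in zip(snapshots, snapshots[1:]):
        if not prev.keys() <= curr.keys():
            return False  # something learned earlier was erased
    return True

# The stable student: the map only ever grows.
stable_trace = [{"a": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 2, "c": 3}]

# The unstable student: a good map, then half of it erased in one step.
unstable_trace = [{"a": 1, "b": 2}, {"c": 3}]
```

Under this toy definition, the first trace passes and the second fails at the single step where the earlier keys vanish.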

3. The "Self-Scaffolding" Effect

The Analogy: Imagine you are building a tower of blocks.

  • Method A: You build a block, put it down, and forget it. Then you build the next one.
  • Method B: You build a block, write down exactly where it is on a sticky note, and stick that note to the wall. Then you use that note to help you place the next block.

The Lesson: When the AI was allowed to keep its "notes" (the JSON map it wrote) in its memory while it kept working, one model (GPT) got significantly better. It used its own previous notes to help it understand the future. However, for another model, keeping the notes didn't help at all. This means the ability to "use your own notes to help you think" is a skill that varies wildly between different AIs.
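The sticky-note loop is easy to sketch. This is a minimal, hypothetical version: the prompt format, the `scaffolded_loop` name, and the stub agent are all inventions for illustration, standing in for a real LLM call that would return an updated JSON map.

```python
import json

def scaffolded_loop(steps, agent):
    """Self-scaffolding: feed the agent's own prior notes back into
    each step's context, instead of starting from a blank slate."""
    notes = {}  # the running JSON "belief map"
    for step in steps:
        context = f"Previous notes:\n{json.dumps(notes)}\n\nTask:\n{step}"
        notes = agent(context, notes)
    return notes

def stub_agent(context, notes):
    """Stand-in for an LLM call: just records each task it saw,
    without discarding anything already in the notes."""
    updated = dict(notes)
    task = context.split("Task:\n")[1]
    updated[task] = "seen"
    return updated
```

Running `scaffolded_loop(["inspect db.py", "patch utils.py"], stub_agent)` accumulates both tasks in the notes, which is exactly the behavior the paper found helps one model and does nothing for another.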

4. The "Hidden Rules" Test

The Analogy: In a real city, there are rules like "No trucks allowed on this bridge" or "All water pipes must go through the main valve." These aren't written on the street signs; you have to infer them by looking at the pipes and valves.
The paper planted these "hidden rules" in the code.

  • The Result: Most AIs missed them. But when the researchers gave the AI a better "exam prompt" (a clearer set of instructions on what to look for), the AIs suddenly got much better at finding these hidden rules. This proves that sometimes the AI can understand the architecture, but it just needed a clearer question to answer.

The Big Takeaway

The paper concludes that AI code agents are not yet ready to be independent architects.

They are like very talented interns who are great at writing code for a single room but often fail to understand the blueprint of the whole building. Some interns are good at exploring new rooms, while others are better if you hand them the blueprints first. Some forget what they learned five minutes ago, while others remember everything perfectly.

Why does this matter?
If we want AI to fix complex software bugs or build new systems, we can't just ask them to "go fix it." We need to design systems that:

  1. Help them remember what they've seen (like a digital sticky note).
  2. Teach them how to explore efficiently without getting overwhelmed.
  3. Give them clearer instructions on what "rules" to look for.

The authors released their test (ToCS) as open-source so that other researchers can use it to build better, more reliable AI architects.