Process-Centric Analysis of Agentic Software Systems

This paper introduces Graphectory, a graph-based framework for analyzing the stochastic execution trajectories of agentic software systems. The analysis reveals that richer prompts and stronger models yield more complex reasoning patterns, and that real-time monitoring and intervention significantly improve problem resolution rates and efficiency.

Shuyang Liu, Yang Chen, Rahul Krishna, Saurabh Sinha, Jatin Ganhotra, Reyhan Jabbarvand

Published Tue, 10 Ma

Imagine you are hiring a team of brilliant but very new interns to fix a massive, complex software bug in a company's codebase.

In the old way of doing things (which the paper calls "Outcome-Centric"), you would only look at the final result. Did they hand you a fixed file? Yes? Great, they get a gold star. No? They get fired. You don't care how they got there. Did they spend three days staring at the wrong file? Did they delete the wrong code and then try to put it back? Did they panic and run in circles? If the final file was fixed, you never knew.

This paper introduces a new way to watch these "AI Interns" (called Agentic Systems) work. It's called "Process-Centric Analysis." Instead of just checking the final homework, the authors built a tool called Graphectory to watch the entire movie of how the AI thinks and acts.

Here is a simple breakdown of the paper's ideas using everyday analogies:

1. The Problem: The "Black Box" of AI

Current AI agents (like SWE-agent or OpenHands) are like magic boxes. You give them a problem, and they give you a solution. But inside the box, they might be:

  • Wandering aimlessly: Looking in the wrong drawers for a tool.
  • Spinning in circles: Trying the same fix over and over when it doesn't work.
  • Skipping steps: Fixing the code but forgetting to test it.

If they get lucky and fix it, we celebrate. If they fail, we just say "it didn't work." We miss the chance to learn why they struggled.

2. The Solution: Graphectory (The "Flight Recorder")

The authors created Graphectory. Think of this as a flight recorder (black box) for the AI's brain, but instead of just recording "crash" or "safe landing," it maps out the entire flight path.

  • The Map: It turns the AI's linear list of actions (Step 1, Step 2, Step 3...) into a 3D map.
  • The Connections: It draws lines between actions.
    • Time Lines: "I looked at the file, then I edited it."
    • Structure Lines: "I looked at the main folder, then I went deeper into a sub-folder."
  • The Result: You can instantly see if the AI is taking a direct highway to the solution or if it's stuck in a traffic jam of loops, backtracking, and wrong turns.
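To make the two kinds of "lines" concrete, here is a minimal sketch of how a Graphectory-style trajectory graph could be assembled from a linear action log. The function name, action tuples, and edge rules are illustrative assumptions for this post, not the paper's actual API.

```python
# Hypothetical sketch: turning a linear action log into a graph with
# temporal edges (what happened next) and structural edges (going deeper
# into the codebase). Names and rules are assumptions, not the paper's API.
from pathlib import PurePosixPath

def build_trajectory_graph(actions):
    """actions: list of (action_type, path) tuples in execution order.

    Returns (nodes, temporal_edges, structural_edges).
    """
    nodes = [f"{i}:{kind}:{path}" for i, (kind, path) in enumerate(actions)]

    # Temporal edges: each action points to the action that followed it.
    temporal_edges = [(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]

    # Structural edges: link consecutive actions when the second one touches
    # a path nested inside the first (e.g. a folder, then a file inside it).
    structural_edges = []
    for i in range(len(actions) - 1):
        parent, child = actions[i][1], actions[i + 1][1]
        if PurePosixPath(parent) in PurePosixPath(child).parents:
            structural_edges.append((nodes[i], nodes[i + 1]))
    return nodes, temporal_edges, structural_edges

log = [("open", "src"), ("open", "src/utils.py"), ("edit", "src/utils.py")]
nodes, t_edges, s_edges = build_trajectory_graph(log)
```

With a graph like this, "traffic jams" show up as clusters of temporal edges that revisit the same nodes, while a clean run looks like a short chain.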

3. The "Language" of the Journey: Langutory

Looking at a giant map can be overwhelming. So, the authors also created Langutory.

  • Analogy: Imagine the AI's journey is a long sentence. Langutory is the summary of that sentence.
  • Instead of reading 50 steps, Langutory says: "The agent spent 5 minutes Looking (Localization), then 2 minutes Fixing (Patching), then 1 minute Checking (Validation)."
  • This lets researchers quickly spot if an AI skipped the "Checking" part or spent 90% of its time just "Looking" without ever fixing anything.
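The summarization step above can be sketched as a simple run-length encoding over phase labels. The phase names follow the post (Localization, Patching, Validation); the action-to-phase mapping is an assumption made up for this example.

```python
# Hypothetical sketch of a Langutory-style summary: collapse a step-by-step
# trajectory into a compact phase sequence. The PHASE_OF mapping is an
# illustrative assumption, not the paper's actual taxonomy.
from itertools import groupby

PHASE_OF = {"open": "Localization", "search": "Localization",
            "edit": "Patching", "test": "Validation"}

def langutory_summary(actions):
    """Map each action to a phase, then merge consecutive repeats.

    Returns a list of (phase, step_count) pairs.
    """
    phases = [PHASE_OF.get(a, "Other") for a in actions]
    return [(phase, sum(1 for _ in run)) for phase, run in groupby(phases)]

steps = ["open", "search", "open", "edit", "edit", "test"]
summary = langutory_summary(steps)
# summary → [("Localization", 3), ("Patching", 2), ("Validation", 1)]
```

A missing "Validation" entry in such a summary is exactly the kind of skipped-step signal the post describes.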

4. What They Discovered (The "Aha!" Moments)

The team watched 4,000 of these AI journeys and found some surprising things:

  • Success isn't always efficient: Even when the AI fixed the bug, it often took a very long, winding road. It was like a driver who finally arrived at the destination but drove through three different states and got lost twice along the way.
  • The "Smart" AI gets lost more: Surprisingly, the "smarter" AI models (the ones with bigger brains) often explored more and took longer paths. They were like detectives who read every single book in the library before finding the clue. While this sometimes helped them solve harder problems, it was often a waste of time.
  • Failure looks messy: When the AI failed, the map showed chaotic patterns: lots of loops (spinning in circles), going back and forth between files, and trying to fix things that weren't broken.

5. The Superpower: Real-Time Intervention

The coolest part of the paper is that they didn't just watch the movies; they built a Live Coach.

  • How it works: As the AI is working, the Graphectory system watches in real-time.
  • The Intervention: If the system sees the AI starting to spin in circles or skip a crucial step (like testing), it pauses the AI and sends a diagnostic message: "Hey, you've been looking at this file for 10 minutes without changing anything. You might be stuck. Try running a test instead."
  • The Result: When they used this "Live Coach," the AI fixed more bugs (up to 23% more) and did it much faster. It was like having a GPS that says, "You're going in circles, turn around now," instead of waiting until the driver crashes.
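One way such a "Live Coach" could detect spinning in circles is to watch a sliding window of recent actions and nudge the agent when one action dominates. This is a minimal sketch under assumed window and threshold values; the real system's detection rules are not described at this level in the post.

```python
# Hypothetical sketch of loop detection for a "Live Coach": flag when the
# same (action, file) pair dominates the recent history. Window size,
# threshold, and the nudge message are illustrative assumptions.
from collections import Counter

def check_for_loops(recent_actions, window=6, threshold=3):
    """Return a nudge message if one (action, path) pair appears at least
    `threshold` times in the last `window` steps, otherwise None."""
    window_slice = recent_actions[-window:]
    if not window_slice:
        return None
    (action, path), count = Counter(window_slice).most_common(1)[0]
    if count >= threshold:
        return (f"You have run '{action}' on {path} {count} times recently. "
                f"You may be stuck; try a different step, such as running a test.")
    return None

history = [("open", "a.py"), ("open", "a.py"), ("edit", "b.py"), ("open", "a.py")]
nudge = check_for_loops(history)  # non-None: "open a.py" appeared 3 times
```

The nudge would then be injected into the agent's context as a diagnostic message, like the GPS recalculation the post describes.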

Summary

This paper is about moving from asking "Did it work?" to asking "How did it work?"

By turning the AI's messy thought process into a clear map (Graphectory) and a simple summary (Langutory), the researchers can spot bad habits, fix them while the AI is working, and build smarter, more efficient software agents for the future. It's the difference between judging a chef only by the taste of the final dish versus watching them chop, cook, and season to see where they can improve their technique.