From Features to Actions: Explainability in Traditional and Agentic AI Systems

This paper argues that traditional attribution-based explainability methods, while effective for static predictions, fail to diagnose failures in agentic AI systems. It calls for a shift toward trace-based diagnostics, which reveal state-tracking inconsistencies as a primary cause of execution breakdowns.

Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza

Published 2026-03-09

The Big Picture: From a Snapshot to a Movie

Imagine you are trying to understand why a car crashed.

Traditional AI is like looking at a single photograph of the car right after the crash. You can see the crumpled hood and the broken headlight. You can point to the damage and say, "Ah, the front bumper hit the tree." This is what current AI explainability tools (like SHAP or LIME) do. They look at the final answer and tell you which words or numbers in the input were most important for that specific result.
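The core idea behind attribution tools like SHAP and LIME can be sketched with a simple leave-one-out ablation: remove each input word, re-score, and credit the score drop to that word. The snippet below is a minimal illustration of that idea, not either library's actual algorithm; the keyword list and scoring function are invented for the example.

```python
def attribute(words, score_fn):
    """Leave-one-out attribution: a word's importance is the
    score drop when that word is removed from the input."""
    base = score_fn(words)
    return {w: base - score_fn([x for x in words if x != w]) for w in words}

# Toy classifier score: fraction of words that are "IT" keywords
# (keyword list invented for illustration).
IT_KEYWORDS = {"software", "python", "cloud"}
def toy_score(words):
    return sum(w in IT_KEYWORDS for w in words) / max(len(words), 1)

scores = attribute(["hiring", "software", "engineer"], toy_score)
# "software" gets the largest attribution: removing it lowers
# the toy IT-score the most.
```

Real SHAP values average over many such ablations with principled weighting, but the "snapshot" character is the same: the explanation lives entirely in the input-to-output mapping.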

Agentic AI (the new kind of AI) is like a full movie of the car driving for an hour before it crashed. The car didn't just hit the tree; it took a wrong turn, ignored a stop sign, got confused by a detour, and then finally crashed. If you only look at the final photo (the crash), you miss the whole story. You don't know why the driver took that wrong turn in the first place.

This paper argues that our current tools for explaining AI are stuck in the "photograph" era, but we need to upgrade to the "movie" era to understand these new, complex AI agents.


The Problem: The "Snapshot" vs. The "Journey"

1. The Old Way (Static Predictions)
Think of a traditional AI like a fortune teller. You give it a crystal ball (the input), and it tells you your future (the output).

  • The Explanation: If the fortune teller says "You will lose your job," a traditional explanation tool looks at the crystal ball and says, "It was because you mentioned 'layoffs' and 'budget cuts'."
  • The Flaw: This works fine for a one-time prediction. But it doesn't explain how the fortune teller got there if they had to ask you five questions, check a newspaper, and call a friend first.

2. The New Way (Agentic Systems)
Think of a modern AI agent like a travel agent planning a complex trip for you.

  • The Process: The agent doesn't just give you a ticket. It searches for flights, checks your passport validity, books a hotel, realizes the hotel is full, switches to a different one, tries to book a car, fails because the credit card is declined, and then tries a different card.
  • The Failure: If the trip fails, it's not because of one bad word in your request. It's because the agent forgot your passport number in step 3, or it picked the wrong credit card in step 5.
  • The Gap: Traditional tools try to explain the failure by looking at your original request ("You said 'cheap flight'"). But the real problem happened three steps ago when the agent got confused. The old tools can't see the "movie" of the journey; they only see the "snapshot" of the final failure.

The Experiment: Testing the Tools

The researchers ran two experiments to prove their point:

Experiment A: The Static Test (The Photo)
They used a simple AI to classify job postings as "IT" or "Non-IT."

  • Result: The traditional tools worked great. They could consistently point out which words (like "software" or "accounting") decided the answer. It was like a reliable photo analysis.

Experiment B: The Agentic Test (The Movie)
They used advanced AI agents to perform complex tasks, like booking an airline flight or navigating a website.

  • Result: The traditional tools failed miserably. They couldn't tell you why the agent failed to book the flight. They couldn't see that the agent had forgotten the passenger's name in step 2, leading to a crash in step 10.
  • The New Solution: The researchers used a new method called "Trace-Based Diagnostics." Instead of looking at the input, they watched the entire movie (the execution trace). They created a checklist (a "rubric") to see exactly where the agent messed up.
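The trace-based approach can be sketched as: record every step of the run, then score the whole trace against a rubric of named checks. Everything below — the step format, the rubric questions, the failure scenario — is a hypothetical sketch of the idea, not the authors' actual rubric.

```python
# A trace is a list of per-step records; a rubric is a set of
# named checks over the whole trace (not just the final output).
trace = [
    {"step": 1, "action": "search_flights", "state": {"passenger": "Ada"}},
    {"step": 2, "action": "select_flight",  "state": {}},  # name lost here
    {"step": 3, "action": "book_flight",    "state": {},
     "error": "missing passenger"},
]

rubric = {
    "kept_state": lambda t: all("passenger" in s["state"] for s in t),
    "no_errors":  lambda t: not any("error" in s for s in t),
}

def diagnose(trace, rubric):
    """Return the names of the rubric checks the trace fails."""
    return [name for name, check in rubric.items() if not check(trace)]

print(diagnose(trace, rubric))  # -> ['kept_state', 'no_errors']
```

The point of the structure: the diagnosis pinpoints *which* property broke (state was dropped at step 2), which no amount of staring at the original request could reveal.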

The Key Discoveries

1. The "State Drift" Problem
In the airline booking task, the agents often failed because they lost track of their own "memory."

  • Analogy: Imagine a chef cooking a complex meal. They chop the onions, then walk away to answer the phone. When they come back, they forget they already chopped the onions and chop them again, or they forget to add salt because they lost their place in the recipe.
  • Finding: The agents didn't fail because they were "dumb"; they failed because they lost track of the state of the world. Traditional tools couldn't see this "forgetfulness."
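One simple way to surface this "forgetfulness" is to diff the agent's tracked state between consecutive steps and flag keys that silently disappear. A hypothetical sketch — the state fields and values are invented, not taken from the paper's benchmark:

```python
def state_drift(states):
    """Flag keys that vanish between consecutive state snapshots —
    a crude proxy for the chef forgetting the onions were chopped."""
    drops = []
    for i in range(1, len(states)):
        lost = set(states[i - 1]) - set(states[i])
        if lost:
            drops.append((i, sorted(lost)))
    return drops

# Invented booking-agent states: the passport number is silently
# dropped at the third snapshot.
states = [
    {"passenger": "Ada", "passport": "X123"},
    {"passenger": "Ada", "passport": "X123", "flight": "LH 454"},
    {"passenger": "Ada", "flight": "LH 454"},
]
print(state_drift(states))  # -> [(2, ['passport'])]
```

An input-attribution tool has no view of these snapshots at all, which is exactly why it cannot explain this failure mode.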

2. The "Wrong Turn" Problem
In the web-navigation task, agents failed because they picked the wrong tool immediately.

  • Analogy: Imagine trying to open a door. If you pick the wrong key (the wrong tool) on the first try, you can't open the door, no matter how hard you try later.
  • Finding: These were "fast failures." One wrong decision early on doomed the whole mission.
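Because these failures are decided at step one, a trace-level check can classify them cheaply: compare the first tool call against the expected one. A hypothetical sketch (the tool names and expected action are invented):

```python
def classify_failure(trace, expected_first_tool):
    """'Fast failure' check: if the very first tool call is wrong,
    later steps cannot recover (the wrong-key analogy)."""
    first = trace[0]["tool"]
    if first != expected_first_tool:
        return (f"fast failure: chose '{first}' instead of "
                f"'{expected_first_tool}' at step 1")
    return "first step OK; look later in the trace"

# Invented web-navigation trace: the agent clicked a link when it
# should have used the search box.
trace = [{"tool": "click_link"}, {"tool": "fill_form"}]
print(classify_failure(trace, "search_box"))
```

A diagnostic like this separates the "doomed from the start" runs from the "drifted midway" runs, which call for very different fixes.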

3. The "Minimal Explanation Packet" (MEP)
The authors propose a new standard for explaining AI. Instead of just giving a reason, we need a packet that includes:

  • The Artifact: The explanation (e.g., "The agent failed").
  • The Evidence: The proof (e.g., "Here is the log showing the agent forgot the passenger's name at 2:03 PM").
  • The Verification: A check to make sure the explanation is true (e.g., "We replayed the video, and yes, the agent did forget the name").
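The three parts of the packet map naturally onto a small record type. The sketch below is one hypothetical shape such a packet might take; the field names and the verification closure are ours, not a specification from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MinimalExplanationPacket:
    """Sketch of an MEP: an explanation is only trusted if its
    evidence passes an independent verification check."""
    artifact: str                          # the explanation itself
    evidence: List[str]                    # e.g. log lines backing it up
    verify: Callable[[List[str]], bool]    # replay / consistency check

    def is_verified(self) -> bool:
        return self.verify(self.evidence)

mep = MinimalExplanationPacket(
    artifact="Agent dropped the passenger name before booking",
    evidence=["14:03 state={'flight': 'LH 454'}  # 'passenger' missing"],
    verify=lambda logs: any("'passenger' missing" in line for line in logs),
)
print(mep.is_verified())  # -> True
```

The design choice worth noting is the third field: by bundling a check alongside the claim, an explanation can be re-run and falsified, rather than taken on faith.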

Why This Matters

If you are a doctor using an AI to diagnose a patient, or a bank using an AI to approve loans, you don't just want to know what the AI decided. You need to know how it got there.

  • Old AI: "I denied your loan because your credit score was low." (Snapshot)
  • New AI (with this paper's method): "I denied your loan because I tried to call your bank, got a timeout error, assumed you had no income, and then made a decision based on that wrong assumption. Here is the log of the error." (Movie)

The Takeaway

We are moving from an era of Static Predictors (AI that answers a single question) to Agentic Systems (AI that takes actions over time).

To trust these new agents, we can't just look at the final answer. We need to watch the whole movie, check the script, and verify the actor's memory at every step. The paper provides the tools to build that "movie camera" for AI, ensuring that when things go wrong, we know exactly where and why the plot twisted.
