Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents

Imagine you have a super-smart, highly efficient personal assistant named Naba. Naba can check your emails, look up stock prices, and book flights. But like any assistant who has read too many books and watched too many movies, Naba sometimes gets carried away.

Sometimes Naba says, "I checked your email and found 5 messages from Alice," when in reality, the email server only returned 3. Other times, Naba might say, "I called the bank," when it actually just made up a number to sound helpful. This is called hallucination, and it's a big problem because you might make financial decisions based on lies.

This paper introduces NabaOS, a new system designed to catch Naba in these lies without slowing things down.

Here is how it works, broken down into simple concepts:

1. The Problem: Why "Magic Proofs" Don't Work

Scientists have tried to solve this using something called Zero-Knowledge Proofs (think of these as "Math Magic Receipts"). These are like a super-complex cryptographic seal that proves a computer did the math correctly.

The Catch: Generating these magic seals takes minutes or even hours. If you ask Naba, "What's the weather?" and you have to wait 10 minutes for the "Math Magic" to prove the answer is real, you'll never use the assistant. It's too slow and requires expensive super-computers.

2. The Solution: The "Digital Receipt" System

Instead of heavy math magic, NabaOS uses a system inspired by ancient Indian philosophy (specifically the Nyaya school of logic), which classifies knowledge based on where it came from.

Think of NabaOS as a Restaurant Manager who doesn't trust the Chef (the AI) blindly. Instead, the Manager checks the Kitchen Receipts.

How it works:
1. When Naba wants to check your email, it asks the "Kitchen" (the tool) to do it.
2. The Kitchen does the work and immediately prints a signed receipt. This receipt says: "I checked the email. I found 3 messages. Here is a secret code proving I did it."
3. Naba then writes a report for you.
4. The Check: Before showing you the report, NabaOS quickly compares every claim in it against the receipts.
  - If Naba says "I found 5 emails," but the receipt says "3," the system instantly flags it as a lie.
  - If Naba says "I checked the bank," but there is no receipt for a bank call, the system knows Naba made it up.

This check takes less than 15 milliseconds (faster than a blink), so you don't have to wait.
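The receipt check described above can be sketched in a few lines of Python. This is a minimal illustration, not the NabaOS implementation: it assumes the "secret code" is an HMAC-SHA256 signature made with a key that only the tool layer holds, and the names `issue_receipt`, `receipt_is_authentic`, `check_claim`, and the tool name `email.search` are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a key held by the tool layer and never shown to the AI model.
SECRET_KEY = b"demo-key-not-for-production"

def _payload(tool, result):
    # Canonical JSON so the same content always produces the same bytes.
    return json.dumps({"tool": tool, "result": result}, sort_keys=True).encode()

def issue_receipt(tool, result):
    """Called by the tool layer (the "Kitchen") right after a tool runs."""
    sig = hmac.new(SECRET_KEY, _payload(tool, result), hashlib.sha256).hexdigest()
    return {"tool": tool, "result": result, "sig": sig}

def receipt_is_authentic(receipt):
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(
        SECRET_KEY, _payload(receipt["tool"], receipt["result"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, receipt["sig"])

def check_claim(claim, receipts):
    """claim is a dict like {"tool": ..., "field": ..., "value": ...}."""
    matching = [
        r for r in receipts
        if r["tool"] == claim["tool"] and receipt_is_authentic(r)
    ]
    if not matching:
        return "no_receipt"   # the assistant claims a tool call that never happened
    if any(r["result"].get(claim["field"]) == claim["value"] for r in matching):
        return "verified"
    return "mismatch"         # a receipt exists but says something different

receipts = [issue_receipt("email.search", {"count": 3})]
```

With that single receipt on file, a claim of 3 emails is verified, a claim of 5 emails is a mismatch, and any claim about a bank call has no receipt at all. A receipt whose contents were altered after signing fails the authenticity check, so it is ignored.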

3. The "Trust Labels" (Not Just Yes/No)

Most security systems give you a simple "Safe" or "Unsafe" stamp. NabaOS is smarter. It gives you a Trust Label based on how Naba knows the information, using five categories:

Direct Sight (Pratyakṣa): "I saw 3 emails."
- Label: Fully Verified. (The receipt proves it).
Smart Guessing (Anumana): "Alice seems worried because her emails are short."
- Label: Mostly Verified. (The emails exist, but the "worry" is Naba's interpretation, not a fact).
Hearsay (Shabda): "Reuters says interest rates are rising."
- Label: Verified Source. (We checked that Naba actually visited the Reuters website).
Nothing Found (Abhava): "No emails from Bob."
- Label: Verified Absence. (The receipt proves the search came back empty).
Pure Opinion (Ungrounded): "I think Bob is a nice guy."
- Label: Unverified. (No tool was used; this is just Naba's opinion).

Why this matters: If you see "Mostly Verified," you know the facts are true, but the conclusion is Naba's guess. You can decide how much to trust that guess.
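The five categories and their labels can be written down as a simple lookup. The sketch below is illustrative only: the `Source` enum and `label_claim` function are hypothetical names, and the rule that any claim without a receipt drops to "Unverified" is an assumption made here, not something the summary states.

```python
from enum import Enum

class Source(Enum):
    DIRECT = "pratyaksa"       # read straight from a tool result
    INFERENCE = "anumana"      # a conclusion drawn from verified facts
    TESTIMONY = "shabda"       # something an external source said
    ABSENCE = "abhava"         # a search that came back empty
    UNGROUNDED = "ungrounded"  # no tool involved at all

TRUST_LABEL = {
    Source.DIRECT: "Fully Verified",
    Source.INFERENCE: "Mostly Verified",
    Source.TESTIMONY: "Verified Source",
    Source.ABSENCE: "Verified Absence",
    Source.UNGROUNDED: "Unverified",
}

def label_claim(source, has_receipt):
    # Assumption: every grounded category needs a receipt behind it;
    # without one, the claim is treated like pure opinion.
    if source is Source.UNGROUNDED or not has_receipt:
        return "Unverified"
    return TRUST_LABEL[source]
```

For example, `label_claim(Source.INFERENCE, True)` gives "Mostly Verified", while the same inference with no receipt behind it gives "Unverified".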

4. The "Deep Dive" Check

Sometimes Naba is an autonomous agent that goes on the internet to do complex tasks (like booking a trip). In these cases, NabaOS does a Spot Check.

If Naba says, "I found a flight on this website," NabaOS will independently visit that website to see if the flight actually exists.
If the website is fake or the flight is gone, NabaOS catches the lie.
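A spot check of this kind might look like the sketch below. It is a simplified illustration under stated assumptions: the checker is handed a `fetch` function that retrieves a page independently of the assistant, and a claim counts as confirmed when the expected text appears on the page. The names `spot_check` and `fake_fetch`, the result strings, and the example URLs are all hypothetical.

```python
def spot_check(url, expected_snippet, fetch):
    """Re-fetch the page independently and look for the claimed content."""
    try:
        page = fetch(url)
    except Exception:
        return "unreachable"   # the site is fake, down, or gone
    if expected_snippet.lower() in page.lower():
        return "confirmed"
    return "contradicted"      # the page exists but does not say that

# Stand-in for a real HTTP client so the example runs offline.
PAGES = {"https://flights.example/search": "Flight NB101 to Delhi, departs 09:40"}

def fake_fetch(url):
    return PAGES[url]  # raises KeyError for unknown pages
```

Passing the fetcher in as an argument keeps the checker separate from whatever browsing the assistant did, which is the point of an independent check.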

5. The Results

The researchers tested this system on 1,800 different scenarios (including emails, stocks, and news in four different languages).

Success Rate: It caught 91% of the lies.
Speed: Each check added less than 15 milliseconds of delay (faster than a blink).
Accuracy: When the system says a claim is "Fully Verified," it is correct 98.7% of the time.

The Big Takeaway

Current AI safety tries to prove the computer worked correctly using heavy math. NabaOS proves the story is true by checking the receipts.

It's the difference between asking a magician to prove they didn't use a hidden wire (hard and slow) versus asking them to show you the ticket stub for the train they claimed to take (fast, easy, and reliable).

By giving you these "receipts" and "trust labels," NabaOS helps you know exactly what to believe and what to double-check, making AI agents safe enough to use for real-life tasks like managing your money or health.
