ESAA-Security: An Event-Sourced, Verifiable Architecture for Agent-Assisted Security Audits of AI-Generated Code

Here is an explanation of the ESAA-Security paper, translated into everyday language with creative analogies.

The Big Problem: The "Magic" That Can't Be Trusted

Imagine you hire a super-smart, magical intern (an AI) to write code for your bank. It works incredibly fast. But here's the catch: the intern might accidentally leave the front door unlocked, hide a spare key under the mat, or forget to lock the vault.

In the past, when we asked AI to check its own work, we just said, "Hey, look for bugs!" The AI would chat back, "I found three things!" and write a paragraph about them.
The problem?

It's a black box: We don't know how it found them. Did it guess? Did it hallucinate?
It's messy: If you ask it again tomorrow, it might give you a different answer.
No proof: If the bank gets hacked, you can't prove the AI actually checked the door or just made it up.

The Solution: ESAA-Security (The "Flight Recorder" for Code)

The authors propose ESAA-Security. Think of this not as a chat with a smart intern, but as a strictly regulated construction site with a "Flight Recorder" (like on an airplane) that never deletes anything.

Here is how it works, broken down into simple concepts:

1. The "Black Box" vs. The "Event Log"

Old Way: You ask the AI, "Is this code safe?" It thinks for a second and says, "Yes, mostly." You have to take its word for it.
ESAA Way: The AI doesn't just "think." It has to write down every single step it takes in a permanent, append-only diary (called an Event Log): new entries can be added, but existing ones can never be edited or deleted.
Analogy: Imagine a detective solving a crime. In the old way, the detective just tells you the verdict. In ESAA, the detective must tape-record every clue they find, every door they open, and every fingerprint they lift. If the recording stops, the case is invalid.
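
The "permanent, unchangeable diary" idea can be sketched as an append-only log in which each entry also stores the hash of the entry before it, so any later edit to history becomes detectable. This is a minimal illustration that assumes SHA-256 hash chaining. The class name, event names such as finding.recorded, and the check ID AUTH-01 are hypothetical and not taken from the paper, whose actual log format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class EventLog:
    """Hypothetical append-only log: entries are added at the end, never edited or deleted."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    def append(self, event_type: str, payload: dict) -> dict:
        # Each entry stores the hash of the one before it, so changing any past
        # entry breaks every link that follows (tamper evidence).
        prev_hash = self._events[-1]["hash"] if self._events else GENESIS
        body = {"seq": len(self._events), "type": event_type,
                "payload": payload, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self._events.append(entry)
        return entry

    def events(self) -> tuple:
        # A tuple copy: callers cannot insert or delete entries through it.
        return tuple(self._events)

    def verify_chain(self) -> bool:
        """Recompute every hash; False means some recorded entry was altered."""
        prev_hash = GENESIS
        for entry in self._events:
            body = {k: entry[k] for k in ("seq", "type", "payload", "prev_hash")}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = EventLog()
    log.append("check.started", {"check_id": "AUTH-01"})
    log.append("finding.recorded", {"check_id": "AUTH-01", "severity": "HIGH"})
    print(log.verify_chain())                        # True
    log.events()[1]["payload"]["severity"] = "LOW"   # try to rewrite history
    print(log.verify_chain())                        # False
```

Hash chaining makes tampering detectable rather than impossible; a real system would also need to keep the latest hash somewhere the agent cannot overwrite.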

2. The "Orchestrator" (The Strict Foreman)

In this system, the AI (the agent) isn't the boss. The boss is a Foreman (the Orchestrator).

The AI wants to say, "I found a hole in the wall!"
The Foreman stops it and says, "Hold on. Did you follow the rules? Did you check the right spot? Is your report in the correct format?"
If the AI tries to skip steps or write a messy report, the Foreman rejects it. Nothing is written to the Event Log unless it passes these checks.
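
The Foreman's gatekeeping can be sketched as a validation step that runs before anything reaches the log. In this hypothetical sketch, the "contract" is just a set of required fields, an allowed list of severity labels, and a rule that the agent must answer the check it was assigned. The field names, severity labels, and the Orchestrator.submit method are illustrative assumptions, not the paper's actual schema.

```python
ALLOWED_SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"}
REQUIRED_FIELDS = {"check_id", "severity", "file", "evidence"}


class ContractViolation(Exception):
    """Raised when an agent's output breaks the (hypothetical) contract."""


def validate_finding(output: dict, expected_check: str) -> dict:
    missing = REQUIRED_FIELDS - set(output)
    if missing:
        raise ContractViolation(f"missing fields: {sorted(missing)}")
    if output["check_id"] != expected_check:
        # The agent must answer the task it was assigned, not a different one.
        raise ContractViolation(f"expected {expected_check!r}, got {output['check_id']!r}")
    if output["severity"] not in ALLOWED_SEVERITIES:
        raise ContractViolation(f"unknown severity {output['severity']!r}")
    if not str(output["evidence"]).strip():
        raise ContractViolation("a finding must cite evidence")
    return output


class Orchestrator:
    def __init__(self) -> None:
        self.log: list[dict] = []  # stands in for the append-only event log

    def submit(self, output: dict, expected_check: str) -> tuple[bool, str]:
        """Record the agent's output only if it passes every contract check."""
        try:
            finding = validate_finding(output, expected_check)
        except ContractViolation as err:
            return False, str(err)  # rejected: nothing is written to the log
        self.log.append({"type": "finding.recorded", "payload": finding})
        return True, "accepted"


if __name__ == "__main__":
    foreman = Orchestrator()
    print(foreman.submit({"check_id": "AUTH-01", "severity": "HIGH", "file": "login.py",
                          "evidence": "password compared in plaintext"}, "AUTH-01"))
    # (True, 'accepted')
    print(foreman.submit({"check_id": "AUTH-01", "severity": "SCARY"}, "AUTH-01"))
    # (False, "missing fields: ['evidence', 'file']")
    print(len(foreman.log))  # 1
```

Rejecting at the boundary means every later stage only ever sees outputs that already satisfy the contract.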

3. The "Replay" Feature (The Time Machine)

Because every step is recorded in that permanent diary, you can hit "Rewind" at any time.

Analogy: Imagine a video game where you can replay a level. If the AI says, "I checked the password security," you can rewind the tape, watch the AI check the password, and verify it actually did the job correctly.
This makes the audit reproducible. You aren't trusting the AI's memory; you are trusting the recorded evidence.
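
Replay can be sketched as a pure function that rebuilds the audit's state from the log alone and then compares the rebuilt state with what the agent claims. In this illustrative sketch (the event types and check IDs are hypothetical), a claim that a check was performed holds up only if the log actually contains the matching event.

```python
import hashlib
import json


def project(events: list[dict]) -> dict:
    """Rebuild the audit state from recorded events alone (a left fold)."""
    state = {"checks_done": [], "findings": []}
    for event in events:
        if event["type"] == "check.completed":
            state["checks_done"].append(event["payload"]["check_id"])
        elif event["type"] == "finding.recorded":
            state["findings"].append(event["payload"])
    return state


def fingerprint(state: dict) -> str:
    # Canonical JSON (sorted keys), so equal states always produce equal hashes.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def verify_replay(events: list[dict], claimed_state: dict) -> bool:
    """Replay the log from scratch and check that it reproduces the claimed result."""
    return fingerprint(project(events)) == fingerprint(claimed_state)


if __name__ == "__main__":
    events = [
        {"type": "check.completed", "payload": {"check_id": "AUTH-01"}},
        {"type": "finding.recorded", "payload": {"check_id": "AUTH-01", "severity": "HIGH"}},
    ]
    honest = project(events)
    inflated = {**honest, "checks_done": ["AUTH-01", "CRYPTO-02"]}  # claims an unlogged check
    print(verify_replay(events, honest))    # True
    print(verify_replay(events, inflated))  # False
```

Comparing fingerprints instead of full states means a report only needs to carry one short hash to be checkable later against the log.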

How the Audit Happens (The 4-Step Assembly Line)

Instead of a free-flowing conversation, ESAA-Security turns security checking into a factory assembly line with four distinct stations:

Reconnaissance (The Scout): The AI maps out the building. "Okay, we have a front door, a back window, and a server room."
Domain Audit (The Inspectors): Specialized teams check specific areas. One team checks the locks (Authentication), another checks the windows (Input Validation), another checks the wiring (Cryptography).
Risk Classification (The Graders): All the findings are sorted. "This is a critical hole (CRITICAL)," "This is a loose screw (LOW)." They create a "Risk Matrix" (a map of danger).
Final Reporting (The CEO Briefing): The system generates a final report. It's not just a chat; it's a structured document with a score (0–100), a list of fixes, and an executive summary.
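
The assembly line can be sketched as a small state machine that refuses to run a station out of order, plus a severity sort for risk classification and a score from 0 to 100. The four phase names come from the list above; the penalty weights and the scoring formula are purely hypothetical placeholders, since the paper's actual scoring rule is not described here.

```python
from enum import IntEnum


class Phase(IntEnum):
    RECONNAISSANCE = 1
    DOMAIN_AUDIT = 2
    RISK_CLASSIFICATION = 3
    FINAL_REPORTING = 4


# Hypothetical weights: how many points each finding subtracts from 100.
SEVERITY_PENALTY = {"CRITICAL": 25, "HIGH": 10, "MEDIUM": 5, "LOW": 1, "INFO": 0}
SEVERITY_RANK = {sev: i for i, sev in enumerate(SEVERITY_PENALTY)}  # CRITICAL ranks first


class AuditPipeline:
    def __init__(self) -> None:
        self.completed: list[Phase] = []

    def run_phase(self, phase: Phase) -> None:
        """Allow a phase only if every earlier phase has already finished."""
        if len(self.completed) == len(Phase):
            raise RuntimeError("audit already finished")
        expected = Phase(len(self.completed) + 1)
        if phase != expected:
            raise RuntimeError(f"cannot run {phase.name}; next phase is {expected.name}")
        self.completed.append(phase)


def classify(findings: list[dict]) -> list[dict]:
    # Risk classification: most severe findings first (stable for equal severities).
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])


def security_score(findings: list[dict]) -> int:
    # Hypothetical scoring: start at 100, subtract per-finding penalties, floor at 0.
    return max(0, 100 - sum(SEVERITY_PENALTY[f["severity"]] for f in findings))


if __name__ == "__main__":
    pipeline = AuditPipeline()
    pipeline.run_phase(Phase.RECONNAISSANCE)
    try:
        pipeline.run_phase(Phase.FINAL_REPORTING)  # skipping ahead is refused
    except RuntimeError as err:
        print(err)  # cannot run FINAL_REPORTING; next phase is DOMAIN_AUDIT
    findings = [{"id": "F1", "severity": "LOW"}, {"id": "F2", "severity": "CRITICAL"},
                {"id": "F3", "severity": "LOW"}]
    print([f["id"] for f in classify(findings)])  # ['F2', 'F1', 'F3']
    print(security_score(findings))               # 73
```

Because run_phase refuses to skip ahead, a final report cannot be produced before the findings have been classified.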

Why This Matters (The "So What?")

The paper argues that in the age of AI-generated code, trust is the new currency.

Before: We trusted the AI because it sounded confident.
Now: We trust the process because the AI had to follow a strict contract, leave a paper trail, and pass a "replay" test.

If the AI skips a step or submits a malformed report, the system rejects it. If the AI claims work it never did, replaying the log exposes the gap. The final report isn't just an opinion; every finding in it can be traced back through a chain of recorded evidence.

The Bottom Line

ESAA-Security is like turning a chaotic, unregulated chat with a genius into a courtroom trial.

The AI is the witness.
The "Event Log" is the court transcript.
The "Orchestrator" is the judge.
The "Replay" is the appeal process.

The result? A security audit you can actually trust, even when the work was done by a machine.



More like this

MASEval: Extending Multi-Agent Evaluation from Models to Systems

LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems

Quantifying the Accuracy and Cost Impact of Design Decisions in Budget-Constrained Agentic LLM Search

Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance

AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem