The Big Problem: The "Good Step" Trap
Imagine you hire a very strict security guard to watch a bank vault. This guard has a rulebook: "If anyone tries to open the vault door with a crowbar, stop them immediately."
This works great for obvious crimes. But what if a thief doesn't use a crowbar? What if they do something like this:
- Step 1: Ask for the map of the building (Totally legal).
- Step 2: Ask for the schedule of the guards (Totally legal).
- Step 3: Ask for the combination to the back door (Totally legal).
- Step 4: Ask for the master key (Totally legal).
- Step 5: Walk out with the money.
If your security guard only looks at one step at a time, they will let every single request pass because each one looks innocent on its own. The thief wins because the guard is "blind" to the story the steps are telling when put together.
In the world of AI, this is exactly what happens. AI agents (digital workers) are given safety "gates" that check if a single action is safe. But if a bad actor tricks the AI into doing a long, slow sequence of safe-looking actions to steal data, the old safety gates miss it.
The Solution: Session Risk Memory (SRM)
The paper introduces a new system called Session Risk Memory (SRM). Think of SRM not as a new guard, but as a detective who keeps a notebook.
While the original security guard (the "Stateless Gate") only looks at the person standing in front of them right now, the detective (SRM) looks at the whole story of what that person has been doing for the last few minutes.
Here is how it works, broken down into simple concepts:
1. The "Two-Part" Safety System
The authors say safety has two dimensions:
- Spatial Safety (The Guard): "Is this specific action okay?" (e.g., "Can I print this document?")
- Temporal Safety (The Detective): "Is this sequence of actions okay?" (e.g., "Why did you print 500 pages, then email them to a stranger, then delete the logs?")
SRM adds the "Temporal" layer without changing the "Spatial" guard. They work together.
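The two layers described above can be sketched in a few lines of code. This is a minimal illustration, not the paper's actual implementation: the rule set, threshold, and function names are all hypothetical stand-ins.

```python
# Illustrative sketch of the two-layer check (all names are hypothetical).
# The spatial gate judges one action in isolation; the temporal layer
# judges the whole session's history.

BLOCKED_ACTIONS = {"open_vault_with_crowbar"}  # obvious single-step violations
SESSION_RISK_THRESHOLD = 3.0                   # hypothetical cutoff


def spatial_gate(action):
    """Stateless check: is this one action allowed on its own?"""
    return action not in BLOCKED_ACTIONS


def temporal_gate(history, risk_of):
    """Stateful check: does the session's accumulated risk stay under the line?"""
    return sum(risk_of(a) for a in history) <= SESSION_RISK_THRESHOLD


def check(action, history, risk_of):
    if not spatial_gate(action):
        return "block"  # the guard catches the crowbar
    history.append(action)
    if not temporal_gate(history, risk_of):
        return "block"  # the detective catches the slow burn
    return "allow"
```

With a toy risk function that scores every step 1.0, the first three "totally legal" requests pass, but the fourth pushes the session over the threshold and gets blocked, even though the spatial gate alone would have allowed it.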
2. How the Detective Keeps Score
SRM uses a clever trick called a "Semantic Centroid." Imagine every action the AI takes as a speck of dust floating in space; the centroid is the center of that dust cloud, the "average" of everything the AI has been doing.
- The Baseline: When the AI starts its shift, the detective watches for a few minutes to see what "normal" looks like. If the AI is a "Security Analyst," it's normal for them to look at sensitive files. If it's a "Salesperson," looking at sensitive files is weird. SRM learns this baseline so it doesn't panic at normal behavior.
- The Drift: As the AI takes more steps, the detective checks: "Is the AI moving away from its normal path?"
- If the AI is just doing its job, the path stays steady.
- If the AI starts doing a "slow burn" attack (like the thief in the bank example), the path starts to drift.
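The centroid-and-drift idea above can be sketched with plain vector math. In a real system each action would be embedded by a text-embedding model; here the vectors are hand-made stand-ins, and the numbers are purely illustrative.

```python
# Simplified sketch of the "semantic centroid" idea. Real systems would
# embed each action with an embedding model; these vectors are toy stand-ins.
import math


def centroid(vectors):
    """Average of the baseline vectors: the session's 'normal' point."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def drift(vector, center):
    """Euclidean distance from 'normal': how far this step has wandered."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, center)))


# Baseline: a salesperson's routine actions cluster in one region.
baseline = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
normal_point = centroid(baseline)

routine_step = [0.88, 0.12]    # stays near the centroid
suspicious_step = [0.1, 0.95]  # e.g. suddenly reading sensitive files
```

A routine step lands close to the centroid (small drift), while an out-of-character step lands far away (large drift), which is exactly the signal the detective watches for.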
3. The "Risk Accumulator"
Every time the AI takes a step that feels slightly "off" compared to its recent history, the detective adds a tiny drop of "Risk Water" to a bucket.
- The Bucket: This bucket represents the total risk of the session.
- The Threshold: If the bucket fills up past a certain line, the detective yells, "STOP! This whole session is suspicious!" and shuts it down.
- The Magic: Because the bucket fills up slowly, the system ignores tiny, harmless mistakes. But if someone is trying to sneak a big attack through many small steps, the bucket eventually overflows, and the attack is caught.
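The bucket metaphor maps directly onto a running sum with a cutoff. This is a minimal sketch assuming each step arrives with a drift score already computed; the class name and threshold are hypothetical.

```python
# Minimal risk-accumulator sketch (hypothetical names and numbers).
# Each step's drift score is a drop of "Risk Water"; overflow ends the session.


class RiskAccumulator:
    def __init__(self, threshold=1.0):
        self.level = 0.0          # how full the bucket is
        self.threshold = threshold  # the line the bucket must not pass

    def add_step(self, drift_score):
        """Pour a drop into the bucket; return True while the session is ok."""
        self.level += drift_score
        return self.level <= self.threshold


bucket = RiskAccumulator(threshold=1.0)
bucket.add_step(0.2)  # one harmless oddity does not trip the alarm
```

A single small score leaves plenty of room in the bucket, which is why harmless mistakes are ignored; but a long run of slightly-off steps keeps adding drops until the total crosses the threshold and the session is shut down.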
Why This is a Big Deal
The paper tested this on 80 different scenarios (some good, some bad). Here is what happened:
- The Old Guard (Stateless Gate): Caught 100% of the bad guys, but it also accidentally stopped 5% of the good guys (False Positives). It was too jumpy.
- The New Team (Guard + Detective): Caught 100% of the bad guys AND stopped blocking the good guys (0% False Positives).
The Analogy:
The old system was like a bouncer who kicks you out if you sneeze too loudly. The new system is a bouncer who knows that if you sneeze once, it's fine. But if you sneeze, then cough, then run to the fire exit, then try to open the safe, that's when they call the police.
The Bottom Line
This paper proves that we don't need complex, slow, "black box" AI to catch smart hackers. We can use simple, fast, math-based rules to remember the history of an AI's actions.
- Speed: It's incredibly fast (less than a quarter of a millisecond per check).
- Simplicity: It doesn't need to be "taught" or trained; it just does the math.
- Safety: It stops "slow-burn" attacks that sneak past traditional safety checks, while making sure we don't accidentally block legitimate workers.
In short: SRM teaches the AI safety system to look at the whole movie, not just a single frame.