AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

AOI is a secure, trainable multi-agent framework that automates Site Reliability Engineering. It combines Group Relative Policy Optimization with a read-write separated architecture to distill expert knowledge into local models and convert failed trajectories into corrective training signals, achieving state-of-the-art performance on the AIOpsLab benchmark while preserving data privacy and safe execution.

Pei Yang, Wanyi Chen, Asuka Yuxi Zheng, Xueqian Li, Xiang Li, Haoqin Tu, Jie Xiao, Yifan Pang, Dongdong Zhang, Fuqiang Li, Alfred Long, Bill Shi, Lynn Ai, Eric Yang

Published 2026-03-06

Imagine you have a very smart, but slightly reckless, digital assistant named AOI (Autonomous Operations Intelligence). Its job is to act as a "Site Reliability Engineer" (SRE)—basically, the digital firefighter who keeps massive cloud systems (like Netflix or Amazon) running smoothly when things break.

The problem is that giving a super-intelligent AI direct control over a live server is dangerous. If it guesses wrong, it could accidentally delete data or crash the whole system. Also, most companies can't let AI see their secret internal data to learn how to fix things.

This paper introduces a clever new way to train this AI so it becomes a safe, expert-level mechanic without ever needing to see the company's secrets or risk breaking anything.

Here is how AOI works, explained through a simple analogy:

1. The "Three-Person Team" (Safety First)

Imagine a high-stakes surgery. You wouldn't let the person who decides the plan also hold the scalpel, or let the person who monitors the patient be the one who operates. You need separation of duties.

AOI splits the work into three specialized roles:

  • The Observer (The Brain): This is the planner. It looks at the symptoms (error logs) and decides what to do next. Crucially, it cannot touch the live system. It's like a doctor who can only look at X-rays and write a prescription, but cannot perform surgery.
  • The Probe (The Detective): This agent is allowed to look and ask questions (read-only). It can check logs, ask "How many users are online?" or "Is the server hot?" but it cannot change anything. It gathers evidence for the Observer.
  • The Executor (The Surgeon): This agent is the only one allowed to change things (write actions), like restarting a server or deleting a file. But it has a strict rule: It only acts if the Observer gives it a specific, verified order. It's like a surgeon who only cuts when the lead doctor explicitly says, "Cut here, now."

Why this matters: Even if the AI gets confused or hallucinates, it can't accidentally delete the database, because the "Brain" and the "Hands" are separated. The "Hands" only move on an explicit, verified order from the "Brain."
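The read/write separation above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's actual API: all class and method names (`Observer`, `Probe`, `Executor`, `Order`) are hypothetical, and a one-line rule stands in for the LLM planner.

```python
from dataclasses import dataclass

@dataclass
class Order:
    """A write instruction issued and verified by the Observer."""
    action: str
    verified: bool = False

class Probe:
    """Read-only: may inspect the system, never modify it."""
    def read_logs(self, system):
        return system.get("logs", [])

class Executor:
    """Write-capable, but refuses any order the Observer has not verified."""
    def execute(self, order, system):
        if not order.verified:
            raise PermissionError("Executor only acts on verified orders")
        system.setdefault("actions", []).append(order.action)

class Observer:
    """Plans from evidence; holds no handle on the live system at all."""
    def plan(self, evidence):
        # Trivial rule standing in for the LLM planner.
        if any("OOMKilled" in line for line in evidence):
            return Order(action="restart pod", verified=True)
        return None

# Usage: the Observer never touches `system`; only Probe reads, only Executor writes.
system = {"logs": ["pod web-1: OOMKilled"]}
probe, observer, executor = Probe(), Observer(), Executor()
order = observer.plan(probe.read_logs(system))
if order:
    executor.execute(order, system)
print(system["actions"])  # ['restart pod']
```

Note the design choice: safety lives in the structure, not in the model's judgment. Even a confused `Observer` cannot mutate `system` directly, and the `Executor` hard-fails on any unverified order.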

2. Learning from Mistakes (The "Evolver")

Usually, when an AI fails a task, we just throw that attempt in the trash. "Oh well, try again."

AOI does something smarter. It has a special component called the Evolver.

  • The Analogy: Imagine a student taking a driving test. They fail because they hit a cone. Instead of just saying "You failed," the Evolver is like a super-coach who watches the video of the crash, figures out exactly where the student went wrong, and rewrites the driving instructions to say, "Next time, turn left before the cone, not after."
  • The Magic: The Evolver takes these "failed attempts" and turns them into corrected training guides. It teaches the AI: "Don't make that specific mistake again."
  • The Result: The AI gets better not just by practicing success, but by systematically learning from its failures. It turns "bad data" into "gold."
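The Evolver's core move can be sketched as a small function: walk a failed trajectory, find the first bad step, and splice in a corrected step. This is a hedged toy under assumptions, not the paper's implementation; `evolve`, `critic`, and the trajectory schema are all invented for illustration, and a real system would use a strong teacher model as the critic.

```python
def evolve(trajectory, critique_fn):
    """Turn a failed trajectory into a corrected sequence of steps.

    Successful trajectories are already usable as training data; failed
    ones are repaired at the first step the critic flags.
    """
    if trajectory["success"]:
        return trajectory["steps"]
    for i, step in enumerate(trajectory["steps"]):
        fix = critique_fn(step)
        if fix is not None:
            # Keep everything before the mistake, replace the mistake itself.
            return trajectory["steps"][:i] + [fix]
    return None  # nothing salvageable

# Toy critic: flags a destructive action taken before any diagnosis.
def critic(step):
    if step["action"].startswith("delete"):
        return {"action": "inspect logs before any destructive action"}
    return None

failed = {"success": False, "steps": [{"action": "delete pod web-1"}]}
print(evolve(failed, critic))
```

The repaired trajectory then joins the training set alongside genuine successes, which is how "bad data" becomes "gold."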

3. The Training Method (GRPO)

How does the AI learn these lessons? The paper uses a method called GRPO (Group Relative Policy Optimization).

  • The Analogy: Imagine a teacher reviewing 4 different answers a student gave to a math problem. The teacher doesn't just say "Right" or "Wrong." Instead, the teacher says, "Answer A is okay, Answer B is great, Answer C is terrible, and Answer D is mediocre."
  • The AI looks at all 4 answers it generated, compares them, and learns to favor the "great" ones over the "terrible" ones. It doesn't need a human to grade every single step; it learns by comparing its own ideas against each other. This allows it to learn from a small amount of data very quickly.
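The comparison step at the heart of GRPO can be shown in a few lines: rewards for a group of sampled answers are normalized against the group's own mean and standard deviation, so answers above the group average get a positive learning signal and answers below it get a negative one, with no separately trained value model needed. This is a minimal sketch of just the advantage computation, with made-up reward values.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group's own mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to the same prompt, scored 0 (bad) to 1 (good):
rewards = [0.5, 1.0, 0.0, 0.5]
adv = group_relative_advantages(rewards)
# The best answer gets a positive advantage (reinforced), the worst a
# negative one (discouraged); average answers land near zero.
```

Because the baseline comes from the group itself rather than a learned critic, a handful of sampled answers per problem is enough to produce a useful training signal, which is why the method works from small amounts of data.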

The Results: Why is this a big deal?

The researchers tested this system on a benchmark called AIOpsLab, which simulates 86 different cloud disasters.

  1. Safety: Because of the "Three-Person Team," the AI could explore dangerous scenarios without actually breaking anything.
  2. Performance:
    • Without any special training, the system was already 24% better than the previous best AI.
    • After training on just a few examples, the system (using a smaller, cheaper 14-billion parameter model) beat the massive, expensive "Claude Sonnet 4.5" model (which is one of the smartest AIs in the world) at solving these problems.
  3. Reliability: By using the Evolver to fix failed attempts, the system became much more consistent. It didn't just get lucky once; it learned how to solve the problem reliably every time.

The Bottom Line

This paper solves the "Trust vs. Capability" problem in AI.

  • Old Way: Use a huge, expensive AI, but keep it on a tight leash so it doesn't break things. It stays dumb because it can't learn from its mistakes safely.
  • AOI Way: Build a system where safety is built into the architecture (separating thinking from acting) and where mistakes are treated as valuable lessons. This allows a smaller, cheaper AI to become an expert engineer that is safer and smarter than the giants.

In short: AOI teaches AI to be a cautious, learning mechanic that gets better every time it drops a wrench, rather than a reckless genius that breaks the car.
