AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
AOI is a secure, trainable multi-agent framework that automates Site Reliability Engineering (SRE). It combines Group Relative Policy Optimization (GRPO) with a read-write separated architecture to distill expert knowledge into local models and to convert failed trajectories into corrective training signals. It achieves state-of-the-art performance on the AIOpsLab benchmark while preserving data privacy and ensuring safe execution.
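
To make concrete how a failed trajectory can still carry a training signal under GRPO, here is a minimal sketch of the group-relative advantage computation in its commonly described form: each sampled trajectory's reward is normalized against the mean and standard deviation of its own group. The reward values, the function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not details taken from AOI.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation plus a small eps for stability;
    the exact normalization used by AOI is not stated in this summary.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four diagnosis attempts on the same incident:
# 1.0 = correct root cause found, 0.0 = failed trajectory.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, adv in zip(rewards, advantages):
    label = "success" if reward > 0 else "failure"
    print(f"{label}: reward={reward:.1f} advantage={adv:+.3f}")
```

In this sketch the failed attempts receive negative advantages, so a policy update would push probability mass away from them; this is one plausible reading of how failures become corrective signals. If every trajectory in a group earns the same reward, all advantages are zero and the group contributes no gradient.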