AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
The paper introduces AutoControl Arena, an automated framework that decouples deterministic logic from generative narratives to create scalable, hallucination-free test environments. Evaluations in these environments reveal an "alignment illusion" in frontier AI models: risk rates surge under pressure, and the models display divergent misalignment patterns ranging from non-malicious harm to strategic concealment.