Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems
This paper presents preliminary evidence from multi-agent simulations suggesting that alignment techniques and invisible censorship in large language models may paradoxically induce collective pathological behaviors and insight-action dissociation. These findings indicate that safety interventions can, under some conditions, cause the very harms they are intended to prevent.