Controlled LLM Training on Spectral Sphere
The paper introduces the Spectral Sphere Optimizer (SSO), a parallel training algorithm that enforces strict module-wise spectral constraints on both the weights and their updates, bringing training into full alignment with Maximal Update Parametrization (μP). Across diverse large-scale architectures, the authors report better convergence, stability, and performance than AdamW and Muon.
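To make "spectral constraints on both weights and updates" concrete, here is a minimal NumPy sketch of a single constrained step for one weight matrix. It is an illustration under stated assumptions, not the paper's SSO procedure: the target radius sqrt(d_out / d_in) is borrowed from the commonly cited spectral condition for μP, the update is constrained by plain rescaling of the gradient, and the weight is returned to the sphere by plain rescaling as well. The function names `target_radius`, `project_to_sphere`, and `constrained_step` are hypothetical.

```python
import numpy as np


def target_radius(d_out, d_in):
    # Assumption: radius follows the muP-style spectral condition sqrt(d_out / d_in).
    return float(np.sqrt(d_out / d_in))


def project_to_sphere(W, radius):
    # Rescale W so that its spectral norm (largest singular value) equals radius.
    # np.linalg.norm(W, 2) returns the largest singular value of a 2-D array.
    return W * (radius / np.linalg.norm(W, 2))


def constrained_step(W, G, lr):
    # W has shape (d_out, d_in); G is the gradient with the same shape.
    d_out, d_in = W.shape
    r = target_radius(d_out, d_in)
    # Constrain the update: rescale the gradient to spectral norm r.
    # (Simple rescaling is a stand-in here, not the method described in the paper.)
    U = r * G / np.linalg.norm(G, 2)
    # Take the step, then rescale so the weight stays on the spectral sphere.
    return project_to_sphere(W - lr * U, r)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = project_to_sphere(rng.standard_normal((8, 4)), target_radius(8, 4))
    G = rng.standard_normal((8, 4))
    W = constrained_step(W, G, lr=0.1)
    print(np.linalg.norm(W, 2))  # spectral norm of the weight after the step
```

In this sketch the weight's spectral norm is pinned to the same radius before and after every step, and the update's spectral norm is fixed independently of the raw gradient scale, which is the sense in which both quantities are "constrained".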