Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
This paper provides the first theoretical proof that Adam's second-moment normalization yields significantly sharper high-probability convergence guarantees ( dependence) compared to SGD ( dependence) under the classical bounded variance model, thereby explaining its empirical superiority.