Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks
This paper explains grokking in deep neural networks, where generalization appears only after prolonged overfitting, as a noise-driven escape from a metastable state. L2 regularization induces a first-order phase transition in which the overfitting solution is metastable. Stochastic gradient descent (SGD) noise eventually lets the model overcome the energy barrier separating it from the generalizing phase, so generalization arrives only after a long delay.
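The escape mechanism can be illustrated with a minimal toy sketch. This is not the paper's model. It uses a one-dimensional tilted double-well potential as a stand-in for the loss landscape: the well near x = -1 plays the metastable overfitting phase, and the lower well near x = +1 plays the generalizing phase. An isotropic Gaussian term is an assumed proxy for SGD noise. The potential, the function names `grad_potential` and `escape_step`, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
from typing import Optional


def grad_potential(x: float, tilt: float) -> float:
    """Derivative of the toy tilted double well U(x) = (x**2 - 1)**2 / 4 - tilt * x.

    With tilt > 0, the well near x = +1 is the global minimum (hypothetical
    stand-in for the generalizing phase) and the well near x = -1 is
    metastable (hypothetical stand-in for the overfitting phase).
    """
    return x * (x * x - 1.0) - tilt


def escape_step(noise_temp: float, lr: float = 0.01, tilt: float = 0.1,
                max_steps: int = 100_000, seed: int = 0) -> Optional[int]:
    """Run noisy gradient descent from the metastable well.

    Returns the first step at which x crosses past the barrier (x > 0.5),
    or None if it never does within max_steps.

    Update rule (Euler-Maruyama discretization of overdamped Langevin dynamics):
        x <- x - lr * U'(x) + sqrt(2 * lr * T) * N(0, 1)
    The Gaussian term is an assumed proxy for SGD noise, not real SGD noise.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * lr * noise_temp)
    x = -1.0  # start inside the metastable well
    for step in range(1, max_steps + 1):
        x += -lr * grad_potential(x, tilt) + noise_scale * rng.gauss(0.0, 1.0)
        if x > 0.5:  # barrier top sits near x = -0.1 for tilt = 0.1
            return step
    return None


print(escape_step(0.0))   # None: noiseless gradient descent stays trapped
print(escape_step(0.08))  # a finite step count: noise drives the escape
```

With `noise_temp = 0` the iterate settles at the metastable minimum (about x = -0.945) and never leaves, which mirrors a model stuck in the overfitting phase. With noise it stays in that well for many steps before crossing. For overdamped Langevin dynamics of this kind, the mean waiting time grows roughly as exp(ΔU / T) in the barrier height ΔU and temperature T (Kramers' law), so in this toy setting weaker noise means a longer plateau.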