Learnability Window in Gated Recurrent Neural Networks
This paper develops a statistical theory of the maximal temporal horizon over which gated recurrent neural networks can learn. The horizon is determined by the interplay between the decay rate of an effective learning rate envelope and the concentration properties of heavy-tailed gradient noise, which yields distinct logarithmic, polynomial, or exponential scaling regimes for learnability.
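To give a feel for how the decay of an envelope could produce three different scaling regimes, the toy sketch below computes the largest lag at which a decaying envelope still exceeds a detection threshold. Everything in it is an illustrative assumption rather than the paper's construction: the function name `horizon`, the three example envelopes, their decay constants, and the idea of collapsing the gradient noise into a single scalar `threshold` are all hypothetical.

```python
import math


def horizon(envelope, threshold, max_lag=10**6):
    """Largest lag l in [1, max_lag] with envelope(l) >= threshold, or 0 if none.

    Assumes `envelope` is non-increasing in l, so binary search is valid.
    """
    if envelope(1) < threshold:
        return 0
    lo, hi = 1, max_lag
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if envelope(mid) >= threshold:
            lo = mid       # mid is still above the threshold
        else:
            hi = mid - 1   # mid is already below the threshold
    return lo


# Hypothetical envelopes (assumptions for illustration only).
def exp_env(l):
    return math.exp(-0.5 * l)         # exponential decay


def poly_env(l):
    return l ** -2.0                  # polynomial decay


def log_env(l):
    return 1.0 / (1.0 + math.log(l))  # logarithmic decay


if __name__ == "__main__":
    for threshold in (0.2, 0.1):
        print(
            threshold,
            horizon(exp_env, threshold),
            horizon(poly_env, threshold),
            horizon(log_env, threshold),
        )
```

In this toy setting, lowering the threshold (which one might loosely associate with noise averaging out more tightly) extends the horizon roughly logarithmically in 1/threshold for the exponentially decaying envelope, polynomially for the polynomially decaying one, and exponentially for the logarithmically decaying one. How the paper's actual threshold depends on the heavy-tailed noise is not stated in this abstract and is not modeled here.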