Memorization capacity of deep ReLU neural networks characterized by width and depth
This paper establishes the optimal trade-off between width and depth for deep ReLU neural networks to memorize N separated data points, proving that the product of the squared width and the squared depth must scale as N, up to logarithmic factors.
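A hedged restatement of the claimed scaling in symbols (the notation w for width, d for depth, and N for the number of points is introduced here for illustration, not taken from the source):

```latex
% Sketch of the memorization trade-off, under the stated assumptions:
% N separated data points, ReLU network of width w and depth d.
% "w^2 d^2 scales as N (up to log factors)" is equivalent to
\[
  w \cdot d \;=\; \widetilde{\Theta}\!\left(\sqrt{N}\right),
\]
% i.e., roughly, networks with w d \gtrsim \sqrt{N} (up to logarithmic
% factors) suffice to memorize the N points, and w d on a smaller order
% does not.
```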