Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression

This paper establishes tight lower and upper bounds on the covering numbers of various classes of deep ReLU networks, filling a gap in the literature. These bounds provide fundamental insights into network capacity, improve nonparametric regression rates by removing a log⁶(n) factor, and unify the relationship between function approximation and statistical estimation.

Weigutian Ou, Helmut Bölcskei

Published 2026-03-04

Imagine you are trying to build a machine that can learn to draw any picture, recognize any face, or predict the weather. You have a giant box of Lego bricks (neural networks) to build this machine. But in the real world, you can't use an infinite number of bricks. You have limits:

  • Size: You can only use a certain number of layers (depth) and only so many bricks per layer (width).
  • Precision: Your bricks can only be cut to specific sizes (quantization).
  • Connectivity: You can't connect every brick to every other brick; some connections must be cut (sparsity).
  • Strength: The force you can apply with a brick is limited (bounded weights).

This paper is like a master architect's guidebook. It answers a very specific question: "Given these strict limits, how many different 'shapes' (functions) can my machine actually make?"

To answer this, the authors use a concept called Covering Numbers.

The "Blanket" Analogy: What is a Covering Number?

Imagine you have a giant, messy pile of sand (all the possible things your neural network could try to do). You want to cover this pile with a finite number of small, identical blankets (representing specific, pre-defined network settings).

  • The Goal: You want to use as few blankets as possible, but they must be big enough that no part of the sand is left uncovered.
  • The Count: The number of blankets you need is the Covering Number.
  • The Insight: If you need a million blankets, your network is very complex and can do many different things. If you only need ten, it's very simple and limited.
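The blanket picture can be made concrete on the simplest possible "pile of sand": the interval [0, 1] of numbers. This is a minimal sketch (not the paper's construction); each "blanket" is an interval of radius eps, and each one covers a stretch of length 2·eps, so you need about 1/(2·eps) of them.

```python
import math

def cover_interval(eps: float) -> int:
    """Minimum number of eps-balls (intervals of radius eps)
    needed to cover [0, 1]: each ball covers length 2*eps."""
    return math.ceil(1 / (2 * eps))

for eps in [0.25, 0.1, 0.01]:
    print(f"eps = {eps}: need {cover_interval(eps)} blankets")
```

Note how the count explodes as the required precision eps shrinks; for rich function classes like neural networks, the same quantity grows much faster, and the paper's contribution is to nail down exactly how fast.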

The paper's main achievement is pinning down the number of blankets needed for different types of Lego sets (networks). Before this, people only knew the "maximum" (upper bound) but didn't know the "minimum" (lower bound). It was like knowing you might need at most a million blankets, but not knowing whether you could get away with just ten. The authors proved lower bounds that match the known upper bounds, so the count is now tight: the minimum and the maximum essentially coincide.

The Three Big Discoveries

The paper explores three specific scenarios, using some clever metaphors:

1. The "Precision" Game (Quantization)

The Scenario: Imagine you are painting a picture, but you can only use 8 specific shades of blue instead of a full rainbow.
The Finding: The authors found a "tipping point."

  • If you want a very rough sketch (large error allowed), the fact that you have limited colors doesn't matter much; you can still make almost anything.
  • But if you want a hyper-realistic painting (tiny error allowed), the limited colors suddenly become a huge bottleneck. The network hits a "wall" where it simply cannot create more detail, no matter how many layers you add.
  • The Metaphor: It's like trying to write a novel using only the letters A, B, and C. For a short note, it's fine. For a 500-page book, you run out of combinations very quickly.
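The "letters A, B, and C" metaphor is really a counting argument. As an illustrative sketch (not the paper's exact bound): if a network has a fixed number of weights and each weight is stored in a fixed number of bits, there are only finitely many distinct networks you can possibly build, no matter how cleverly you train.

```python
def max_distinct_networks(num_weights: int, bits_per_weight: int) -> int:
    """Upper bound on the number of distinct networks when each of
    `num_weights` weights takes one of 2**bits_per_weight quantized
    values. This caps the covering number: you cannot cover more
    functions than you can represent."""
    return (2 ** bits_per_weight) ** num_weights

# A toy network with 3 weights at 2 bits each: at most 4^3 = 64 networks.
print(max_distinct_networks(3, 2))
```

The logarithm of this count, bits_per_weight × num_weights, is the "wall" described above: once the target precision demands more distinct functions than this budget allows, adding layers cannot help.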

2. The "Sparse" Game (Connectivity)

The Scenario: Imagine a social network where everyone could talk to everyone, but to save money, you cut 90% of the phone lines.
The Finding: The authors showed that cutting connections (sparsity) is a very efficient way to shrink the network without losing too much ability.

  • They proved that the "complexity" of the network scales with the number of active connections, not the total possible connections.
  • The Metaphor: It's like a city with a massive grid of roads. If you close most of the side streets but keep the main highways open, the city still functions almost as well as before. The "traffic" (information) can still flow where it needs to.
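The "active connections, not total connections" claim can also be sketched with a counting argument. In this illustrative model (a simplification of the paper's setting), a sparse configuration chooses which connections are active and then assigns each a quantized value; the log of the resulting count grows with the number of active connections, not the total.

```python
import math

def log2_sparse_configs(total: int, active: int, levels: int) -> float:
    """log2 of the number of weight configurations with exactly
    `active` nonzero connections out of `total` possible ones,
    each active weight taking one of `levels` quantized values:
    C(total, active) * levels**active configurations."""
    choose = math.log2(math.comb(total, active))
    assign = active * math.log2(levels)
    return choose + assign

sparse = log2_sparse_configs(total=1000, active=100, levels=16)
dense = 1000 * math.log2(16)  # every connection active
print(f"sparse: ~{sparse:.0f} bits vs dense: ~{dense:.0f} bits")
```

Cutting 90% of the connections shrinks the complexity budget dramatically, which is exactly why sparsity is such an efficient compression lever.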

3. The "Prediction" Game (Regression)

The Scenario: You are trying to predict the stock market based on past data. You want to know how many data points (samples) you need to make a good guess.
The Finding: This is the paper's "cherry on top." By knowing the exact complexity of the network (the blanket count), they could prove that deep neural networks are optimal at learning.

  • Previous research said, "You need n data points, plus a huge penalty factor involving logarithms (like log⁶ n)." It was like saying, "To learn this, you need 100 apples, plus a tax of 50 extra apples."
  • This paper removed the "tax." They proved you only need the 100 apples.
  • The Metaphor: They found a shortcut. They showed that deep networks are the most efficient learners possible, stripping away the unnecessary "overhead" that previous theories thought was required.
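To see how heavy the removed "tax" actually is, here is a sketch comparing the two kinds of bounds. The rate n^(-2β/(2β+d)) is the standard minimax shape in nonparametric regression (β is the smoothness of the target function, d the input dimension); the log_power knob adds the log⁶(n)-style inflation that earlier results carried. The specific numbers below are illustrative, not from the paper.

```python
import math

def excess_risk_bound(n: int, beta: float, d: int, log_power: int = 0) -> float:
    """Minimax-style rate n^(-2*beta/(2*beta+d)), optionally inflated
    by a log(n)**log_power factor (the 'tax' this paper removes)."""
    rate = n ** (-2 * beta / (2 * beta + d))
    return rate * math.log(n) ** log_power

n = 10 ** 6
clean = excess_risk_bound(n, beta=2.0, d=1)            # this paper
taxed = excess_risk_bound(n, beta=2.0, d=1, log_power=6)  # earlier bounds
print(f"inflation factor at n = {n}: {taxed / clean:.0f}x")
```

Even at a million samples, a log⁶(n) factor multiplies the bound by millions, so removing it is far more than cosmetic.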

Why Does This Matter?

Think of this paper as the Physics of AI.

  • For Engineers: It tells you exactly how much you can compress a model (make it smaller/faster) before it starts breaking. It answers: "Can I run this AI on a tiny smartwatch?" The math says, "Yes, but here is the exact limit."
  • For Scientists: It unifies different theories. It shows that the ability to approximate a function (draw a curve) and the ability to learn from data (predict the future) are two sides of the same coin. If you understand one, you automatically understand the other.

The Bottom Line

This paper took a messy, complicated field of mathematics and cleaned it up. It replaced vague estimates with precise, tight boundaries.

  • Before: "We think the network is complex, but we aren't sure how complex."
  • After: "We know exactly how complex it is, how much we can compress it, and exactly how much data it needs to learn."

It's the difference between guessing how many grains of sand are on a beach and having a formula that tells you the exact number, down to the grain. This allows us to build better, faster, and more reliable AI systems.
