An Approximation Theory Perspective on Machine Learning

This paper reviews the historical disconnect between approximation theory and machine learning practice, discusses emerging trends like deep networks and transformers, and introduces novel research enabling function approximation on unknown manifolds without requiring explicit manifold feature learning.

Hrushikesh N. Mhaskar, Efstratios Tsoukanis, Ameya D. Jagtap

Published 2026-03-05

Imagine you are trying to teach a robot to recognize different animals. You show it thousands of pictures of cats, dogs, and birds, and it learns to label them correctly. This is the core of Machine Learning.

But there's a hidden problem: How does the robot actually "learn"? Is it just memorizing the pictures, or is it truly understanding the shape of a cat?

This paper, written by a team of mathematicians, argues that we've been teaching robots the wrong way. We've been relying on "trial and error" (guessing and checking) instead of using the rigorous math of Approximation Theory, a field that has been studying how to build the best possible models for well over a century.

Here is the paper explained through simple analogies:

1. The Problem: The "Black Box" vs. The Blueprint

Currently, machine learning is like building a house by throwing bricks at a wall until a door appears. We know it works (the door is there!), but we don't fully understand why it works or if it will hold up in a storm (will it work on new data?).

The authors say: "Stop guessing! Let's use the blueprints."

  • Approximation Theory is the blueprint. It tells us exactly how many bricks (parameters) we need to build a wall that looks like a specific shape (the target function).
  • The Gap: Machine learning ignores these blueprints. It assumes that if we have enough data and a big enough computer, the answer will magically appear. The authors say this is dangerous because we don't know if our "house" will collapse when we move it to a new neighborhood (unseen data).

2. The "Curse of Dimensionality": The Infinite Maze

Imagine you are looking for a specific needle in a haystack.

  • Low Dimension (Easy): The haystack is a small square box. You can find the needle easily.
  • High Dimension (The Curse): Now imagine the haystack is a cube, then a hyper-cube with 100 dimensions. The volume of the space grows so fast that the "needle" becomes invisible. To find it, you would need to check more points than there are atoms in the universe.
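The exponential blow-up above is easy to make concrete. A minimal sketch (the resolution of 10 points per axis is an illustrative choice, not from the paper):

```python
# Grid-search cost explodes with dimension: to sample the unit hypercube
# [0,1]^d at a resolution of 10 points along each axis, you need 10**d
# points in total.
def grid_points(d, per_axis=10):
    """Number of grid points needed to cover [0,1]^d at fixed resolution."""
    return per_axis ** d

for d in (1, 2, 3, 10, 100):
    print(d, grid_points(d))
# In 100 dimensions that is 10**100 points -- a googol, far more than
# the roughly 10**80 atoms in the observable universe.
```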

In machine learning, data often lives in these high-dimensional spaces (a photo with millions of pixels is a point in a million-dimensional space). Traditional worst-case theory says the amount of data you need grows exponentially with the dimension, so the problem looks hopeless.
The Paper's Insight: Real-world data isn't scattered randomly in this infinite maze. It's actually hiding on a tiny, hidden path (a "manifold") inside that huge space.

  • Analogy: Imagine a 3D room filled with fog. The fog represents all possible data points. But the actual data (the people in the room) are only walking on a single, thin wire suspended in the middle. If you know they are on the wire, you don't need to search the whole room; you just follow the wire.

3. The New Solution: Learning Without "Learning" the Map

Usually, to find that hidden wire (the manifold), we try to map the whole room first. We calculate the shape of the wire, its curves, and its twists. This is slow and error-prone.

The authors propose a New Paradigm:

  • Old Way: "Let's map the wire first, then walk on it." (This is like trying to learn the geometry of the data before solving the problem).
  • New Way: "Let's just throw a net over the wire and pull it up."
    • They developed a method to approximate the function (the wire) directly from the data points without ever needing to calculate the shape of the wire itself.
    • Analogy: Instead of trying to draw a perfect map of a winding mountain road, you just place a series of stepping stones (kernels) along the path. You can walk from start to finish without ever knowing the road's exact curvature.
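The "stepping stones" idea can be sketched with a plain Gaussian kernel estimator standing in for the paper's more sophisticated localized kernels. Everything here (the spiral curve, the bandwidth `eps`, the target function) is an illustrative assumption, not the authors' construction; the point is only that the prediction uses ambient coordinates and never computes the manifold's shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data "hiding on a wire": sample points from a 1-D curve embedded in 3-D.
t = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])  # points on the curve
y = np.sin(2.0 * t)                                   # target function on the curve

def kernel_predict(x_query, X, y, eps=0.05):
    """Gaussian-kernel estimate of f(x_query) from ambient coordinates
    alone -- no chart, tangent space, or curvature is ever computed."""
    d2 = np.sum((X - x_query) ** 2, axis=1)  # squared distances in 3-D
    w = np.exp(-d2 / eps)                    # "stepping stone" weights
    return np.dot(w, y) / np.sum(w)          # weighted average of nearby labels

# Evaluate at a fresh point on the curve (t0 = 1.0, so the target is sin(2.0)).
t0 = 1.0
x0 = np.array([np.cos(t0), np.sin(t0), 0.1 * t0])
print(kernel_predict(x0, X, y))
```

Because the kernel only "sees" points that are close in the ambient space, and those happen to be the points that are close along the wire, the estimate lands near the true value sin(2.0) without any manifold-learning step.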

4. Classification as "Signal Separation"

How do we tell a cat from a dog?

  • Old View: Draw a line between the two groups.
  • New View (Signal Separation): Imagine you are at a party with three different music bands playing at once. Your goal isn't to draw a line between the bands; it's to separate the sounds so you can hear each band clearly.
    • The authors suggest that classifying data is like separating these audio tracks. You don't need to know exactly where the "cat" ends and the "dog" begins; you just need to isolate the "cat signal" from the "dog signal."
    • This allows the system to learn with very few examples (labels), because it's looking for the structure of the sound, not just memorizing the notes.
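A toy version of the "separate the signals, don't draw a line" idea, assuming two 1-D sources and a Gaussian kernel (all illustrative choices, not the paper's setup): each class is treated as a signal whose local strength we estimate from its samples, and a point is labeled by whichever signal dominates there.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two "bands" playing at once: samples drawn from two overlapping sources.
cats = rng.normal(loc=-2.0, scale=1.0, size=50)
dogs = rng.normal(loc=+2.0, scale=1.0, size=50)

def signal_strength(x, samples, eps=1.0):
    """Kernel estimate of how loudly one source 'plays' at point x."""
    return np.mean(np.exp(-(samples - x) ** 2 / eps))

def classify(x):
    # No decision boundary is ever drawn: we compare the two recovered
    # signals and pick whichever is dominant at x.
    return "cat" if signal_strength(x, cats) > signal_strength(x, dogs) else "dog"

print(classify(-1.5), classify(1.8))
```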

5. Transformers and Attention: The "Spotlight"

You've heard of AI like ChatGPT using "Transformers." These use an "attention mechanism" to decide which words in a sentence are important.

  • The paper explains that this "attention" is actually just a fancy version of a local kernel.
  • Analogy: Imagine you are reading a book. A "local kernel" is like a magnifying glass that focuses on a specific paragraph. The "attention mechanism" is just a very smart magnifying glass that knows exactly which paragraph to focus on based on the context. The authors show that this isn't magic; it's just a specific type of mathematical tool we've known about for a long time, applied in a clever way.
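The "attention is a kernel" reading can be shown in a few lines. Below, standard scaled dot-product attention for a single query is written out, and a sanity check confirms the kernel-smoother interpretation: when every key is identical, the kernel weights are uniform and attention collapses to a plain average of the values (the sizes 4 and 6 are arbitrary illustration choices):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def attention(q, K, V):
    """Single-query attention: a normalized exponential kernel on
    query-key similarity, used to weight the values -- structurally a
    Nadaraya-Watson kernel smoother."""
    scores = K @ q / np.sqrt(len(q))  # scaled dot-product similarity
    return softmax(scores) @ V        # kernel-weighted average of values

rng = np.random.default_rng(0)
q = rng.normal(size=4)        # one query vector
K = rng.normal(size=(6, 4))   # six keys
V = rng.normal(size=(6, 2))   # six values

out = attention(q, K, V)

# Sanity check of the "smoother" reading: identical keys give identical
# scores, so the weights are uniform and attention is just the mean value.
uniform = attention(q, np.tile(K[0], (6, 1)), V)
print(np.allclose(uniform, V.mean(axis=0)))  # True
```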

6. The "Physics" of AI (PINNs)

Sometimes, we want AI to solve physics problems (like how air flows over a wing).

  • Old Way: Feed the AI millions of simulation results and hope it learns the laws of physics.
  • New Way (Physics-Informed): Tell the AI the laws of physics (the equations) and say, "Your answer must obey these rules."
    • Analogy: Instead of letting a student guess the answer to a math problem by looking at a thousand examples, you give them the formula and say, "Use this formula." The AI becomes much more efficient and accurate because it's not just guessing; it's following the rules of the universe.
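The "obey the rules" idea can be sketched without a neural network at all. Here the model is a polynomial rather than a network (an illustrative simplification; real PINNs train a network by gradient descent), but the physics-informed principle is the same: instead of fitting example solutions, we penalize violations of the equation itself at collocation points.

```python
import numpy as np

# Approximate the solution of the ODE u'(x) = -u(x) with u(0) = 1
# (true solution: exp(-x)) by a polynomial u(x) = sum_k c_k x^k.
deg = 6
xs = np.linspace(0.0, 1.0, 50)  # collocation points where physics is enforced

# ODE residual u'(x) + u(x) = 0 at each collocation point x_j:
# column k holds d/dx x^k + x^k = k x^(k-1) + x^k (the derivative is 0 for k=0).
A = np.array([[(k * x ** (k - 1) if k > 0 else 0.0) + x ** k
               for k in range(deg + 1)] for x in xs])
b = np.zeros(len(xs))

# Enforce the boundary condition u(0) = 1 with a heavily weighted row.
bc = np.zeros(deg + 1)
bc[0] = 1.0
A = np.vstack([A, 100.0 * bc])
b = np.append(b, 100.0)

c, *_ = np.linalg.lstsq(A, b, rcond=None)

def u(x):
    return sum(ck * x ** k for k, ck in enumerate(c))

print(u(1.0))  # close to exp(-1) ≈ 0.3679
```

No solved examples were ever shown to the model; the equation and the boundary condition alone pin down the answer.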

Summary: What Should We Take Away?

This paper is a call to action for the math community and the AI community to shake hands.

  1. Stop treating AI like a black box. We need to use the rigorous math of approximation theory to understand why AI works.
  2. Don't fear high dimensions. Real data lives on hidden, low-dimensional paths. We can find them without mapping the whole universe.
  3. Simplify the process. We can build models that learn directly from data without needing to first "learn the shape" of the data.
  4. Think differently about classification. Instead of drawing lines, think about separating signals.

The Bottom Line: Machine learning has been running on a powerful engine (data and compute), but it's been driving without a map. This paper provides the map, showing us how to navigate the complex world of data using the timeless tools of mathematics.
