Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling

This paper theoretically proves and empirically validates that implicit models achieve rich equilibria and scale their expressive power with test-time compute, allowing compact architectures to match or exceed the performance of larger explicit networks across diverse domains.

Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin

Published 2026-03-03

The Big Idea: The "Infinite Staircase" vs. The "Tall Tower"

Imagine you want to build a machine that can solve a very difficult puzzle.

The Old Way (Explicit Models):
Think of a standard AI model (like the ones in your phone or laptop) as a Tall Tower. To make the tower solve harder puzzles, you have to keep adding more floors (layers). If you want it to be super smart, you need a skyscraper.

  • The Problem: Building a skyscraper is expensive. It takes a lot of memory (bricks) and time to build. Once the tower is built, its height is fixed. If you want it to do something even harder later, you have to tear it down and build an even taller one.

The New Way (Implicit Models):
This paper introduces a different approach: The Infinite Staircase.
Instead of building a tall tower, you build a single, simple room with a staircase inside. You start at the bottom, take a step, look at the view, take another step, look again, and keep going until you reach the perfect spot (the "fixed point").

  • The Magic: You only built one room (one set of parameters), but you can climb as many steps as you want. The more steps you take (more "test-time compute"), the more complex the view becomes. You don't need to build a bigger room; you just need to walk further.
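The staircase is just fixed-point iteration: apply one operator over and over until the output stops changing. Here is a minimal toy sketch (not the paper's actual model) of such an implicit forward pass, using a single tanh layer rescaled to be a contraction so the iteration is guaranteed to settle:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
W = 0.5 * W / np.linalg.norm(W, 2)   # rescale so the map is a contraction
U = rng.standard_normal((3, 3))
b = rng.standard_normal(3)

def f(z, x):
    # The one "room": a single layer, reused at every step of the staircase.
    return np.tanh(W @ z + U @ x + b)

def forward(x, steps=50, tol=1e-8):
    # Walk the staircase: iterate z <- f(z, x) until the view stops changing.
    z = np.zeros(3)
    for _ in range(steps):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next            # reached the fixed point
        z = z_next
    return z
```

One set of parameters (`W`, `U`, `b`), an adjustable number of steps: that is the whole trick.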

What Did the Authors Prove?

The authors asked two big questions:

  1. Can this simple staircase do everything the tall tower can do? (Yes.)
  2. Can the staircase do things the tower can't do without getting huge? (Yes!)

They proved mathematically that this "Infinite Staircase" (Implicit Model) can represent incredibly complex, jagged, and difficult functions (like a cliff with a sudden drop) using a very smooth, simple operator.

The Analogy of the "Smooth Painter":
Imagine you want to paint a picture of a jagged, lightning-bolt-shaped mountain.

  • The Explicit Tower: To paint the sharp, jagged edges, you need a massive, complex brush with thousands of tiny bristles (parameters).
  • The Implicit Staircase: You use a simple, smooth brush. But you don't just swipe once. You swipe, then look at the result, adjust your hand slightly, and swipe again. You repeat this 100 times.
    • Step 1: The brush makes a smooth curve.
    • Step 10: The curve gets sharper.
    • Step 100: The curve looks exactly like the jagged lightning bolt.

The paper proves that by repeating this simple action enough times, you can create any shape, even ones that are mathematically "impossible" for a single smooth stroke. The complexity comes from the repetition, not the size of the tool.
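A loose numerical analogy for "smooth steps, sharp limit" (not the paper's construction): each Babylonian step for the square root is a perfectly smooth rational map, yet its fixed point, sqrt(x), has a slope that blows up near zero — a sharpness no single application of the smooth step can produce.

```python
import numpy as np

def step(z, x):
    # One application of a perfectly smooth (rational) map in z and x.
    return 0.5 * (z + x / z)          # Babylonian / Newton step for sqrt(x)

x = np.linspace(1e-4, 1.0, 5)
z = np.ones_like(x)                   # iterate 0: a constant, maximally smooth
for _ in range(30):
    z = step(z, x)                    # every iterate is still smooth in x

# The limit sqrt(x) has slope 1/(2*sqrt(x)), unbounded near x = 0 --
# the sharpness comes from repetition, not from the map itself.
print(np.max(np.abs(z - np.sqrt(x))))
```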

Why Does This Matter? (The "Test-Time Scaling" Secret)

In the old world, if you wanted a smarter AI, you had to train a bigger model (more parameters). This is like buying a bigger car to go faster.

In this new world, you can keep the model small (the same car) but drive it longer (more iterations).

  • Test-Time Scaling: This is the fancy term for "spending more time thinking at the moment of answering."
  • The Result: A small, cheap model can outperform a giant, expensive model if you let the small model "think" for a few more seconds (iterations).

Real-World Examples from the Paper

The authors tested this "Staircase" idea in four different fields to prove it works:

  1. Image Restoration (Fixing Blurry Photos):

    • The Task: Take a blurry photo and make it sharp.
    • The Result: The implicit model started with a blurry guess. With every "step" (iteration), the image got sharper and sharper. Eventually, it produced a clearer image than a much larger, traditional model.
  2. Scientific Computing (Fluid Dynamics):

    • The Task: Predict how air or water flows around an object (like a plane wing).
    • The Result: The model started with a rough guess of the wind. As it "walked" up the stairs, the wind patterns became more detailed and accurate, matching complex physics equations better than larger models.
  3. Operations Research (Solving Math Puzzles):

    • The Task: Solve complex logistics problems (like how to deliver packages to 1,000 stores efficiently).
    • The Result: The model treated the problem as a graph. By iterating, it found better and better solutions, eventually beating larger models that were trained specifically for this.
  4. LLM Reasoning (AI Chatbots):

    • The Task: Answer tricky questions that require deep thinking (e.g., "What is the difference between 'charge' in physics vs. 'charge' in banking?").
    • The Result: At first, the AI just repeated the question. But as it "thought" longer (more iterations), it realized the context shifted from physics to finance and gave the correct, nuanced answer. The "thinking" process allowed it to separate the meanings.

The "Secret Sauce": Why It Works

The paper explains that the "Simple Operator" (the single room) is designed to be stable and smooth. It doesn't try to be complex immediately.

  • The Trap: If you force the operator to be complex from the start, it becomes unstable and hard to train.
  • The Solution: Keep the operator simple. Let the iterations do the heavy lifting. The complexity "emerges" naturally as you keep walking up the stairs.
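The classic way to make "keep walking and you'll arrive" precise is Banach's fixed-point theorem: if the operator is a contraction, the iteration converges from any starting point. A minimal sketch with a linear operator f(z) = Wz + b, where stability reduces to checking that the spectral norm of W is below 1 (an illustrative condition, not the paper's exact recipe):

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value: the Lipschitz constant of z -> W @ z.
    return np.linalg.norm(W, 2)

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
W = 0.9 * W / spectral_norm(W)        # enforce ||W||_2 = 0.9 < 1: a contraction
b = rng.standard_normal(4)

z = np.zeros(4)
for _ in range(200):                   # walk the stairs
    z = W @ z + b

z_star = np.linalg.solve(np.eye(4) - W, b)   # exact fixed point, for comparison
print(np.linalg.norm(z - z_star))            # tiny residual: the walk arrived
```

This is the "keep the operator simple" discipline in miniature: rescaling the weights buys guaranteed stability, and the iterations supply the complexity.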

The Takeaway for Everyone

This paper tells us that we don't always need bigger AI models.
Sometimes, we just need to let the models think longer.

  • Old Mindset: "Make the model bigger to make it smarter."
  • New Mindset: "Keep the model small and efficient, but let it iterate (think) more times when it needs to solve a hard problem."

It's the difference between hiring a giant team of people to solve a problem instantly versus hiring one very smart person who takes their time to think it through step-by-step. The paper proves that the "one smart person taking their time" can often do a better job than the giant team, using fewer resources.
