Last-Iterate Convergence of Randomized Kaczmarz and SGD with Greedy Step Size

This paper establishes an O(1/t^{3/4}) last-iterate convergence rate for SGD with greedy step size over smooth quadratics in the interpolation regime, improving upon the previous O(1/t^{1/2}) bound by introducing stochastic contraction processes and analyzing them through a discrete-to-continuous reduction.

Original authors: Michał Dereziński, Xiaoyu Dong

Published 2026-04-14

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to find the perfect spot to park your car in a massive, crowded parking lot. You can't see the whole lot at once; you can only look at one row at a time.

This is the kind of problem Stochastic Gradient Descent (SGD) is designed for: making progress when you only ever see part of the picture. It's the engine behind how computers learn from data (like training AI to recognize cats or solve math problems). Instead of looking at the entire dataset to make a move, it takes a tiny, random peek, makes a guess, and adjusts its position.

The specific algorithm this paper focuses on is called Randomized Kaczmarz. Think of it as a game of "Hot and Cold" where you are trying to solve a giant puzzle of linear equations. You pick one clue (equation) at random, adjust your guess to satisfy that clue, and then move on to the next random clue.
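To make the "pick a clue, satisfy it, move on" loop concrete, here is a minimal sketch in plain Python. It is an illustration rather than the authors' code: the function name `randomized_kaczmarz`, the toy system, and the choice to sample equations uniformly at random are all assumptions made for this example.

```python
import random

def randomized_kaczmarz(A, b, num_steps, seed=0):
    """Return the last iterate after num_steps random row projections.

    Rows are sampled uniformly, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])  # start from the origin
    for _ in range(num_steps):
        i = rng.randrange(len(A))  # pick one equation (clue) at random
        row = A[i]
        # How far the current guess is from satisfying equation i.
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        norm_sq = sum(a * a for a in row)
        # Move x onto the set of points that satisfy equation i exactly.
        scale = residual / norm_sq
        x = [xj + scale * a for xj, a in zip(x, row)]
    return x

# Hypothetical toy system whose exact solution is (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
x_last = randomized_kaczmarz(A, b, num_steps=200)
print(x_last)
```

Note that the function returns only the final guess, with no averaging. That final guess is exactly the "last iterate" discussed in the rest of this explanation.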

The Big Question: "Last-Iterate" vs. "Average"

For decades, mathematicians have known that if you take the average of all your guesses over time, you will eventually get very close to the perfect answer. It's like saying, "If I take 1,000 guesses at the parking spot and average them out, I'll be right in the middle of the spot."

But in the real world, we don't want to wait until the end to average everything out. We want to know: Is the very last guess I made (the "last iterate") actually good?

Imagine you are walking down a hallway trying to find a specific door.

  • The Old Way (Average): You walk back and forth, leaving a trail of footprints. At the end, you measure the center of all your footprints. That's where the door is.
  • The New Way (Last Iterate): You want to know if the spot where you are standing right now (after your last step) is already close to the door.

For a long time, we didn't know if the "Last Step" was good enough when using a specific, aggressive strategy called the "Greedy Step Size." This is like taking the biggest possible step you can without overshooting the target. It's the most efficient way to move, but it's also the riskiest. Previous research suggested that if you took this big step, your last guess might be a bit shaky, only getting better at a slow pace (like 1/\sqrt{t}).
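The "biggest step without overshooting" idea can be seen on a single equation. The sketch below is a simplified illustration, not the paper's definition: it assumes the loss for one sampled equation is 0.5 * (row . x - target)^2 and that the greedy choice is the step size 1/||row||^2, which lands exactly on that equation's solution set. The names `sgd_step` and `residual_after` are hypothetical helpers.

```python
def sgd_step(x, row, target, step):
    """One gradient step on f(x) = 0.5 * (row . x - target)**2."""
    residual = sum(a * xj for a, xj in zip(row, x)) - target
    return [xj - step * residual * a for xj, a in zip(x, row)]

def residual_after(x, row, target):
    """Signed violation of the equation row . x = target at the point x."""
    return sum(a * xj for a, xj in zip(row, x)) - target

row, target = [3.0, 4.0], 10.0     # one sampled equation: 3*x0 + 4*x1 = 10
x_start = [0.0, 0.0]               # starting residual is -10
norm_sq = sum(a * a for a in row)  # squared length of the row, here 25

# Try a cautious step, the assumed greedy step, and a step that is too large.
cautious = residual_after(sgd_step(x_start, row, target, 0.5 / norm_sq), row, target)
greedy = residual_after(sgd_step(x_start, row, target, 1.0 / norm_sq), row, target)
too_big = residual_after(sgd_step(x_start, row, target, 1.5 / norm_sq), row, target)
print(cautious, greedy, too_big)
```

Up to floating-point rounding, the cautious step leaves half of the violation in place, the assumed greedy step removes it entirely, and the larger step flips its sign, which is what overshooting looks like here.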

The Breakthrough: A Faster Finish Line

The authors of this paper, Michał Dereziński and Xiaoyu Dong, proved that you don't need to average your guesses. If you use this "Greedy Step Size," your very last guess is actually much better than we thought.

They showed that the error shrinks at a rate of 1/t^{3/4}.

To put that in perspective:

  • Under the old 1/t^{1/2} bound, doubling the number of steps shrinks the guaranteed error by a factor of about 1.41 (that is, 2^{1/2}).
  • Under the new 1/t^{3/4} bound, doubling the number of steps shrinks it by a factor of about 1.68 (that is, 2^{3/4}).
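Another way to compare the two rates is to ask how many steps each bound requires before it guarantees a given accuracy. The short calculation below is only a back-of-the-envelope sketch: it ignores the constants hidden inside the O(...) notation, and the target error of 0.01 and the helper name `steps_needed` are assumptions chosen for illustration.

```python
import math

def steps_needed(target_error, exponent):
    """Smallest t with 1 / t**exponent <= target_error, ignoring constants."""
    return math.ceil(target_error ** (-1.0 / exponent))

old_steps = steps_needed(0.01, 0.5)   # bound of the form 1/t^{1/2}
new_steps = steps_needed(0.01, 0.75)  # bound of the form 1/t^{3/4}
print(old_steps, new_steps)
```

With these assumptions, the old bound needs roughly 10,000 steps to promise an error of 0.01, while the new bound needs fewer than 500.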

The Secret Sauce: The "Stochastic Contraction Process"

How did they figure this out? They invented a new way of looking at the problem, which they call a Stochastic Contraction Process.

The Analogy of the Stretchy Rubber Sheet:
Imagine your current guess is a point on a giant, stretchy rubber sheet. Every time you pick a random clue (equation), you pull the sheet in a specific direction to snap your point closer to the truth.

  • Sometimes you pull hard.
  • Sometimes you pull gently.
  • Sometimes the pull is in a direction that makes the point wobble a bit before settling.

The authors realized that instead of tracking the messy, random wobbles of the point, they could track the shape of the rubber sheet itself. They turned this chaotic, random process into a deterministic equation (a predictable math formula).

They found that the "wobbles" of the rubber sheet actually follow a hidden rhythm. Some parts of the sheet oscillate wildly (like a guitar string being plucked), while others move smoothly. By mathematically "unifying" these two behaviors, they could predict exactly how fast the point would settle down.

The "Discrete-to-Continuous" Magic Trick

The hardest part of their proof was bridging the gap between discrete steps (taking one step at a time, like counting 1, 2, 3) and continuous flow (like water flowing in a river).

Think of it like watching a movie.

  • Discrete: You see individual frames (1, 2, 3...).
  • Continuous: You see the smooth motion of the actor.

The authors developed a clever mathematical trick to turn their "frame-by-frame" analysis into a smooth "movie." They translated their problem into a Differential Equation (the math used to describe how things change smoothly over time, like the speed of a car). By solving this smooth equation, they could prove exactly how fast the "last step" converges.
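The frame-versus-movie idea can be tried on a much simpler example than the one in the paper. The sketch below uses a toy recurrence, e_{t+1} = e_t - c * e_t^2, chosen purely for illustration (it is not the equation the authors analyze), and compares it with the solution of the matching smooth equation de/dt = -c * e^2, which is e(t) = e0 / (1 + c * e0 * t). All names and parameter values are assumptions.

```python
def discrete_error(e0, c, num_steps):
    """Frame by frame: iterate the toy recurrence e <- e - c * e**2."""
    e = e0
    for _ in range(num_steps):
        e = e - c * e * e
    return e

def continuous_error(e0, c, t):
    """The movie: closed-form solution of de/dt = -c * e**2 with e(0) = e0."""
    return e0 / (1.0 + c * e0 * t)

e0, c, t = 1.0, 0.1, 1000
frame_by_frame = discrete_error(e0, c, t)
smooth = continuous_error(e0, c, t)
relative_gap = abs(frame_by_frame - smooth) / smooth
print(frame_by_frame, smooth, relative_gap)
```

In this toy case the two answers stay within a few percent of each other after 1,000 steps, so solving the smooth equation reveals how quickly the step-by-step process decays. The paper's reduction is far more delicate, but the spirit is the same.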

Why Does This Matter?

  1. A Stronger Guarantee: For problems like solving massive systems of equations (used in engineering, physics, and AI), the tighter bound means we can stop the algorithm sooner and still trust the answer. We don't need to run it as long to get the same guarantee.
  2. It's More Realistic: In real-world machine learning, we often use the "last guess" (the final model) rather than an average. For smooth quadratics in the interpolation regime, this paper shows that the last guess under the "greedy" step size comes with a solid mathematical guarantee.
  3. It Sharpens a Known Answer: It addresses the question "How good is the last step of the Kaczmarz algorithm?" The previous bound was 1/t^{1/2}; the new bound of 1/t^{3/4} shows the last step is better than that earlier analysis suggested.

Summary

The paper takes a classic, slightly chaotic method for solving math problems (Randomized Kaczmarz) and proves that if you take big, bold steps, the error of your final answer shrinks at a rate of 1/t^{3/4} instead of the previously known 1/t^{1/2}. They did this by inventing a new mathematical lens that turns random chaos into a predictable pattern, showing that the "last step" can be trusted on its own, without averaging.
