Stochastic and incremental subgradient methods for convex optimization on Hadamard spaces

This paper introduces a novel subgradient definition based on Busemann functions for convex optimization on Hadamard spaces, enabling the generalization of stochastic and incremental subgradient methods with guaranteed complexity bounds to nonpositively curved metric spaces such as BHV tree space.

Ariel Goodwin, Adrian S. Lewis, Genaro López-Acedo, Adriana Nicolae

Published Wed, 11 Ma

Imagine you are trying to find the perfect meeting spot for a group of friends who are scattered across a strange, warped landscape. In our normal, flat world (like a city grid), finding the "middle" point is easy: you just average the coordinates. But what if the world isn't flat? What if it's a Hadamard space: a complete, nonpositively curved mathematical landscape that bends away from you like the surface of a saddle or a hyperbolic plane, or even a tree-like structure where paths branch out and never loop back?

This paper tackles the problem of optimization (finding the best spot) in these weird, curved worlds. Specifically, it introduces a new, simpler way to take steps toward the solution without getting lost in complex math.

Here is the breakdown using everyday analogies:

1. The Problem: The "Flat World" Tools Don't Work

In our normal, flat world (Euclidean space), if you want to find the best spot, you use a tool called a subgradient. Think of a subgradient as a "slope sign" on a hill. It tells you: "If you want to go downhill, walk in this direction."

But in these curved, tree-like worlds, there are no straight lines and no flat "slopes" in the traditional sense. The old tools rely on linear math (like vectors and angles) that simply don't exist here. Trying to use the old tools is like trying to navigate a forest using a compass that only works on a flat map; it just doesn't fit the terrain.

2. The New Tool: The "Busemann Subgradient"

The authors invent a new kind of "slope sign" called a Busemann subgradient.

  • The Old Way (The Horizon): Imagine standing on a hill. In the old method, you had to look at the horizon and imagine a flat plane touching the hill. This is hard to do in a curved world.
  • The New Way (The Infinite Ray): Instead, imagine a ray of light shooting out from your feet toward infinity. In this new method, the "slope" isn't a flat plane; it's a direction and a speed.
    • Direction: Which way should you walk? (The ray).
    • Speed: How fast should you walk? (The "speed" factor).

This new tool is "primal," meaning it works directly on the ground you are standing on, rather than trying to project the problem onto a flat, imaginary map. It's like having a GPS that tells you, "Walk North at 3 miles per hour," rather than trying to calculate the angle of the slope relative to a flat horizon.
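To make the "infinite ray" picture concrete, here is a small numerical sketch (ours, not the paper's). In ordinary flat space, the Busemann function of the unit-speed ray t ↦ tu is b(x) = lim as t → ∞ of (‖x − tu‖ − t), and in that flat case the limit collapses to the simple formula −⟨x, u⟩. The variable names below are our own:

```python
import numpy as np

# In flat Euclidean space, the Busemann function of the unit-speed ray
# t -> t*u (u a unit vector) is b(x) = lim_{t->inf} (||x - t*u|| - t),
# which works out to -<x, u>. A quick numerical check of that limit:

u = np.array([0.6, 0.8])           # unit direction of the ray
x = np.array([3.0, -1.0])          # an arbitrary point

t = 1e8                            # "toward infinity"
busemann_approx = np.linalg.norm(x - t * u) - t
busemann_exact = -(x @ u)          # closed form in the Euclidean case

print(busemann_approx, busemann_exact)
```

In curved spaces there is no such closed form, which is exactly why the limit definition (the ray) is the one that survives the trip from flat to curved geometry.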

3. The Strategy: The "Stochastic" and "Incremental" Walks

The paper proposes two ways to use this new tool to find the best spot (the median or mean of a group of points):

  • The Stochastic Method (The Random Picker): Imagine you have a list of friends (data points) and you want to find the spot that minimizes the total walking distance to all of them.

    • Old way: You calculate the distance to everyone at once, then take a step. This is slow and computationally heavy.
    • New way: You close your eyes, pick one friend at random, ask them, "Which way is downhill for me?" and take a small step in that direction. Then you pick another random friend and repeat.
    • Why it works: Even though you are only listening to one person at a time, over thousands of steps, you naturally drift toward the perfect center. It's like finding your way through a dark room by tapping a few random walls; eventually, you find the exit.
  • The Incremental Method (The Round-Robin): Instead of picking randomly, you go through your list of friends one by one in a circle. You listen to Friend A, take a step. Then Friend B, take a step. Then Friend C.

    • This is often faster because you systematically cover all the information without the "noise" of randomness.
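The two walks above can be sketched in a few lines. This is a hedged toy version in flat 2D space, where a "geodesic step toward a friend" is just straight-line interpolation; it is not the paper's implementation, and all names and step-size choices here are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
friends = rng.normal(size=(50, 2))          # the data points a_i

def step_toward(x, a, alpha):
    # Subgradient step for f_i(x) = d(x, a_i): move distance alpha along
    # the geodesic from x toward a_i (a straight line in flat space).
    d = np.linalg.norm(x - a)
    if d == 0:
        return x
    return x + min(alpha, d) * (a - x) / d  # never overshoot a_i

def stochastic_median(points, iters=5000):
    x = points[0].copy()
    for k in range(1, iters + 1):
        a = points[rng.integers(len(points))]  # pick one friend at random
        x = step_toward(x, a, alpha=1.0 / k)   # diminishing step size
    return x

def incremental_median(points, passes=100):
    x = points[0].copy()
    k = 1
    for _ in range(passes):
        for a in points:                       # round-robin through friends
            x = step_toward(x, a, alpha=1.0 / k)
            k += 1
    return x
```

Both walks drift toward the geometric median (the point minimizing total distance to all friends); in a curved space the only change is that `step_toward` would follow the space's geodesics instead of straight lines.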

4. The Real-World Application: The "Tree Space"

The authors test this on BHV tree space, named for Billera, Holmes, and Vogtmann: a map of all possible evolutionary trees (phylogenetic trees).

  • Imagine you have 100 different family trees for a group of species.
  • You want to find the "average" family tree that represents the group best.
  • In this space, "distance" means how much you have to rearrange branches to turn one tree into another.
  • The authors used their new algorithm to find the "median tree" (the one that is, on average, closest to all the others). They showed that their method works just as well as the old, heavy-duty methods but is much simpler to implement and understand.
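BHV space itself takes real machinery to code, but its simplest relative does not: a three-legged "spider" (three half-lines glued at one center point) is also a tree-like Hadamard space, and the round-robin walk runs on it verbatim. The sketch below is our toy illustration under that stand-in geometry, not the paper's implementation, and every name in it is hypothetical:

```python
# Points on a 3-legged "spider" live as (leg, radius): radius 0 is the
# central vertex shared by all legs. This tripod is the simplest
# tree-like Hadamard space, a toy stand-in for BHV tree space.

def tree_dist(p, q):
    (lp, rp), (lq, rq) = p, q
    if lp == lq:
        return abs(rp - rq)            # same leg: walk along it
    return rp + rq                     # different legs: go via the center

def step_toward(x, a, alpha):
    """Move distance alpha along the unique geodesic from x toward a."""
    d = tree_dist(x, a)
    if d == 0:
        return x
    alpha = min(alpha, d)              # never overshoot the target
    (lx, rx), (la, ra) = x, a
    if lx == la:
        return (lx, rx + (alpha if ra > rx else -alpha))
    if alpha <= rx:
        return (lx, rx - alpha)        # still heading toward the center
    return (la, alpha - rx)            # passed the center onto a's leg

def incremental_median(points, passes=200):
    x = points[0]
    k = 1
    for _ in range(passes):
        for a in points:               # round-robin through the trees
            x = step_toward(x, a, 1.0 / k)
            k += 1
    return x

# Two trees deep on leg 0, one on leg 1: the median is at (leg 0, r = 1).
data = [(0, 2.0), (0, 1.0), (1, 1.0)]
leg, r = incremental_median(data)
print(leg, r)
```

The only geometry-specific ingredients are the distance and the geodesic step; swapping in BHV distances and geodesics gives the same walk over genuine phylogenetic trees.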

5. Why This Matters

  • Simplicity: The new method doesn't require complex "dual" math (which is like trying to solve a puzzle by looking at its shadow). It works directly with the geometry of the space.
  • Speed: It guarantees that you will find a good enough answer within a predictable number of steps (complexity bounds), just like the best methods in flat worlds.
  • Versatility: It works not just for trees, but for any curved space, including hyperbolic geometry (used in AI and network analysis) and spaces of positive-definite matrices (used in medical imaging).

The Takeaway

The authors essentially said: "Stop trying to force flat-world math onto curved landscapes. Instead, give the walker a compass (a ray) and a pace (a speed), and let them take small, smart steps one friend at a time. It's simpler, faster, and works everywhere."

They proved that this "Busemann subgradient" approach is the key to unlocking efficient optimization in the strange, beautiful, and curved geometries of the modern mathematical world.