Riemannian Gradient Method with Momentum

Imagine you are a hiker trying to find the lowest point in a vast, foggy, and strangely shaped valley. This isn't a flat field; the ground is curved, bumpy, and twists in ways that defy straight lines. In the world of math and computer science, this "curved valley" is called a Riemannian Manifold, and the goal is to find the absolute bottom (the minimum) of a function.

This paper introduces a new, smarter way for hikers (algorithms) to navigate this terrain. Here is the breakdown in simple terms:

1. The Problem: Getting Stuck in the Fog

Standard hiking methods (optimization algorithms) usually take a step downhill, look around, and take another step.

The Issue: If you just look at the slope right under your feet, you might zigzag wildly, taking tiny steps back and forth. It's like trying to walk down a steep hill by taking one step left, then one step right, then left again. It works, but it's incredibly slow.
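The Goal: We want to get to the bottom faster without getting lost along the way. (As with most methods of this kind, the guarantee is for reaching a flat spot where the slope vanishes, not necessarily the single lowest point of the whole valley.)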

2. The Solution: The "Momentum" Hiker

The authors propose a method called Riemannian Gradient Method with Momentum (RGMM).

Think of Momentum like a skateboarder or a skier.

Without Momentum: You stop completely at every step to check the ground before moving again.
With Momentum: You carry your speed from the previous step. If you were sliding down a slope, you keep that forward energy. If you hit a bump, your momentum helps you glide over it rather than stopping dead.

In this new algorithm, the computer doesn't just look at the current slope; it remembers where it came from and uses that "inertia" to take a more direct path toward the bottom.

3. The Tricky Part: The Curved World

The hard part is that this "valley" is curved. In a flat world (Euclidean space), you can just subtract your old position from your new one to see how you moved. But on a curved surface (like the surface of the Earth), you can't just subtract coordinates; the "straight lines" don't exist the same way.

The Analogy: Imagine trying to draw a straight line between New York and London on a flat map versus a globe. On the flat map, it's a straight line. On the globe, it's a curve.
The Fix: The authors invented a clever way to "transport" the memory of the previous step onto the current curved surface. They use a mathematical tool called Vector Transport (think of it as a magical teleportation device that moves your "momentum vector" from one spot on the curve to the next without breaking it).

4. The Safety Net: The "Restart" Button

Even with momentum, things can go wrong. Sometimes the math gets messy, and the "momentum" might push you the wrong way or make you spin in circles.

The authors added a Safeguarding Rule (a restart strategy):

Imagine your hiker has a compass. If the compass starts spinning wildly or points in a direction that doesn't make sense, the hiker stops, ignores the momentum, and simply takes a direct step straight downhill (the negative gradient).
Once the hiker is back on a stable path, they turn the momentum back on. This safeguard is what guarantees the algorithm keeps making progress downhill, even in the worst-case scenarios.

5. The Results: Faster and Smarter

The authors tested this new method against the best existing "hiking guides" (solvers) available in a popular toolbox called Manopt.

The Race: They ran 75 different complex problems (like finding the best shape for a satellite dish or organizing data).
The Winner: The new momentum method (RGMM) was often the fastest. It reached the bottom of the valley in fewer steps and less computer time than the competition.
Reliability: It rarely failed. Even when the terrain was extremely tricky, the "safety net" kicked in, and it kept moving forward.

Summary

In short, this paper teaches computers how to run down a curved, bumpy hill much faster by:

Carrying momentum (using past steps to speed up).
Respecting the curve (using special math to handle the weird geometry).
Having a safety net (knowing when to stop and take a simple step if things get weird).

It's like upgrading from a hiker who stops to check a map every 5 feet to a skier who knows how to use their speed to glide smoothly down the mountain, only braking when absolutely necessary.

Here is a detailed technical summary of the paper "Riemannian Gradient Method with Momentum" by Filippo Leggio and Diego Scuppa.

1. Problem Statement

The paper addresses the problem of minimizing a smooth, non-convex function $f$ defined on a Riemannian submanifold $\mathcal{M}$ embedded in a finite-dimensional Euclidean space $\mathcal{E}$:
$\min_{x \in \mathcal{M}} f(x)$
This class of problems arises in various applications, including machine learning, radar communication, low-rank matrix completion, and shape analysis. While standard unconstrained optimization methods (like Conjugate Gradient, Trust-Region, and L-BFGS) have been adapted to Riemannian manifolds, the authors specifically target the development of a first-order gradient method with momentum that offers rigorous global convergence guarantees and competitive computational complexity.

2. Methodology

The proposed algorithm, Riemannian Gradient Method with Momentum (RGMM), extends a recent Euclidean momentum method (Lapucci et al., [18]) to the Riemannian setting. The core methodology involves the following components:

A. Search Direction Construction

At each iteration $k$, the search direction $d_k \in T_{x_k}\mathcal{M}$ (the tangent space at $x_k$) is constructed as a linear combination of the Riemannian gradient $g_k$ and a momentum term $s_k$:
$d_k = -\alpha_k g_k + \beta_k s_k$

Riemannian Gradient: $g_k = \text{grad } f(x_k) = \text{proj}_{x_k}(\nabla f(x_k))$.
Momentum Term: Unlike the Euclidean case, where $s_k = x_k - x_{k-1}$, a Riemannian manifold does not support direct subtraction of points. Instead, the previous search direction (scaled by its step size $\eta_{k-1}$) is transported to the current tangent space using a vector transport (specifically, orthogonal projection):
$s_k = \text{proj}_{x_k}(\eta_{k-1} d_{k-1})$
Coefficients ($\alpha_k, \beta_k$): These are determined by minimizing a local quadratic model of the function. The problem is reformulated as a 2-dimensional unconstrained quadratic minimization:
$\min_{u \in \mathbb{R}^2} T_k^\top u + \frac{1}{2} u^\top H_k u$
where $u = [\alpha, \beta]^\top$.
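The construction above can be illustrated on the unit sphere, where the orthogonal projection onto the tangent space at $x$ is $v - \langle x, v \rangle x$. The following is a minimal sketch, not the authors' implementation: it assumes the local model is $\langle g_k, d \rangle + \frac{1}{2} \langle d, B_k[d] \rangle$ with $d = -\alpha g_k + \beta s_k$, which gives one plausible form of $T_k$ and $H_k$. The helper names (`proj`, `reduced_model_coefficients`) and the fallback for nearly parallel vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def reduced_model_coefficients(g, s, apply_B):
    """Minimize <g, d> + 0.5 <d, B[d]> over d = -alpha*g + beta*s.

    apply_B is a callable applying a symmetric positive-definite operator.
    Assumes g is nonzero. Returns (alpha, beta).
    """
    Bg, Bs = apply_B(g), apply_B(s)
    # Assumed form of the reduced model: T = [-<g,g>, <g,s>],
    # H = [[<g,Bg>, -<g,Bs>], [-<g,Bs>, <s,Bs>]].
    t1, t2 = -dot(g, g), dot(g, s)
    h11, h12, h22 = dot(g, Bg), -dot(g, Bs), dot(s, Bs)
    det = h11 * h22 - h12 * h12
    if det <= 1e-12 * h11 * h22:
        # g and s (nearly) parallel: minimize over alpha only (illustrative choice).
        return dot(g, g) / h11, 0.0
    # Solve the 2x2 system H u = -T by Cramer's rule.
    alpha = (-t1 * h22 + t2 * h12) / det
    beta = (h12 * t1 - h11 * t2) / det
    return alpha, beta

# Example at the north pole of the sphere in R^3.
x = [0.0, 0.0, 1.0]
g = proj(x, [1.0, 2.0, 3.0])                  # Riemannian gradient from a Euclidean one
d_prev, eta_prev = [0.0, 1.0, 5.0], 0.5
s = proj(x, [eta_prev * c for c in d_prev])   # transported momentum term
alpha, beta = reduced_model_coefficients(g, s, lambda v: list(v))
d = [-alpha * gi + beta * si for gi, si in zip(g, s)]
```

With the identity in place of $B_k$, the model is minimized by the plain negative gradient, so the sketch returns $\alpha = 1$ and $\beta = 0$; a non-trivial $B_k$ is what makes the momentum term contribute.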

B. Operator Approximation ($B_k$)

To define the quadratic model, a symmetric positive-definite operator $B_k$ (approximating the Riemannian Hessian) is required.

Challenge: Explicitly computing the Riemannian Hessian or using finite differences requires expensive retractions and gradient evaluations.
Solution: The authors adopt a memoryless BFGS update strategy adapted for manifolds. They define a vector $y_k = g_k - \text{proj}_{x_k}(g_{k-1})$ and enforce the secant equation $B_k[s_k] = y_k$.
Implementation: The operator $B_k$ is defined implicitly via a scaled identity and rank-one updates, requiring only inner products and vector operations in the tangent space. This avoids extra function or gradient evaluations.
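To make the implicit definition concrete, here is a sketch using the textbook memoryless BFGS formula $B[v] = \gamma v - \gamma \frac{\langle s, v \rangle}{\langle s, s \rangle} s + \frac{\langle y, v \rangle}{\langle s, y \rangle} y$. Whether the paper uses exactly this form and this scaling $\gamma = \langle y, y \rangle / \langle s, y \rangle$ is an assumption; the function name `memoryless_bfgs` is hypothetical. The operator is applied with inner products only, and it satisfies the secant equation by construction.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(x, v):
    """Orthogonal projection onto the tangent space of the unit sphere at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def memoryless_bfgs(s, y):
    """Return a callable v -> B[v] built from one (s, y) pair.

    Textbook memoryless BFGS with an assumed scaling gamma = <y,y>/<s,y>.
    Requires the curvature condition <s, y> > 0.
    """
    sy = dot(s, y)
    if sy <= 0:
        raise ValueError("curvature condition <s, y> > 0 violated")
    ss = dot(s, s)
    gamma = dot(y, y) / sy

    def apply_B(v):
        sv, yv = dot(s, v), dot(y, v)
        return [gamma * vi - gamma * (sv / ss) * si + (yv / sy) * yi
                for vi, si, yi in zip(v, s, y)]

    return apply_B

# Example on the unit sphere in R^3, at the north pole.
x = [0.0, 0.0, 1.0]
g = [1.0, 2.0, 0.0]                        # current Riemannian gradient
g_prev = [0.5, 0.5, 0.7]                   # previous gradient, from another tangent space
y = [a - b for a, b in zip(g, proj(x, g_prev))]
s = [0.2, 0.1, 0.0]                        # transported momentum term
B = memoryless_bfgs(s, y)
Bs = B(s)                                  # reproduces y up to rounding
```

No matrix is ever formed: each application of the operator costs a handful of inner products.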

C. Safeguarding and Restart Strategy

To ensure global convergence, the search direction must be gradient-related (i.e., sufficiently aligned with the negative gradient and bounded in norm).

Restart Mechanism: If the computed direction $d_k$ fails to satisfy the gradient-related conditions (specifically $\langle g_k, d_k \rangle \leq -c_1 \|g_k\|^2$), the algorithm restarts by setting $d_k = -\lambda_k g_k$, where $\lambda_k$ is a Barzilai-Borwein step size.
Curvature Check: If the curvature condition $\langle s_k, y_k \rangle > 0$ is violated (so the BFGS update is not guaranteed to be positive definite), the method defaults to a scaled gradient step.
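The two checks can be combined into one small routine. This is a sketch under assumptions: the norm bound is taken in the common form $\|d_k\| \leq c_2 \|g_k\|$, the constants `c1`, `c2`, `lam_min`, `lam_max` are illustrative values not taken from the paper, and the Barzilai-Borwein step uses the variant $\langle s, s \rangle / \langle s, y \rangle$ with clipping.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def safeguarded_direction(g, d, s, y, c1=1e-4, c2=1e4, lam_min=1e-8, lam_max=1e8):
    """Return (direction, restarted).

    Keeps the candidate d if the curvature condition holds and d is
    gradient-related; otherwise falls back to a scaled negative gradient.
    All constants are hypothetical defaults.
    """
    sy = dot(s, y)
    gg = dot(g, g)
    # Barzilai-Borwein step (one common variant), clipped to a safe interval.
    lam = dot(s, s) / sy if sy > 0 else 1.0
    lam = min(max(lam, lam_min), lam_max)
    curvature_ok = sy > 0
    descent_ok = dot(g, d) <= -c1 * gg
    bounded_ok = math.sqrt(dot(d, d)) <= c2 * math.sqrt(gg)
    if curvature_ok and descent_ok and bounded_ok:
        return d, False
    return [-lam * gi for gi in g], True

g = [1.0, 0.0]
s, y = [1.0, 0.0], [2.0, 0.0]
d_bad, restarted_bad = safeguarded_direction(g, [1.0, 0.0], s, y)     # ascent direction
d_good, restarted_good = safeguarded_direction(g, [-1.0, 0.5], s, y)  # acceptable direction
d_curv, restarted_curv = safeguarded_direction(g, [-1.0, 0.5], s, [-1.0, 0.0])
```

In the first and third calls the candidate is rejected and the routine returns a scaled negative gradient; in the second the momentum direction passes unchanged.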

D. Line Search

The algorithm employs a standard Armijo line search with a monotone strategy to determine the step size $\eta_k$, ensuring sufficient decrease in the objective function.
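A backtracking Armijo search on a manifold moves along the direction through a retraction rather than by plain addition. The sketch below assumes the unit sphere with the normalization retraction $R_x(v) = (x + v)/\|x + v\|$ and a toy objective $f(x) = \sum_i a_i x_i^2$; the parameter values (`sigma`, `shrink`, `eta0`) are illustrative, not the paper's settings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(x, v):
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def retract(x, v):
    """Normalization retraction on the unit sphere."""
    z = [xi + vi for xi, vi in zip(x, v)]
    n = math.sqrt(dot(z, z))
    return [zi / n for zi in z]

def armijo(f, x, g, d, eta0=1.0, sigma=1e-4, shrink=0.5, max_backtracks=50):
    """Backtrack until f(R_x(eta d)) <= f(x) + sigma * eta * <g, d>."""
    fx, slope = f(x), dot(g, d)
    eta = eta0
    for _ in range(max_backtracks):
        x_new = retract(x, [eta * di for di in d])
        if f(x_new) <= fx + sigma * eta * slope:
            return eta, x_new
        eta *= shrink
    return 0.0, x  # no acceptable step found

# Toy problem: minimize sum(a_i * x_i^2) over the unit sphere.
a = [3.0, 2.0, 1.0]
f = lambda x: sum(ai * xi * xi for ai, xi in zip(a, x))
x0 = [1.0 / math.sqrt(3.0)] * 3
g0 = proj(x0, [2.0 * ai * xi for ai, xi in zip(a, x0)])  # Riemannian gradient
d0 = [-gi for gi in g0]                                   # steepest-descent direction
eta, x1 = armijo(f, x0, g0, d0)
```

The accepted point stays on the sphere because every trial point is produced by the retraction.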

3. Key Contributions

Novel Algorithm: The development of RGMM, a first-order momentum method specifically designed for Riemannian manifolds, bridging the gap between recent Euclidean momentum methods and Riemannian optimization.
Theoretical Guarantees:
- Global Convergence: The algorithm is proven to converge to a stationary point ($\text{grad } f(x^*) = 0$).
- Complexity Bound: Under standard assumptions (boundedness of $f$, Lipschitz-type conditions on the retraction), the method finds an $\epsilon$-stationary point with a worst-case iteration complexity of $O(\epsilon^{-2})$. This matches the best-known rates for Riemannian gradient and trust-region methods.
Efficient Implementation: The method avoids explicit Hessian computations and extra function evaluations by using a memoryless BFGS approximation adapted via vector transport.
Robustness: The inclusion of a restart strategy ensures that the algorithm remains robust even when the momentum direction becomes unstable or non-descent.
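
The $O(\epsilon^{-2})$ rate follows the usual pattern for descent methods. As a sketch of that standard argument (not necessarily the paper's exact proof), assume each iteration achieves a decrease proportional to the squared gradient norm, with a hypothetical constant $c > 0$ and a lower bound $f_{\text{low}}$ on $f$:
$f(x_k) - f(x_{k+1}) \geq c \, \|g_k\|^2$
If $\|g_k\| > \epsilon$ for every $k < K$, summing these inequalities gives
$f(x_0) - f_{\text{low}} \geq \sum_{k=0}^{K-1} \left( f(x_k) - f(x_{k+1}) \right) \geq K c \, \epsilon^2$
and therefore
$K \leq \frac{f(x_0) - f_{\text{low}}}{c \, \epsilon^2} = O(\epsilon^{-2})$
Gradient-related directions combined with an Armijo line search are the typical way such a per-iteration decrease is obtained.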

4. Experimental Results

The authors implemented RGMM in MATLAB and compared it against state-of-the-art solvers from the Manopt package:

Comparators: Riemannian Barzilai-Borwein (RBB), Riemannian Conjugate Gradient (RCG), Riemannian Trust-Region (RTR), and Riemannian L-BFGS (RLBFGS).
Benchmarks: 15 problem types (75 instances total) covering manifolds such as Stiefel, Grassmann, SPD matrices, and products of spheres.
Performance Metrics: CPU time, number of iterations, function evaluations, and success rates.

Key Findings:

Speed: RGMM was the fastest solver (best CPU time) in 33.4% of instances and had the highest performance profile for $\tau \in [1, 8]$, meaning it solved the largest share of instances within a factor $\tau$ of the best solver's CPU time over that range.
Efficiency: It required the fewest iterations in 52.0% of cases and the fewest function evaluations in 49.3% of cases.
Reliability: The failure rate was negligible (comparable to RBB and RLBFGS), with RTR being the only solver to solve 100% of instances (though RTR is generally more computationally expensive per iteration).
Stability: The curvature condition $\langle s_k, y_k \rangle > 0$ was violated in less than 0.5% of iterations, confirming that the safeguards act primarily as theoretical protections rather than frequent operational switches.
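
For readers unfamiliar with performance profiles, the quantity plotted for a solver at $\tau$ is the fraction of instances it solves within a factor $\tau$ of the best cost any solver achieved on that instance. The sketch below computes this for two hypothetical solvers; the timings are made-up toy values for illustration only and are not results from the paper.

```python
import math

def performance_profile(costs, taus):
    """costs: dict solver -> list of per-instance costs (math.inf marks a failure).

    Returns dict solver -> list of fractions, one per tau.
    """
    solvers = list(costs)
    n = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n)]
        profile[s] = [sum(r <= tau for r in ratios) / n for tau in taus]
    return profile

# Toy CPU times (seconds) on four instances; solver "A" fails on the last one.
toy_costs = {
    "A": [1.0, 2.0, 4.0, math.inf],
    "B": [2.0, 1.0, 8.0, 3.0],
}
profile = performance_profile(toy_costs, taus=[1.0, 2.0])
```

The value at $\tau = 1$ is the share of instances on which a solver is the fastest (ties included), which is how a figure such as "fastest in 33.4% of instances" is read off the profile.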

5. Significance

This work provides a meaningful extension of momentum-based optimization to the Riemannian setting. Its significance lies in:

Practicality: It offers a "plug-and-play" alternative to complex second-order methods (like Trust-Region) with lower computational overhead per iteration while maintaining competitive convergence rates.
Theoretical Rigor: It establishes that momentum methods, often viewed as heuristic accelerators, can be rigorously analyzed and proven to have $O(\epsilon^{-2})$ complexity on manifolds under mild assumptions.
Broad Applicability: The method is effective across diverse manifold structures (Stiefel, Grassmann, SPD), making it a strong candidate for modern machine learning and signal processing applications involving manifold constraints.

In conclusion, RGMM represents a robust, efficient, and theoretically sound advancement in Riemannian optimization, successfully combining the acceleration benefits of momentum with the stability of gradient-related search directions.
