Original authors: Hantao Nie, Bin Gao, Andi Han, Pratik Jawanpuria, Bamdev Mishra, Zaiwen Wen

Published 2026-05-15

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ✨ This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to navigate a complex, curved landscape, like the surface of the Earth or a twisted mountain range. In mathematics and machine learning, this landscape is called a manifold. To make decisions on this landscape—like finding the lowest point (optimization) or understanding the shape of the terrain (analysis)—you need to look at the "flat" ground right beneath your feet. This flat ground is called the tangent space.

The problem is that in high-dimensional data (like medical images or complex signals), this flat ground is huge. Calculating the exact rules for moving around on it is like trying to read every single page of a library to find one specific sentence. It takes too much time and memory.

This paper introduces a clever shortcut called the Riemannian Nyström Approximation. Here is how it works, using simple analogies:

1. The Problem: The "Full Library" vs. The "Summary"

Imagine you have a massive, complex map of a city (the operator on the tangent space). To plan the perfect route, you usually need to study the entire map in high definition. But the map is so big that your computer crashes trying to hold it all in memory.

The authors say: "We don't need the whole map. We just need a good summary that keeps the most important features."

2. The Solution: The "Sampling Sketch"

The paper proposes a method to create this summary by looking at only a small, random sample of the map.

The Old Way: In flat, simple math (Euclidean space), you might just pick random coordinates (like picking random street addresses) to guess the layout.
The New Way (This Paper): Since we are on a curved surface, you can't just pick "coordinates" because the surface doesn't have a fixed grid. Instead, the authors invented a "Haar–Grassmann Sketching" method.
- Analogy: Imagine you are blindfolded on a curved hill. Instead of guessing where North is based on a fixed compass (which doesn't exist here), you spin around randomly and pick a direction. The math ensures that no matter how you spin, your random choice is statistically fair: no direction on the hill is favored over any other. This is "coordinate-free," meaning it doesn't rely on a specific map grid.

3. The Magic Trick: "Transporting" the Sketch

When you take a step forward on a curved surface, the ground beneath your feet changes direction. Usually, you'd have to throw away your old summary and build a brand new one from scratch for the new spot. That is slow.

The authors show that you can "transport" your old summary to the new spot.

Analogy: Imagine you have a sketch of a room drawn on a piece of flexible rubber. If you move the rubber to a new room that looks similar, you can stretch and slide the rubber to fit the new room without redrawing everything. The paper proves that if you move your "random sample" correctly (using something called isometric vector transport), the statistical rules still hold true. This saves a massive amount of computing power.

4. The Result: Faster Optimization

The authors used this shortcut to build a Newton-type method.

The Goal: Find the bottom of a valley (the best solution) as fast as possible.
The Method: Instead of calculating the exact steepness of the whole valley (which is slow), they calculate the steepness of just the random sample they picked.
The Outcome: They proved mathematically that this "sampled" path is almost as good as the "exact" path, but it is much faster.

5. Real-World Tests

The team tested this on two specific types of curved landscapes:

SPD Manifolds: These are used to analyze data like medical images (e.g., MRI scans) where each data point is a matrix that must stay symmetric and positive definite.
Grassmann Manifolds: These are spaces whose points are subspaces. They are used for things like finding the main directions in a dataset, similar to how you might find the main trends in a pile of documents.

The Findings:

Memory: The sketched method added only about 4% to 10% of the extra memory that the traditional, exact operator requires.
Accuracy: Despite using so little memory, the results were nearly identical to the expensive method. The "summary" was accurate enough to solve the problem correctly.
Speed: The calculations were significantly faster, especially when the data was huge.

Summary

In short, this paper teaches computers how to navigate complex, curved data landscapes by taking smart, random "snaps" of the terrain instead of trying to map the whole thing. It proves that these snaps are statistically reliable, can be carried over to new locations without redrawing, and allow computers to solve difficult problems much faster and with less memory, with little loss of accuracy.

Technical Summary: Riemannian Nyström Approximation on Manifolds

1. Problem Statement

Many large-scale problems in machine learning and signal processing are naturally formulated with manifold constraints. While the manifold structure allows for faithful modeling of complex geometries, the associated linear algebra operations on tangent spaces often become computational bottlenecks, particularly in high-dimensional scenarios.

Specifically, iterative methods on Riemannian manifolds frequently require constructing a tangent-space operator $H_x: T_xM \to T_xM$ (typically self-adjoint and positive semidefinite, such as the Riemannian Hessian) and computing its inverse or pseudoinverse. Examples include solving linear systems for Newton-type optimization and performing Principal Geodesic Analysis (PGA) on covariance tensors. Explicitly forming and inverting these operators is often prohibitive or intractable. Existing techniques, such as spectral truncation, multigrid solvers, and Riemannian quasi-Newton methods, attempt to mitigate this but raise the question of whether one can construct an efficient approximation with provable error bounds that preserves the intrinsic geometric properties of the operator.

2. Methodology

2.1 Riemannian Nyström Approximation

The authors propose a coordinate-free Riemannian Nyström approximation for self-adjoint positive semidefinite (PSD) operators on a $d$-dimensional Riemannian manifold $(M, g)$.

Given a point $x \in M$ and an operator $H_x$, the approximation $\hat{H}_{x, B, \Xi}$ is constructed using two $\ell$-dimensional subspaces $B, \Xi \subset T_xM$ (where $\ell \leq d$) and a full-rank linear map $F: B \to \Xi$.

Sketching Operator: A sketching operator $P_{x, B, \Xi}: T_xM \to T_xM$ is defined as $P_{x, B, \Xi}[v] = F \Pi_B [v]$, where $\Pi_B$ is the orthogonal projection onto $B$. Its adjoint is $P^*_{x, B, \Xi}[u] = F^* \Pi_\Xi [u]$.
Approximation Formula: The Riemannian Nyström approximation is defined as:
$\hat{H}_{x, B, \Xi}[u] = \left( H_x P_{x, B, \Xi} (P^*_{x, B, \Xi} H_x P_{x, B, \Xi})^\dagger P^*_{x, B, \Xi} H_x \right)[u]$
where $(\cdot)^\dagger$ denotes the Moore–Penrose pseudoinverse. This formulation effectively compresses the linear system into the low-dimensional subspace $B$, solves it there, and lifts the result back to the tangent space.
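
To make the formula concrete, here is a minimal NumPy sketch of the same expression written in coordinates. It assumes an orthonormal basis of the tangent space has been fixed, so that $H_x$ becomes a symmetric PSD matrix `H` and the restriction of the sketching operator to $B$ becomes a $d \times \ell$ matrix `Omega`. The names `nystrom_approx`, `H`, and `Omega` and the test spectrum are illustrative assumptions, not taken from the paper, and a Gaussian `Omega` is used only as a simple stand-in for the sketch.

```python
import numpy as np

def nystrom_approx(H, Omega):
    """Coordinate form of the Nystrom formula: H Omega (Omega^T H Omega)^+ Omega^T H.

    H     : (d, d) symmetric PSD matrix (the operator in an orthonormal basis).
    Omega : (d, l) sketch matrix (assumed stand-in for the sketching operator on B).
    """
    Y = H @ Omega                    # d x l: H applied to the l sketch directions
    core = Omega.T @ Y               # l x l compressed operator
    core = 0.5 * (core + core.T)     # symmetrize against round-off
    return Y @ np.linalg.pinv(core, hermitian=True) @ Y.T

rng = np.random.default_rng(0)
d, l = 50, 10

# Hypothetical test operator: PSD with a geometrically decaying spectrum.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigvals = 2.0 ** -np.arange(d)
H = (Q * eigvals) @ Q.T

Omega = rng.standard_normal((d, l))
H_hat = nystrom_approx(H, Omega)

# Operator-norm error of the rank-l approximation.
err = np.linalg.norm(H - H_hat, 2)
print(f"spectral-norm error with l={l}: {err:.3e}")
```

Only the $d \times \ell$ product `H @ Omega` and the $\ell \times \ell$ core are needed to apply the approximation; the full `H_hat` is formed here purely so the error can be inspected.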

2.2 Sketching Conditions

To enable randomized error analysis, the paper introduces two sketching conditions:

Gaussian Sketching: Analogous to the Euclidean case, where the map $F$ induces a Gaussian distribution on the inner products.
Haar–Grassmann Sketching: A novel, intrinsic condition defined directly in terms of geometry without relying on a specific coordinate system. It requires:
1. The subspace $\Xi$ to be Haar-uniform on the Grassmann manifold $\text{Gr}(\ell, T_xM)$.
2. The isometric component of the polar decomposition of $F$ to be Haar-uniform on the set of linear isometries between $B$ and $\Xi$.
3. The radial factor of $F$ to satisfy specific moment bounds.

This condition is proven to be transport-compatible: if a sketching operator satisfies the Haar–Grassmann condition at $x$, its transport to a nearby point $x'$ via an isometric vector transport also satisfies the condition. This allows for "lazy refresh" strategies in iterative algorithms, where the sketch is transported rather than regenerated at every step.
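
The first requirement and the transport property can be illustrated in coordinates. The sketch below assumes the tangent space is identified with $\mathbb{R}^d$ through an orthonormal basis, draws a Haar-uniform $\ell$-dimensional subspace by orthonormalizing a Gaussian matrix (a standard construction), and models an isometric vector transport by an orthogonal matrix `T`. The function name `haar_subspace` and the random `T` are hypothetical stand-ins; the paper's construction is intrinsic and also constrains the isometric and radial parts of $F$, which this sketch does not cover.

```python
import numpy as np

def haar_subspace(d, l, rng):
    """Orthonormal basis (d x l) whose span is Haar-uniform on Gr(l, R^d)."""
    G = rng.standard_normal((d, l))
    Q, R = np.linalg.qr(G)
    # Fix column signs so that Q itself is uniformly distributed, not just its span.
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(1)
d, l = 30, 5

U = haar_subspace(d, l, rng)      # basis of the sampled subspace
Pi = U @ U.T                      # orthogonal projector onto that subspace

# Assumed stand-in for an isometric vector transport: in orthonormal
# coordinates, an isometry between tangent spaces is an orthogonal matrix.
T, _ = np.linalg.qr(rng.standard_normal((d, d)))
U_moved = T @ U                   # transported basis at the new point

# Isometries preserve inner products, so the transported basis is still orthonormal.
print(np.allclose(U_moved.T @ U_moved, np.eye(l)))
```

Because the Haar distribution is invariant under orthogonal maps, the transported subspace has the same distribution as a freshly sampled one in this coordinate model, which is the intuition behind reusing a sketch instead of regenerating it.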

2.3 Optimization Algorithm

The authors propose a Randomized Riemannian Nyström Cubic Newton (RRNCN) method.

The Riemannian Hessian $H_x$ is replaced by its Nyström approximation $\hat{H}_{x, B, \Xi}$.
The search direction is computed by solving a reduced linear system in the subspace $B$.
To ensure global convergence, the method incorporates cubic regularization, solving a subproblem of the form:
$\min_{v \in B} \left( \langle h, v \rangle_x + \frac{1}{2}\langle J[v], v \rangle_x + \frac{\sigma}{6}\|v\|_x^3 \right)$
where $h$ and $J$ are the sketched gradient and Hessian components.
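
One way to solve a subproblem of this form, once it is written in coordinates, is sketched below. It assumes an orthonormal basis of $B$ so that $\|v\|_x$ is the Euclidean norm, and that `J` is symmetric PSD. Setting the gradient to zero gives $h + Jv + \frac{\sigma}{2}\|v\| v = 0$, so $v = -(J + \frac{\sigma}{2} r I)^{-1} h$ with $r = \|v\|$, and $r$ can be found by bisection on a scalar equation. The function name and the test data are illustrative; the paper may use a different subproblem solver.

```python
import numpy as np

def solve_cubic_subproblem(h, J, sigma, tol=1e-12, max_iter=200):
    """Minimize <h, v> + 0.5 <J v, v> + (sigma / 6) ||v||^3 for symmetric PSD J."""
    if np.linalg.norm(h) == 0.0:
        return np.zeros_like(h)
    lam, V = np.linalg.eigh(J)
    lam = np.maximum(lam, 0.0)          # clip tiny negative eigenvalues from round-off
    g = V.T @ h                         # gradient in the eigenbasis of J

    def step_norm(r):
        # Norm of v(r) = -(J + (sigma / 2) r I)^{-1} h, valid for r > 0.
        return np.linalg.norm(g / (lam + 0.5 * sigma * r))

    # step_norm(r) - r is strictly decreasing for r > 0: bracket its root, then bisect.
    lo, hi = 0.0, 1.0
    while step_norm(hi) > hi:
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    r = 0.5 * (lo + hi)
    return -V @ (g / (lam + 0.5 * sigma * r))

def objective(v, h, J, sigma):
    return h @ v + 0.5 * v @ (J @ v) + sigma / 6.0 * np.linalg.norm(v) ** 3

rng = np.random.default_rng(2)
l, sigma = 8, 1.0
A = rng.standard_normal((l, l))
J = A.T @ A                             # hypothetical sketched Hessian (PSD)
h = rng.standard_normal(l)              # hypothetical sketched gradient

v = solve_cubic_subproblem(h, J, sigma)
residual = h + J @ v + 0.5 * sigma * np.linalg.norm(v) * v
print(f"stationarity residual: {np.linalg.norm(residual):.2e}")
```

Since the work is an $\ell \times \ell$ eigendecomposition plus a scalar root search, the cost depends on the sketch size $\ell$ rather than on the full dimension $d$.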

3. Key Contributions

Intrinsic Construction: The development of a coordinate-free Riemannian Nyström approximation that preserves fundamental operator properties, including positive semidefiniteness, self-adjointness, and Loewner order monotonicity.
Haar–Grassmann Sketching: The introduction of a geometric sketching condition that generalizes Gaussian sketching to manifolds. This condition is intrinsic, coordinate-free, and compatible with isometric vector transport, addressing the lack of canonical coordinates on manifolds.
Approximation Error Bounds: The establishment of spectral approximation error bounds in the operator norm. Under the Haar–Grassmann condition, the expected error is bounded by a function of the eigenvalues of $H_x$ and the sketch size $\ell$. Specifically, the error depends on the tail of the spectrum and the "stable rank" of the operator.
Transport Compatibility: A proof that the Haar–Grassmann condition is preserved under isometric vector transport, enabling efficient reuse of sketching structures across iterations in optimization algorithms.
Optimization Framework: The proposal of a randomized Newton-type method utilizing the Riemannian Nyström approximation, accompanied by global complexity analysis ($O(d/(\ell \epsilon))$) and local linear convergence rates under strong geodesic convexity.

4. Results

The paper validates the proposed methodology through numerical experiments on Symmetric Positive Definite (SPD) and Grassmann manifolds.

Principal Geodesic Analysis (PGA): Experiments on the HDM05 dataset (SPD matrices of size $93 \times 93$) demonstrate that the Nyström approximation preserves downstream statistical performance.
- Accuracy: Multiclass classification accuracy (using logistic regression, SVM, and MLP) and Hotelling's $T^2$ statistics remained comparable to exact PGA across sketch sizes $\ell \in \{20, 40, 80\}$.
- Efficiency: The method reduced memory usage significantly, requiring only 4.30%–9.64% of the memory increase associated with the exact operator, while maintaining competitive statistical quality.
Optimization on SPD Manifolds: Experiments on a geodesically convex optimization problem for covariance estimation showed that intermediate sketch sizes (e.g., $\ell=80$) offered the best trade-off between approximation quality and per-iteration computational cost, achieving faster convergence in wall-clock time compared to the exact cubic Newton method.
Transported Sketching on Grassmann Manifolds: Experiments on the Grassmann manifold ($n=20000, p=20$) demonstrated that transported sketching (refreshing every 2–3 iterations) converged in nearly the same number of iterations as the fully refreshed Nyström method, while providing significant runtime speed-ups by avoiding the cost of regenerating the sketch at every step.

5. Significance and Claims

The paper claims to extend classical Euclidean Nyström theory to the setting of tangent-space operators on Riemannian manifolds. Its primary significance lies in providing a low-rank approximation technique that is intrinsically constructed, preserving the geometric structure (PSD property, self-adjointness) and offering provable approximation errors under a geometrically natural sketching condition (Haar–Grassmann).

The authors emphasize that this approach allows for the efficient computation of tangent-space operators and their inverses without forming the full operator, making high-dimensional manifold optimization and analysis (such as PGA) computationally feasible. The transport compatibility of the sketching condition is highlighted as a key enabler for practical iterative algorithms, reducing computational overhead while maintaining theoretical guarantees. The work bridges the gap between randomized linear algebra and Riemannian optimization, offering a scalable alternative to exact second-order methods.
