Imagine you are trying to navigate a ship through a thick fog. You have a map (the system model) that tells you how the ship should move, but you can't see the water clearly. You have a noisy radar (the measurements) that gives you hints about where you are, but the radar is broken in specific ways—it sometimes gives you no signal at all (singular noise), and the static is unpredictable.
Your goal is to build a "navigator" (a Kalman filter) that combines your map and your broken radar to guess your true location as accurately as possible.
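To make the "navigator" concrete, here is a minimal sketch of one predict-update step of a standard Kalman filter. It assumes the noise covariances `Q` (process) and `R` (measurement) are known in advance, which is exactly the assumption the paper removes; the matrices and names here are illustrative, not taken from the paper.

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict-update cycle of a textbook Kalman filter.

    x, P : current state estimate and its covariance (uncertainty)
    y    : new noisy measurement (the "radar reading")
    A, C : system and measurement matrices (the "map")
    Q, R : process and measurement noise covariances (assumed known here)
    """
    # Predict: push the estimate forward through the system model.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction using the noisy measurement.
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```

Note that the update requires inverting `S`; when the measurement noise is singular, quantities like this can become degenerate, which is the "broken radar" problem the paper tackles.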
The Problem: The "Broken" Radar
Classically, engineers had an exact formula (the Kalman filter) for building this navigator, but it required knowing exactly how "noisy" the radar and the water were.
In the real world, we often don't know these noise levels. Worse, sometimes the noise is "singular," meaning the radar might be completely blind in certain directions (like a camera with a dead pixel that never changes).
When researchers tried to teach computers to learn this navigator using data (without knowing the noise levels), they hit a wall. The mathematical landscape they were climbing looked like a flat, endless plain with no hills or valleys to guide them. Standard learning algorithms (like Gradient Descent) are like hikers who need a slope to walk down; if the ground is flat, they get stuck and wander aimlessly. This is especially true when the noise is "rank-deficient" (the broken radar), making the problem mathematically "ill-posed" or broken.
The Solution: Redrawing the Map with Geometry
The authors of this paper came up with a clever trick. Instead of trying to climb the flat, broken landscape, they redrew the map using a technique called Riemannian Regularization.
Think of it this way:
- The Old Way (Euclidean): Imagine trying to walk in a straight line on a flat, foggy field. If you take a step, you have no idea if you're getting closer to the treasure or just walking in circles. If the ground is uneven (singular noise), you might fall into a hole.
- The New Way (Riemannian): The authors realized that the "space" where the navigator lives has a hidden, curved geometry. They added a special "magnetic field" (the regularization) to the map. This field doesn't change the location of the treasure (the optimal solution), but it reshapes the terrain around it.
Suddenly, the flat plain becomes a smooth, curved bowl. Even if the radar is broken, this new geometry ensures that:
- There are always slopes: No matter where you start, there is a clear path downhill toward the best solution.
- The path is stable: You won't get stuck in a local trap or wander off into infinity.
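A tiny numerical illustration of the "flat plain becomes a bowl" idea (not from the paper, and using a simple Euclidean penalty rather than the paper's Riemannian regularizer): a quadratic loss with a singular Hessian has zero curvature in some direction, so gradient descent stalls there; adding a regularizer makes every direction curve upward.

```python
import numpy as np

# Hessian of a toy quadratic loss 0.5 * x^T H x. It is singular:
# the loss is completely flat along the second coordinate axis.
H = np.array([[1.0, 0.0],
              [0.0, 0.0]])
lam = 0.1  # illustrative regularization weight

# Curvatures (eigenvalues) before and after adding lam * ||x||^2,
# whose Hessian contribution is 2 * lam * I.
flat_dirs = np.linalg.eigvalsh(H)
bowl_dirs = np.linalg.eigvalsh(H + 2 * lam * np.eye(2))
```

Before regularization the smallest curvature is zero (a flat direction with no slope to follow); afterwards all curvatures are strictly positive, so there is a downhill direction everywhere.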
How the Algorithm Works
The paper proposes a step-by-step learning process (Algorithm 1) that works like a smart, iterative training session:
- Start with a "Soft" Constraint: They begin by adding a strong "magnetic pull" (a high regularization factor) that forces the navigator to stay in a safe, easy-to-learn area.
- Learn from Data: The computer looks at a bunch of noisy radar readings (data) and tries to adjust the navigator to minimize prediction errors. Because of the "magnetic pull," the math works smoothly, and the computer learns quickly.
- Gradually Relax: Once the navigator gets good, the authors slowly turn down the "magnetic pull." This is like a teacher slowly removing training wheels.
- Reach the Goal: As the pull gets weaker, the navigator moves closer and closer to the true optimal solution, even though the original problem (the broken radar) was mathematically impossible to solve directly.
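The four steps above follow a "continuation" pattern: optimize a strongly regularized loss, then gradually shrink the regularization weight. The toy sketch below illustrates only that pattern on a one-dimensional quadratic; the loss, schedule, and learning rate are all invented for illustration, and the paper's actual algorithm uses a Riemannian regularizer on a matrix-valued filter, not this scalar Euclidean penalty.

```python
import numpy as np

def train_with_continuation(grad_loss, theta0, lam=1.0, decay=0.5,
                            rounds=12, steps=200, lr=0.05):
    """Gradient descent on a regularized loss with a shrinking weight lam."""
    theta = theta0
    for _ in range(rounds):
        for _ in range(steps):
            # The lam * theta term is the "magnetic pull": it keeps
            # iterates in a well-conditioned region where slopes exist.
            theta = theta - lr * (grad_loss(theta) + lam * theta)
        lam *= decay  # relax the pull: slowly remove the training wheels
    return theta

# Illustrative loss with a shallow slope and minimizer at theta = 3.
grad = lambda t: 0.1 * (t - 3.0)
theta_hat = train_with_continuation(grad, theta0=0.0)
```

Early rounds converge quickly but to a biased point (the pull drags the estimate toward the safe region); as `lam` decays, the minimizer of the regularized problem slides toward the true optimum, which is the sense in which step 4 "reaches the goal."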
Why It's Better Than the Old Way
The paper compares their method to a standard "Euclidean" approach (the old way of adding a simple penalty for complexity).
- The Old Way: Imagine trying to find a specific house in a city by only being told "don't go too far from the city center." If the house you want is actually far away, this rule forces you to stop halfway, and you never find the house.
- The New Way: The Riemannian method is like a GPS that understands the shape of the city. It knows that even if the house is far away, there is a specific curved road that leads directly to it. It doesn't just penalize distance; it respects the geometry of the problem.
The Result
The authors proved mathematically that this method works and validated it in numerical experiments.
- It converges fast: The learning process is guaranteed to reach the solution in a predictable amount of time.
- It handles broken data: It works even when the noise is "singular" (the radar has blind spots).
- It's robust: It doesn't get confused by the choice of learning speed (step size).
In a Nutshell
This paper solves a decades-old headache in engineering: How do you teach a computer to filter noise when you don't know what the noise looks like, and sometimes the noise is completely broken?
They did it by realizing that the problem isn't "flat" and broken; it just needed to be viewed through the right geometric lens. By adding a "curved" mathematical structure to the learning process, they turned an impossible, flat plain into a smooth, downhill slide that leads straight to the perfect solution.