Accelerating Sampling-Based Control via Learned Linear Koopman Dynamics

This paper introduces MPPI-DK, a computationally efficient model predictive path integral (MPPI) control framework. It replaces nonlinear dynamics with a learned linear deep Koopman operator for faster trajectory sampling, achieving near-optimal performance at significantly reduced computational cost in both simulation and real-world robotic experiments.

Wenjian Hao, Yuxuan Fang, Zehui Lu, Shaoshuai Mou

Published 2026-03-06

Imagine you are trying to teach a robot dog to walk across a room without bumping into anything. To do this, the robot's brain needs to constantly ask itself: "If I move my leg this way, where will I be in 0.1 seconds? What if I move it that way? What if I jump?"

This process is called sampling. The robot tries out thousands of "what-if" scenarios in its head every second to decide the best move.
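To make the "what-if" idea concrete, here is a minimal sketch of one MPPI-style update. This is not the paper's implementation; the function names, parameters, and the simple cost-weighted averaging are illustrative assumptions about how a generic sampling controller works.

```python
import numpy as np

def mppi_step(x0, u_nominal, dynamics, cost, n_samples=1000,
              noise_std=0.5, temperature=1.0, rng=None):
    """One MPPI update: sample noisy control sequences, roll each one
    out through the dynamics ("what-ifs"), and blend them, giving
    more weight to the low-cost rollouts."""
    if rng is None:
        rng = np.random.default_rng(0)
    horizon, m = u_nominal.shape
    noise = rng.normal(0.0, noise_std, size=(n_samples, horizon, m))
    costs = np.zeros(n_samples)
    for k in range(n_samples):          # thousands of imagined futures
        x = x0.copy()
        for t in range(horizon):
            x = dynamics(x, u_nominal[t] + noise[k, t])
            costs[k] += cost(x)
    # Softmax-style weights: cheaper trajectories count for more
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Cost-weighted average of the sampled control perturbations
    return u_nominal + np.einsum("k,ktm->tm", weights, noise)
```

The inner double loop is exactly where the cost lives: every sample re-runs the full dynamics at every timestep, which is the bottleneck the paper attacks.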

The Problem: The "Slow Calculator"

The paper describes a common problem with this approach. Real robots have complex, wobbly, non-linear physics. Think of a robot dog like a gymnast on a trampoline. Predicting exactly where it will land after a flip is incredibly hard math.

The traditional method (called MPPI) is like a student trying to solve a math problem by doing the full, complicated calculation from scratch for every single "what-if" scenario.

  • The Good News: It's very accurate.
  • The Bad News: It's painfully slow. The robot's brain gets overwhelmed, and it can't react fast enough to real-world surprises. It's like trying to drive a race car while doing long division on a calculator.

The Solution: The "Cheat Sheet" (Koopman Dynamics)

The authors, Wenjian Hao and his team, came up with a clever trick. They realized that while the robot's movement looks chaotic and complex, it actually follows hidden, simpler patterns if you look at it from a different angle.

They used a mathematical concept called Koopman Operator Theory.

  • The Analogy: Imagine you are watching a messy pile of tangled headphone cords. From the outside, it looks impossible to untangle. But if you could magically lift the cords into a "higher dimension" (like a 3D hologram), you might see that the tangles are actually just simple loops that can be straightened out with a single, easy pull.
  • The "Deep Koopman" (DKO): The team trained a neural network (a type of AI) to learn this "magic angle." The AI learned how to translate the messy, real-world movements into a linear (straight-line) math problem.

How It Works: MPPI-DK

They combined this "magic angle" with the robot's decision-making process to create MPPI-DK.

  1. Learning Phase: First, they let the robot move around and collect data. The AI learns the "cheat sheet" (the linear map) that turns complex moves into simple math.
  2. Control Phase: When the robot needs to move, instead of doing the hard, slow calculations for every "what-if," it uses the cheat sheet.
    • Old Way: "If I push the leg, the physics say... [2 hours of calculation]... I will be here."
    • New Way (MPPI-DK): "If I push the leg, the cheat sheet says... [instant multiplication]... I will be here."
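The "instant multiplication" above has a second benefit: when the dynamics are linear, all the sampled "what-ifs" can be propagated at once as one batched matrix multiplication per timestep, instead of simulating each rollout separately. A sketch of that batching, with illustrative sizes and placeholder `A`, `B` matrices:

```python
import numpy as np

# Assume lifted linear dynamics z_{t+1} = A z_t + B u_t (toy sizes).
n, m, n_samples, horizon = 4, 1, 1000, 20
rng = np.random.default_rng(0)
A = np.eye(n) * 0.95
B = rng.normal(size=(n, m)) * 0.1

def rollout_all(z0, U):
    """Roll out ALL sampled control sequences simultaneously.
    U has shape (n_samples, horizon, m); because the dynamics are
    linear, each timestep is a single batched matrix multiply
    rather than n_samples separate nonlinear simulations."""
    Z = np.tile(z0, (n_samples, 1))        # (n_samples, n)
    traj = []
    for t in range(horizon):
        Z = Z @ A.T + U[:, t, :] @ B.T     # every sample in one multiply
        traj.append(Z)
    return np.stack(traj, axis=1)          # (n_samples, horizon, n)
```

Batched matrix multiplies are precisely the workload GPUs are built for, which is why the linear model pays off so heavily on GPU hardware.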

The Results: Speed vs. Accuracy

The team tested this on three things:

  1. A Balancing Stick: Like a child trying to balance a broom on their hand.
  2. A Boat: Steering a boat through water currents.
  3. A Real Robot Dog: A Unitree Go1 walking on a lab floor.

The findings were impressive:

  • Speed: The new method was much faster. Because the cheat sheet turns every prediction into simple matrix multiplication, a GPU can run thousands of "what-if" simulations in parallel. It was like switching from a bicycle to a Ferrari: the robot could finally react in real time.
  • Accuracy: Even though they used a "simplified" math model, the robot still moved just as well as the one using the super-slow, perfect math. It was like using a GPS shortcut that saves time but still gets you to the exact same destination.

The Big Picture

Think of this paper as teaching a robot to stop overthinking.

Instead of trying to calculate the physics of the entire universe for every tiny step, the robot learns a simplified "rule of thumb" (the linear model) that is good enough to get the job done, but fast enough to let it run, jump, and dance in real-time.

This is a huge step forward for making robots that can move quickly and safely in our messy, unpredictable world without needing a supercomputer strapped to their backs.