V-MORALS: Visual Morse Graph-Aided Estimation of Regions of Attraction in a Learned Latent Space

This paper introduces V-MORALS, a novel method that estimates Regions of Attraction in a learned latent space using only image-based trajectory data and Morse Graphs, thereby overcoming the limitations of existing approaches that require full state knowledge or known system dynamics.

Faiz Aladin, Ashwin Balasubramanian, Lars Lindemann, Daniel Seita

Published 2026-03-03

Imagine you are trying to teach a robot how to stand up or balance a pole. Usually, to make sure the robot doesn't fall, engineers need a perfect, mathematical map of the robot's body, its joints, and its speed. They need to know exactly where every part is at every millisecond.

But what if you only have a camera? What if the robot can only "see" the world through a video feed, without knowing its own internal math? This is the problem V-MORALS addresses.

Here is the story of how the authors did it, explained with simple analogies.

1. The Problem: The "Black Box" Camera

Traditionally, if you wanted to know whether a robot was safe, you needed a full report card of its internal state (speed, angle, position).

  • The Issue: Cameras are messy. A single picture of a robot doesn't tell you if it's moving fast or slow. It's like looking at a single frame of a movie; you don't know if the car is speeding up or stopping. Plus, a picture has millions of pixels (too much data), while the robot's actual "state" is just a few numbers.
  • The Challenge: How do you predict if a robot will fall (fail) or stand up (succeed) just by watching a video, without knowing the robot's internal math?

2. The Solution: The "Dreaming" Robot

The authors created a system called V-MORALS. Think of it as a robot that learns to dream in a simplified world.

Instead of trying to process millions of pixels, the system does three things:

  • Step 1: The Silhouette Filter (The Mask)
    Imagine you are looking at a busy street scene. To understand a car's movement, you don't care about the trees or the clouds. You only care about the car.
    V-MORALS takes the video and turns it into a black-and-white silhouette. It strips away the background, the lighting, and the textures. It only keeps the shape of the robot. This makes the data much simpler to handle.

  • Step 2: The Time-Lapse Compressor (The Latent Space)
    A single picture is confusing, but a short video clip tells a story.
    The system takes a short sequence of these silhouettes (like a 10-second time-lapse) and squishes them down into a tiny, abstract "thought bubble."

    • Analogy: Imagine summarizing a whole movie clip as a single point on a small 3D map. Clips of a falling robot land in one region of the map; clips of a balancing robot land in another.
    • This "thought bubble" is called the Latent Space. It's a compressed map where the robot's complex movements are reduced to simple coordinates.

  • Step 3: The Crystal Ball (The Dynamics Network)
    Once the robot's motion lives in this "thought bubble" world, the system learns the rules of motion there. It learns: "If the point is here now, it will likely move there next."
    It creates a crystal ball that predicts the next point from the current one, entirely within this simplified world.
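
The three steps above can be sketched in a few lines of NumPy. This is only an illustration of the data flow, not the authors' implementation: in the real method the encoder and the dynamics model are trained neural networks, whereas here they are untrained random matrices. The function names, the 16x16 frame size, the 0.5 threshold, and the window length of 4 are all assumptions made for this example; the latent size of 3 follows the "3D" wording above.

```python
import numpy as np

def silhouette(frame, threshold=0.5):
    """Step 1: turn a grayscale frame (values in [0, 1]) into a binary mask."""
    return (frame > threshold).astype(np.float32)

def stack_window(frames, start, length):
    """Step 2a: flatten `length` consecutive masks into one input vector."""
    window = frames[start:start + length]
    return np.concatenate([f.ravel() for f in window])

def encode(x, W_enc):
    """Step 2b: squash the stacked window into a small latent point."""
    return np.tanh(W_enc @ x)

def latent_step(z, W_dyn):
    """Step 3: predict the next latent point from the current one."""
    return np.tanh(W_dyn @ z)

rng = np.random.default_rng(0)

# Fake "video": six random grayscale frames standing in for camera images.
frames = [silhouette(rng.random((16, 16))) for _ in range(6)]

# A short clip (4 masks) carries motion information a single frame lacks.
x = stack_window(frames, start=0, length=4)

# Placeholder weights; a real system would learn these from trajectories.
W_enc = rng.normal(scale=0.05, size=(3, x.size))
W_dyn = rng.normal(scale=0.5, size=(3, 3))

z = encode(x, W_enc)            # latent point for the clip
z_next = latent_step(z, W_dyn)  # predicted latent point one step later

print(x.shape, z.shape, z_next.shape)
```

Stacking several masks before encoding is what lets the latent point reflect speed and direction, which a single silhouette cannot show.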

3. The Map: The Morse Graph

Now that the robot has a crystal ball and a simplified map, it needs to know where it can go safely.

  • The Analogy: Imagine a topographical map of a mountain range.
    • The Valleys (Attractors): These are the places where the robot naturally settles, whether that is good or bad. One valley is "Standing Up" (Success). Another valley is "Lying on the Floor" (Failure).
    • The Slopes: These show how the robot moves. If you push the robot from a certain spot, gravity pulls it into one of the valleys.
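  • The Morse Graph: This is a compact flowchart the system builds from the map. Its nodes are the places where motion can keep recurring (the valleys, plus balance points like ridge tops), and its arrows show which of them can flow into which. Read together with the map, it tells you: "If you start here, you will end up in the 'Success' valley. If you start there, you will end up in the 'Failure' valley." The set of starting points that lead into a valley is that valley's Region of Attraction.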
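
To make the valley picture concrete, here is a toy sketch in plain Python. It is a simplified stand-in, not the authors' pipeline: the latent space is one-dimensional, the dynamics function `f` is a made-up map with two stable points (near -1 and +1) and an unstable balance point at 0, and the grid size is arbitrary. The sketch cuts the space into cells, records which cells each cell can map into, groups cells that can return to themselves into Morse sets, and then collects the cells that can only end up in a given attractor.

```python
import math

Z_MIN, Z_MAX, N = -1.05, 1.05, 21   # toy 1-D latent space split into 21 cells
W = (Z_MAX - Z_MIN) / N

def f(z):
    """Hypothetical latent dynamics: stable near -1 and +1, unstable at 0."""
    return math.tanh(4.0 * z)

def cell_of(z):
    """Index of the grid cell containing z (clipped to the grid)."""
    return min(N - 1, max(0, int((z - Z_MIN) // W)))

# Outer approximation: f is increasing, so cell [a, b] maps into [f(a), f(b)].
succ = {}
for i in range(N):
    a, b = Z_MIN + i * W, Z_MIN + (i + 1) * W
    succ[i] = set(range(cell_of(f(a)), cell_of(f(b)) + 1))

# reach[i]: every cell that can be visited after one or more steps from cell i.
reach = {i: set(succ[i]) for i in range(N)}
changed = True
while changed:
    changed = False
    for i in range(N):
        new = set().union(*(reach[j] for j in reach[i])) - reach[i]
        if new:
            reach[i] |= new
            changed = True

# Recurrent cells can return to themselves; Morse sets group them
# by mutual reachability.
recurrent = {i for i in range(N) if i in reach[i]}
morse_sets = {frozenset(j for j in recurrent if j in reach[i] and i in reach[j])
              for i in recurrent}

# Morse graph arrows: one Morse set can flow into another.
morse_edges = {(tuple(sorted(m)), tuple(sorted(n)))
               for m in morse_sets for n in morse_sets
               if m != n and any(reach[i] & n for i in m)}

# Attractors are Morse sets that nothing escapes from.
attractors = [m for m in morse_sets if all(reach[i] <= m for i in m)]

# Region of attraction: cells whose reachable recurrent cells all lie
# inside a single attractor.
roa = {tuple(sorted(a)): [i for i in range(N)
                          if ((reach[i] | {i}) & recurrent) <= a]
       for a in attractors}

for attractor, cells in sorted(roa.items()):
    print("attractor", attractor, "region of attraction:", cells)
```

On this toy map, the cell around the balance point at 0 forms its own Morse set with arrows to both attractors, so it belongs to neither region of attraction. That matches the intuition of a ridge top between two valleys.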

4. Why This Matters

Previously, you needed the robot's internal "soul" (its exact state data or a known model of its dynamics) to draw this map. V-MORALS shows you can draw the map just by watching the robot move on a screen.

  • Real-world impact: This means we can test safety for robots in the real world using just cameras, without needing to program every single joint's physics. It's like teaching a child to ride a bike by watching them, rather than measuring their heart rate and muscle tension.

Summary

V-MORALS is like a smart observer that:

  1. Watches a robot via a camera.
  2. Filters the video to see only the robot's shape.
  3. Compresses the video into a simple 3D "thought."
  4. Predicts the future by simulating how that "thought" moves.
  5. Draws a map estimating which starting positions lead to success and which lead to failure.

It turns a chaotic, high-dimensional video feed into a clear, simple map of safety, allowing us to reason about a robot's behavior even when we can't measure its internal state.