When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift

This paper proposes augmenting Proximal Policy Optimization (PPO) with temporal sequence models, particularly Transformers, so that reinforcement learning agents stay robust under sensor drift and partial observability by inferring missing information from history. The claim is backed by theoretical bounds on reward degradation and by empirical results on MuJoCo benchmarks.

Kevin Vogt-Lowell, Theodoros Tsiligkaridis, Rodney Lafuente-Mercado, Surabhi Ghatti, Shanghua Gao, Marinka Zitnik, Daniela Rus

Published 2026-03-06

Imagine you are teaching a robot dog to run a marathon. In a perfect video game, the robot has perfect eyes and ears; it sees every tree, feels every bump in the road, and knows exactly where it is. But in the real world, things go wrong. Maybe the camera lens gets smudged with mud, or the GPS signal drops out in a tunnel, or a sensor just decides to take a nap.

This paper is about teaching robots (specifically, AI agents using a method called PPO) how to keep running the marathon even when their "eyes" and "ears" start failing.

Here is the breakdown of their solution, using some everyday analogies:

1. The Problem: The "Amnesia" Robot

Most standard AI robots are like people with short-term memory loss. They only look at what is happening right now.

  • The Scenario: Imagine your robot is balancing on a tightrope. Suddenly, its left-eye camera goes black (sensor failure).
  • The Old Way: A standard robot (using an MLP, a memoryless feed-forward network) panics. It sees "black" and thinks, "I have no idea where I am!" It freezes or falls because it can't remember that it was leaning left five seconds ago.
  • The Reality: In the real world, sensors don't just fail once and fix themselves instantly. They often fail in clusters (like a whole group of sensors on a car losing power at once) and stay broken for a while. This is called "sensor drift."

2. The Solution: Giving the Robot a "Diary"

The authors decided to give the robot a memory. Instead of just looking at the current frame, the robot looks at a timeline of what happened in the last few seconds.

They tested three different ways to give the robot this memory:

  • The RNN/SSM (The "Recurrent" Memory): This is like a robot that tries to remember the past by whispering a summary of the last second to itself before looking at the next one. It's efficient, but if the whisper gets garbled (because a sensor failed), the whole chain of memory can get messed up.
  • The Transformer (The "Super-Searcher"): This is the star of the show. Imagine a robot that doesn't just whisper to itself. Instead, it has a giant whiteboard where it writes down everything that happened in the last minute. When it needs to make a decision, it doesn't just guess; it scans the whiteboard.
    • The Magic: If the "left eye" sensor is broken, the robot looks at the whiteboard, sees that the "left eye" was working fine 3 seconds ago, and says, "Okay, I know what my left eye saw back then, so I can guess what it's seeing now." It can skip over the broken parts and focus on the good data.
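The "look back at the whiteboard" idea can be illustrated with a toy numpy sketch. This is not the paper's architecture (a learned Transformer would discover this behavior through attention weights); it is a hand-written stand-in showing the core trick: when a sensor reading is invalid now, fill it in with a recency-weighted average of that sensor's past valid readings. The function name and weighting scheme are illustrative assumptions.

```python
import numpy as np

def impute_from_history(history, valid, temp=1.0):
    """Toy stand-in for attention over an observation history.

    For each sensor dimension, replace a failed reading at the
    current (last) step with a recency-weighted softmax average of
    that sensor's past valid readings.

    history: (T, D) array of raw sensor readings
    valid:   (T, D) boolean array, False where the sensor had failed
    """
    T, D = history.shape
    current = history[-1].copy()
    # recency scores: more recent frames get higher weight
    scores = np.arange(T, dtype=float) / temp
    for d in range(D):
        if valid[-1, d]:
            continue  # sensor working: keep the live reading
        ok = valid[:, d]
        if not ok.any():
            current[d] = 0.0  # never observed: fall back to zero
            continue
        # softmax over the recency scores of the valid frames only
        w = np.exp(scores[ok] - scores[ok].max())
        w /= w.sum()
        current[d] = float(w @ history[ok, d])
    return current
```

A trained Transformer does something richer, since its attention weights are learned jointly with the policy and can key on context rather than pure recency, but the masking logic is the same: broken entries are simply excluded from the pool the model attends over.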

3. The Experiment: The "Blindfold" Test

The researchers put these robots in a virtual gym (MuJoCo) with tasks like running, hopping, and walking.

  • The Setup: They simulated a disaster where up to 60% of the sensors were randomly broken or covered in mud.
  • The Results:
    • The Standard Robot (MLP) fell apart immediately. Without perfect vision, it couldn't figure out how to move.
    • The Whispering Robots (RNN/SSM) tried their best but often got confused when the "whisper" was interrupted by sensor failure.
    • The Super-Searcher (Transformer) kept running. Even with half its sensors broken, it used its "whiteboard" (history) to fill in the gaps. It was the only one that stayed upright and kept moving forward.
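The summary doesn't spell out the exact corruption protocol, but the key properties it describes are that failures arrive in clusters of sensors and persist over time rather than flickering on and off. A plausible sketch of such a mask generator (all parameter names and the geometric-recovery model are assumptions, not the paper's specification):

```python
import numpy as np

def sensor_failure_masks(n_steps, n_sensors, p_fail=0.02,
                         p_recover=0.1, cluster_size=3, rng=None):
    """Generate per-step boolean masks (True = sensor working).

    Failures hit contiguous clusters of sensors together and then
    persist, with each broken sensor recovering independently per
    step -- so outage lengths are geometrically distributed.
    """
    rng = np.random.default_rng(rng)
    broken = np.zeros(n_sensors, dtype=bool)
    masks = np.empty((n_steps, n_sensors), dtype=bool)
    for t in range(n_steps):
        # occasionally, a cluster of adjacent sensors fails at once
        if rng.random() < p_fail:
            start = rng.integers(0, n_sensors)
            broken[start:start + cluster_size] = True
        # each broken sensor independently recovers this step
        broken &= ~(rng.random(n_sensors) < p_recover)
        masks[t] = ~broken
    return masks
```

Applying such masks to the observation vector at every step (zeroing or freezing the masked entries) is what turns a standard MuJoCo task into the "blindfold" test: the memoryless MLP has no way to bridge an outage, while a history-conditioned policy can.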

4. The Math: Why It Works

The authors didn't just guess; they did the math. They proved a "safety guarantee."

  • Think of it like a structural safety rating: they derived a worst-case bound on how much reward the robot can lose when sensors fail.
  • They found that the size of that worst-case loss depends on two things:
    1. How smooth the robot's brain is: If the robot makes tiny, gentle adjustments rather than wild jumps, it's safer.
    2. How long the sensors stay broken: If sensors fail for a long time, the robot needs a better memory.
  • The math showed that the Transformer approach is the most robust way to handle these "bad weather" conditions.
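This summary does not reproduce the paper's actual theorem, but bounds of this flavor typically combine the two factors above multiplicatively. As a hedged sketch in generic notation (none of these symbols or constants are taken from the paper): if the policy is $L_\pi$-Lipschitz in its observations, failures perturb observations by at most $\epsilon$, and outages last at most $T$ consecutive steps, the drop in expected discounted return $J$ is bounded by something of the form

$$
\bigl|\, J_{\text{clean}}(\pi) - J_{\text{corrupted}}(\pi) \,\bigr|
\;\le\; C \cdot \frac{L_\pi \, \epsilon \, T}{1 - \gamma},
$$

where $\gamma$ is the discount factor and $C$ absorbs constants of the environment dynamics. Smoother policies (small $L_\pi$) and shorter outages (small $T$) both tighten the bound, matching the two-factor dependence described above.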

The Big Takeaway

In the real world, things break. Sensors get dirty, networks lag, and data gets lost.

  • Old AI: "I can't see, so I stop."
  • New AI (Transformer-based): "I can't see right now, but I remember what I saw a moment ago, and I can guess what's happening. I'll keep going."

This paper proves that giving AI agents a temporal sequence model (a way to reason about time and history) is the secret sauce for making them reliable in the messy, unpredictable real world. It's the difference between a robot that trips over a pebble and a robot that knows how to step over it, even if it can't see the pebble clearly.
