Nonlinear Performance Degradation of Vision-Based Teleoperation under Network Latency

Imagine you are trying to drive a car, but you aren't sitting in the driver's seat. Instead, you are sitting in a control room miles away, watching the car on a video screen and steering it with a joystick. This is called teleoperation. It's like being a remote pilot for a car, and it's becoming a crucial safety net for self-driving vehicles when they get stuck.

However, there's a catch: the internet isn't instant.

This paper is a deep dive into what happens when that video signal is slightly "stale" (delayed). The researchers wanted to know: How much lag can we handle before the car starts spinning out of control?

Here is the story of their findings, explained simply.

The Setup: The "LAVT" Test Lab

To find the answer, the researchers built a special digital playground called LAVT (Latency-Aware Vision Teleoperation testbed).

Think of this like a high-tech video game simulator, but instead of playing for fun, they are stress-testing the connection.

The Car (Server): A virtual car in a simulated city (like a video game world) with a camera on the front.
The Driver (Client): A computer miles away that receives the video and sends steering commands back.
The Problem: They artificially slowed down the internet connection to see how the car reacted to different levels of "lag."

The Experiment: Driving in Slow Motion

They ran 180 different driving tests on three different types of roads:

Straightaways: Easy driving.
Sharp Turns: Harder driving.
Curvy Roads: The most challenging.

They started with zero lag (perfect connection) and then gradually added delays, like adding heavy traffic to a highway. They measured two things:

Did the car finish the route? (Success rate)
How far did it drift off the road? (Tracking error)

The Big Discovery: The "Tipping Point"

The most striking finding is that the car did not get gradually worse as the lag increased. Instead, it hit a cliff.

The "Safe Zone" (0–150 ms): Imagine you are talking to a friend on a video call with a slight delay. You can still have a conversation. The car drove fine, maybe wobbling a tiny bit on sharp turns, but it stayed on the road.
The "Danger Zone" (150–225 ms): This is where things get scary. The researchers found a sharp tipping point. Once the delay passed about 200 milliseconds (less than a blink of an eye), the car's performance collapsed.
- The Analogy: Imagine trying to catch a ball thrown at you, but your brain is running 200 ms behind. You swing your hand after the ball has already passed. You miss. The car does the same thing. It sees the road curve, but by the time it turns the wheel, it's already too late. It over-corrects, swings the other way, over-corrects again, and starts oscillating (wiggling wildly) until it crashes.
- The Result: Around this delay, the success rate dropped from 100% to roughly 50%. The car went from a safe driver to a chaotic one over a very narrow range of delay.

The "Double Whammy"

The researchers also tested what happens if the steering commands are also delayed (not just the video).

The Metaphor: Imagine you are playing a video game where the video is laggy, and your controller is also laggy.
The Finding: It made the situation much worse. With the video delay held the same, adding a delay to the steering commands made the car fail sooner. It's like trying to drive a car where the steering wheel is disconnected from the tires for a split second.

Why Does This Matter?

This paper tells us that vision-based teleoperation is fragile.

If we rely on humans or AI to drive cars remotely using cameras, we have to know exactly how fast the internet needs to be.

The Rule of Thumb: If the delay is under 150 ms, we are probably safe.
The Warning: If the delay hits 225 ms, the system is likely to fail catastrophically.

The Takeaway

The researchers didn't invent a new way to fix the lag (like a time machine). Instead, they did something just as important: they drew the map of the danger zone.

They showed us exactly where the "cliff" is. Now, engineers building self-driving cars and remote driving systems know they must design their networks to stay well below that 200 ms mark, or they need to invent new "predictive" systems that guess where the car will be rather than just reacting to where it is.

In short: Driving a car remotely is like walking a tightrope. A little wind (lag) is fine, but cross a certain line, and you don't just stumble, you fall. This paper tells us exactly where that line is.

1. Problem Statement

Teleoperation is becoming a critical fallback mechanism for autonomous vehicles, allowing remote intervention when onboard autonomy fails. However, the impact of network latency on vision-based, perception-driven control remains insufficiently understood.

The Gap: Existing studies on teleoperation latency often rely on LiDAR-based localization or geometric path-following, where perception is decoupled from control timing. In contrast, vision-based control relies directly on image data; even moderate delays cause temporal misalignment between visual observations and vehicle motion.
The Challenge: In closed-loop lane keeping, delayed visual feedback leads to outdated lane geometry and shifted features, potentially causing oscillatory instability and system failure. There is a lack of systematic, quantitative data characterizing exactly when and how vision-based control collapses under varying network delays.
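A toy model can make this mechanism concrete. The sketch below is not the paper's controller or vehicle model. It assumes a single lateral-error state corrected by proportional feedback on a stale measurement, dy/dt = -k * y(t - tau), with a hypothetical gain k = 5 1/s and a made-up initial error of 1 m. For this textbook delay system the error decays only while k * tau < pi/2, so it reproduces the qualitative cliff, but its threshold is not comparable to the CARLA results.

```python
def simulate_delayed_feedback(delay_s, gain_per_s=5.0, dt_s=0.001,
                              duration_s=10.0, initial_error_m=1.0):
    """Euler simulation of dy/dt = -k * y(t - tau) (toy model, assumed parameters).

    Returns the lateral-error trajectory sampled every dt_s from t = 0.
    """
    delay_steps = round(delay_s / dt_s)
    n_steps = round(duration_s / dt_s)
    # Constant history before t = 0: the error has been sitting at its initial value.
    history = [initial_error_m] * (delay_steps + 1)
    for _ in range(n_steps):
        stale_error = history[-1 - delay_steps]   # what the controller "sees"
        history.append(history[-1] - gain_per_s * dt_s * stale_error)
    return history[delay_steps:]


def peak_late_error(trajectory, dt_s=0.001, window_s=2.0):
    """Largest |error| over the final window_s seconds of a trajectory."""
    window = round(window_s / dt_s)
    return max(abs(y) for y in trajectory[-window:])


if __name__ == "__main__":
    for delay_s in (0.0, 0.15, 0.30, 0.45):
        peak = peak_late_error(simulate_delayed_feedback(delay_s))
        print(f"delay = {delay_s * 1000:.0f} ms -> late peak |error| = {peak:.3g} m")
```

With these assumed values the analytic boundary sits at tau = pi / (2k), about 314 ms: delays below it give a decaying error, delays above it give a growing oscillation.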

2. Methodology

The authors developed a specialized research framework and conducted a systematic simulation study to isolate latency effects.

A. The Latency-Aware Vision Teleoperation (LAVT) Testbed

Architecture: A distributed ROS 2 framework using rmw_zenoh for communication. It separates the system into a Server (Vehicle side: CARLA simulation or DBW vehicle, camera capture) and a Client (Remote side: Video decoding, control logic).
Key Features:
- Explicit Time-Stamping: Embeds 64-bit nanosecond timestamps in video frames and control commands.
- One-Way Latency Measurement: Uses Chrony for clock synchronization to accurately measure one-way video latency ($\tau_v$) and control latency ($\tau_c$).
- Controlled Injection: Uses Linux Traffic Control (tc netem) to inject deterministic delays independently on the video (Server→Client) and control (Client→Server) channels.
- Controller: A deterministic, classical vision-based lane-keeping controller (Pure Pursuit for lateral control, PI for longitudinal) running on the client. It uses no learning-based compensation to isolate the raw effect of latency.
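Two of the features above can be sketched in a few lines of Python. This is a minimal illustration, not the LAVT code: `time.time_ns()` stands in for whatever stamping call the testbed uses, the function names are hypothetical, and the steering law is the standard textbook form of Pure Pursuit, since the summary does not give the authors' lookahead distance or wheelbase.

```python
import math
import time


def stamp_ns() -> int:
    """Wall-clock timestamp in integer nanoseconds (fits in a 64-bit field)."""
    return time.time_ns()


def one_way_latency_ms(sent_ns: int, received_ns: int) -> float:
    """One-way delay in milliseconds.

    Only meaningful when sender and receiver clocks are synchronized
    (the paper uses Chrony for this); otherwise clock offset leaks in.
    """
    return (received_ns - sent_ns) / 1e6


def pure_pursuit_steering(lateral_offset_m: float, lookahead_m: float,
                          wheelbase_m: float) -> float:
    """Textbook Pure Pursuit: delta = atan(2 * L * y / ld^2).

    lateral_offset_m (y): lateral offset of the lookahead point in the vehicle frame.
    lookahead_m (ld): straight-line distance to the lookahead point.
    wheelbase_m (L): vehicle wheelbase. Returns a steering angle in radians.
    """
    return math.atan2(2.0 * wheelbase_m * lateral_offset_m, lookahead_m ** 2)


if __name__ == "__main__":
    sent = stamp_ns()
    received = sent + 150_000_000      # pretend the frame arrived 150 ms later
    print(one_way_latency_ms(sent, received), "ms")
    # Assumed geometry: target 1 m to one side, 5 m away, 2.5 m wheelbase.
    print(pure_pursuit_steering(1.0, 5.0, 2.5), "rad")
```

Note that the steering command depends only on where the lookahead point appears in the image-derived vehicle frame, which is why a stale frame translates directly into a steering command computed for a pose the vehicle has already left.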

B. Experimental Design

Environment: CARLA Simulator (Town04) with three distinct routes (A, B, C) featuring straight segments, sharp 90° turns, and sustained curvature.
Protocol: 180 closed-loop experiments, i.e. 30 runs for each of the six latency conditions (L0–L5).
Latency Conditions:
- L0 (Baseline): 0 ms injected delay (inherent system latency only).
- L1–L3: Increasing video latency ($\tau_v$) only (75 ms, 150 ms, 225 ms).
- L4–L5: High video latency combined with additional control-channel delay ($\tau_c$).
Metrics: Mean Absolute Lateral Error (MAE), 95th-percentile cross-track error, route completion rate, collision rate, and lane-invasion events.
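The first three metrics are simple to compute from logged data. The sketch below assumes each run yields a list of signed lateral errors in metres and a completion flag; the function names are hypothetical, and the nearest-rank percentile is an assumption, since the summary does not say which percentile definition the authors used.

```python
def mean_abs_lateral_error(errors_m):
    """Mean Absolute Lateral Error (MAE) over one run's signed error samples."""
    return sum(abs(e) for e in errors_m) / len(errors_m)


def p95_cross_track_error(errors_m):
    """95th-percentile |cross-track error| using the nearest-rank definition."""
    ordered = sorted(abs(e) for e in errors_m)
    # Integer form of ceil(0.95 * n); avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def completion_rate(completed_flags):
    """Fraction of runs that reached the end of the route."""
    return sum(1 for done in completed_flags if done) / len(completed_flags)


if __name__ == "__main__":
    # Made-up samples purely to exercise the functions.
    errors = [0.1, -0.2, 0.3, -0.4]
    print(mean_abs_lateral_error(errors))
    print(p95_cross_track_error(errors))
    print(completion_rate([True] * 15 + [False] * 15))
```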

3. Key Contributions

Empirical Characterization of Instability: The paper provides the first systematic data on how delayed visual feedback specifically degrades closed-loop lane keeping, identifying a sharp "tipping point" rather than gradual degradation.
LAVT Framework: Introduction of a reproducible, research-oriented ROS 2 testbed designed specifically for latency studies in vision-based teleoperation, capable of deployment on both simulation and full-scale drive-by-wire vehicles.
Quantitative Stability Thresholds: Establishment of specific latency boundaries where system stability collapses, distinguishing between perception delay and control delay effects.

4. Key Results

The study reveals a nonlinear collapse in system stability as latency increases:

Stability Threshold: The system remains robust up to ~150 ms of one-way perception latency.
The Collapse Zone: Between 150 ms and 225 ms, the system undergoes a sharp transition from stable to unstable.
- Route Completion: Drops from 100% (at 0 ms) to 50% (at ~211 ms) and further to 10% (at ~287 ms + control delay).
- Tracking Error: The 95th-percentile cross-track error nearly doubles at 150 ms and increases nearly sixfold at the highest latency conditions.
Mechanism of Failure: As latency increases, the controller reacts to stale lane geometry, causing overshoot and counter-corrections. This leads to oscillatory steering and phase-lag effects, eventually resulting in lane departures and collisions.
Compounding Effect: Adding control-channel delay ($\tau_c$) on top of high visual latency (Conditions L4, L5) significantly accelerates system failure, even if visual latency remains constant.
Survivorship Bias: The study highlights that tracking error metrics calculated only on completed runs can be misleading (showing "improvement" at high latencies because unstable runs terminated early). Therefore, route completion rate is a more critical metric for stability assessment.
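The survivorship effect is easy to reproduce with invented numbers. In the sketch below every value is made up for illustration and none comes from the paper: a low-latency condition where all runs finish, and a high-latency condition where only one unusually smooth run survives. Averaging tracking error over completed runs alone makes the high-latency condition look better.

```python
def mae_over_completed(runs):
    """Mean of per-run MAE over completed runs only; None if no run completed.

    runs: list of (completed, mae_m) tuples.
    """
    survivors = [mae for completed, mae in runs if completed]
    return sum(survivors) / len(survivors) if survivors else None


def completion_rate(runs):
    """Fraction of runs that finished the route."""
    return sum(1 for completed, _ in runs if completed) / len(runs)


# Hypothetical data: (completed, per-run MAE in metres).
LOW_LATENCY = [(True, 0.20)] * 10
HIGH_LATENCY = [(True, 0.15)] + [(False, 0.90)] * 9

if __name__ == "__main__":
    for name, runs in (("low", LOW_LATENCY), ("high", HIGH_LATENCY)):
        print(name, "completed-only MAE:", mae_over_completed(runs),
              "completion rate:", completion_rate(runs))
```

Here the completed-only MAE falls from the low-latency to the high-latency condition while the completion rate collapses, which is why the two metrics have to be read together.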

5. Significance and Implications

Safety Boundaries: The findings establish a critical safety boundary for vision-based teleoperation. Systems relying on camera feedback must maintain one-way perception latency below 150 ms to ensure robust operation; delays exceeding 225 ms render the system highly prone to catastrophic failure.
Design Guidance: This work provides a quantitative baseline for designing latency-compensation strategies (e.g., predictive control, model-based estimation). It suggests that simple buffering is insufficient once the system crosses the stability threshold.
Future Research: The LAVT testbed enables future studies on stochastic network conditions (jitter, packet loss) and the evaluation of predictive algorithms against these empirically derived failure thresholds.
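As one way to act on these boundaries, a remote-driving client could classify its measured one-way perception latency against the two thresholds reported here. The sketch below is a hypothetical monitor, not part of LAVT; the zone names are invented, and the 150 ms and 225 ms limits apply to the specific controller and simulation studied, so they should be treated as assumptions for any other system.

```python
from enum import Enum

ROBUST_LIMIT_MS = 150.0     # below this the studied system stayed robust
COLLAPSE_LIMIT_MS = 225.0   # above this it was highly prone to failure


class LatencyZone(Enum):
    ROBUST = "robust"
    TRANSITION = "transition"
    FAILURE_PRONE = "failure-prone"


def classify_latency(one_way_latency_ms: float) -> LatencyZone:
    """Map a measured one-way perception latency onto the reported zones."""
    if one_way_latency_ms < ROBUST_LIMIT_MS:
        return LatencyZone.ROBUST
    if one_way_latency_ms <= COLLAPSE_LIMIT_MS:
        return LatencyZone.TRANSITION
    return LatencyZone.FAILURE_PRONE


if __name__ == "__main__":
    for latency_ms in (75.0, 150.0, 211.0, 287.0):
        print(latency_ms, "ms ->", classify_latency(latency_ms).value)
```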

In summary, the paper demonstrates that vision-based teleoperation is not merely "degraded" by latency but faces a sharp stability boundary. Beyond this threshold, the closed-loop system becomes fundamentally unstable, necessitating advanced mitigation strategies for safe remote driving.