Quantized Online LQR

This paper introduces the Quantized Certainty Equivalent (QCE-LQR) algorithm for online linear-quadratic regulation with unknown dynamics under communication constraints, which achieves optimal $O(\log T)$ bit transmission by sending learned system estimates rather than raw states, thereby matching fundamental information-theoretic lower bounds while recovering unquantized control performance.

Barron Han, Victoria Kostina, Babak Hassibi

Published 2026-04-15

The Big Picture: A Remote Pilot and a Local Co-Pilot

Imagine you are flying a very complex airplane (like a Boeing 747), but you are doing it from a control tower miles away. You can't see the plane directly; you only get tiny, blurry snapshots of its position sent to you over a very slow, narrow radio channel.

The Problem:
In the conventional setup, the plane streams its position to the tower every single second, and the pilot answers each snapshot with a command ("Turn left 5 degrees"). Because the radio is slow, this constant back-and-forth eats up bandwidth. Because the radio is also fuzzy, the snapshots and commands get distorted, making the plane wobble and fly inefficiently. And if the plane's physics change (e.g., it gets heavier or the wind changes), the pilot has no way to adjust, because the tower never learns the plane's current "personality."

The New Idea:
This paper proposes a smarter way to fly. Instead of sending tiny, blurry snapshots of the plane's position every second, the plane (the "plant") does the heavy lifting locally.

  1. The Plane's Job: The plane has a super-computer on board. It watches itself, figures out exactly how it's flying, and learns its own physics (how it responds to the rudder, the engines, etc.).
  2. The Tower's Job: The tower knows the goal (fly efficiently, save fuel, stay safe) but doesn't know the plane's current physics.
  3. The Handshake: The plane sends a summary of what it learned about its own physics to the tower. The tower uses this summary to calculate the perfect flight plan (the "policy") and sends that plan back to the plane.
  4. The Execution: The plane executes the plan itself. Since the plane knows its exact position and the plan, it flies essentially as well as it would with no radio bottleneck at all.
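The four steps above can be sketched for a toy one-dimensional system. This is a minimal illustration, not the paper's algorithm: the scalar model `x_next = a*x + b*u + noise`, the uniform quantizer, and every name and parameter value (`k_safe`, `step_size`, `run_handshake`, and so on) are assumptions made for the example.

```python
import random

def quantize(value, step):
    """Uniform quantizer: snap `value` to the nearest multiple of `step`."""
    return step * round(value / step)

def fit_dynamics(xs, us, x_nexts):
    """Plane side: least-squares fit of x_next ~ a*x + b*u (2x2 normal equations)."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, x_nexts))
    suy = sum(u * y for u, y in zip(us, x_nexts))
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat

def lqr_gain(a, b, q=1.0, r=1.0, iters=500):
    """Tower side: scalar LQR gain from a fixed-point iteration of the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def run_handshake(a_true=1.2, b_true=1.0, k_safe=0.7, steps=200,
                  step_size=0.01, seed=0):
    """Simulate data collection, then one plane-to-tower-to-plane exchange."""
    rng = random.Random(seed)
    x, xs, us, x_nexts = 0.0, [], [], []
    for _ in range(steps):
        # Hypothetical safe gain plus random excitation so (a, b) are identifiable.
        u = -k_safe * x + rng.gauss(0.0, 1.0)
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 0.1)
        xs.append(x)
        us.append(u)
        x_nexts.append(x_next)
        x = x_next
    a_hat, b_hat = fit_dynamics(xs, us, x_nexts)   # plane learns its physics
    a_msg = quantize(a_hat, step_size)             # short summary sent to the tower
    b_msg = quantize(b_hat, step_size)
    k = lqr_gain(a_msg, b_msg)                     # tower computes the policy
    return a_msg, b_msg, k                         # plane then applies u = -k * x

a_msg, b_msg, k = run_handshake()
print(f"summary sent: a={a_msg:.2f}, b={b_msg:.2f}; gain returned: k={k:.3f}")
```

Only two rounded numbers cross the channel here, yet the returned gain is computed from a model that is close to the true one, which is the point of sending estimates instead of states.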

The Catch: The summary the plane sends must be tiny because the radio is slow. The paper asks: How small can we make this summary while still flying perfectly?


The Core Discovery: "Logarithmic" vs. "Linear"

The authors discovered a fundamental rule about how much data you need to send to learn and control a system.

  • The Old Way (Sending Raw Data): If you try to send the plane's position every second, you need a massive amount of data that grows with time. It's like trying to describe a movie by sending a photo of every single frame. The file size gets huge.
  • The New Way (Sending "Updates"): The paper proves that you only need to send the changes in what you've learned.
    • Imagine you are learning a new language. At first, you make many mistakes and need to send long explanations. But as you get better, you only need to send tiny corrections ("No, it's this word, not that one").
    • The paper shows that the total amount of data needed to control the plane perfectly over a long time only grows logarithmically.
    • Analogy: If you fly for 100 hours, you might need 100 bits of data. If you fly for 10,000 hours, you don't need 10,000 bits; you need only about 200, because squaring the flight time merely doubles its logarithm. The "cost" of learning slows down drastically.
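To make the linear-versus-logarithmic contrast concrete, here is a toy bit count. It assumes, purely for illustration, fixed-size messages and an update schedule that transmits only when the elapsed time doubles (t = 1, 2, 4, 8, ...); the paper's actual schedule and message sizes may differ, and the function names are hypothetical.

```python
def raw_state_bits(T, bits_per_state=16):
    """Total bits if one fixed-size state snapshot is sent at every step 1..T."""
    return bits_per_state * T

def update_bits(T, bits_per_update=16):
    """Total bits if one fixed-size update is sent at t = 1, 2, 4, 8, ... <= T."""
    n_updates = T.bit_length()  # number of powers of two that are <= T (for T >= 1)
    return bits_per_update * n_updates

for T in (100, 10_000):
    print(T, raw_state_bits(T), update_bits(T))
```

Under these assumptions, going from T = 100 to T = 10,000 multiplies the raw-state total by 100 but only doubles the update total (from 7 messages to 14).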

The Secret Sauce: The "Smart Ruler"

The hardest part of this paper is the math behind how to compress the data.

The authors realized that learning a system isn't uniform. Some parts of the system are easy to figure out quickly (like the weight of the plane), while others are tricky and take a long time to learn (like how the wind affects the tail).

  • The Mistake of the "One-Size-Fits-All" Ruler: If you use a standard ruler to measure everything, you have to make the ruler very precise to catch the tricky parts. This wastes space on the easy parts.

  • The "Smart Ruler" (Two-Scale Quantization): The authors invented a special measuring tool that has two speeds:

    1. Fast Speed: For the easy-to-learn parts, it sends tiny, quick updates.
    2. Slow Speed: For the tricky parts, it sends slightly larger, slower updates.

    By mixing these two speeds, they ensure they never send too much data, but they never miss a critical detail. This allows the plane to learn perfectly without clogging the radio.
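The general idea of matching the ruler's resolution to how well each part is known can be sketched as follows. This is not the paper's construction: the split into a "fast" and a "slow" component, the error scales `1/sqrt(t)` and `1/t**0.25`, and the function name are all placeholder assumptions chosen for the example.

```python
def two_scale_quantize(estimate, last_sent, t, c=1.0):
    """Quantize the change in each component on a grid matched to its assumed error scale.

    `estimate` and `last_sent` are dicts with keys "fast" and "slow".
    Returns the integer indices to transmit and the values the receiver decodes.
    """
    steps = {
        "fast": c / t ** 0.5,   # assumed scale for quickly learned components (finer grid)
        "slow": c / t ** 0.25,  # assumed scale for slowly learned components (coarser grid)
    }
    indices, decoded = {}, {}
    for name, step in steps.items():
        # Send only how many grid cells the estimate moved since the last message.
        k = round((estimate[name] - last_sent[name]) / step)
        indices[name] = k
        decoded[name] = last_sent[name] + k * step
    return indices, decoded

indices, decoded = two_scale_quantize(
    {"fast": 1.03, "slow": 2.6}, {"fast": 1.0, "slow": 2.0}, t=16
)
print(indices, decoded)
```

Because each grid is no finer than the uncertainty it has to resolve, the transmitted integers stay small for both components, which is what keeps the message short.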

The "Safety Net"

There is a risk: What if the plane sends a summary that is slightly wrong, and the tower calculates a flight plan that crashes the plane?

The paper includes a "Safety Net" phase:

  1. The Burn-in: At the very start, the plane uses a simple, safe, pre-programmed flight mode (like a training wheels mode) while it gathers data.
  2. The Trigger: Once the plane is 99.9% sure it understands its own physics, it flips a switch. It sends a "Safe" signal to the tower.
  3. The Handoff: The tower then starts sending the complex, optimized flight plans. If the plane ever starts to wobble too much, it instantly reverts to the simple "training wheels" mode.
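The three phases above amount to a small state machine. The sketch below is one hypothetical way to write it down; the mode names, the thresholds `error_threshold` and `state_limit`, and the gains are assumptions for illustration, not values from the paper.

```python
def select_gain(mode, state_norm, est_error_bound, *, k_safe, k_learned,
                error_threshold=0.1, state_limit=10.0):
    """Return (next_mode, gain) for one time step.

    mode            : "burn_in" (safe pre-programmed gain) or "optimized" (learned gain)
    state_norm      : current size of the state, used to detect wobbling
    est_error_bound : the plane's current bound on its model error
    """
    if (mode == "burn_in" and est_error_bound <= error_threshold
            and state_norm <= state_limit):
        mode = "optimized"   # trigger: model is trusted, hand off to the learned policy
    elif mode == "optimized" and state_norm > state_limit:
        mode = "burn_in"     # safety net: revert to the simple safe controller
    gain = k_learned if mode == "optimized" else k_safe
    return mode, gain

print(select_gain("burn_in", 1.0, 0.5, k_safe=0.7, k_learned=0.79))
print(select_gain("burn_in", 1.0, 0.05, k_safe=0.7, k_learned=0.79))
print(select_gain("optimized", 50.0, 0.05, k_safe=0.7, k_learned=0.79))
```

The three calls walk through staying in burn-in, handing off once the model error is small, and falling back when the state grows too large.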

Why This Matters

This research addresses a major bottleneck for the Internet of Things (IoT) and autonomous systems: limited communication.

  • Battery Life: Drones, satellites, and self-driving cars often run on batteries. Sending huge amounts of data drains batteries fast. This method saves energy.
  • Bandwidth: In remote areas (like deep oceans or space), internet is slow. This method allows complex robots to work perfectly even with terrible internet connections.
  • Privacy: Instead of sending raw video or sensor data (which might reveal secrets), the robot only sends a mathematical summary of what it learned.

Summary in One Sentence

The paper proves that a robot can learn to control itself perfectly while talking to a remote brain using only a tiny, shrinking amount of data, by sending "updates on what it learned" rather than "raw snapshots of the world," using a clever two-speed compression trick to stay safe and efficient.
