Scalar Federated Learning for Linear Quadratic Regulator

The paper introduces ScalarFedLQR, a communication-efficient federated learning algorithm for heterogeneous Linear Quadratic Regulator control that reduces per-agent uplink communication to a single scalar while achieving linear convergence and improved gradient accuracy as the number of participating agents increases.

Mohammadreza Rostami, Shahriar Talebi, Solmaz S. Kia

Published 2026-04-08

Imagine you are the captain of a massive fleet of drones. Your goal is to teach all of them the perfect flight pattern to save the most battery and avoid obstacles. This is a classic "Linear Quadratic Regulator" (LQR) problem: a fancy way of saying "find the control rule that best trades off staying on course against the effort spent getting there."

The problem? You don't have an exact model of how the drones respond, so you can't just solve the equations on paper. You have to learn by doing: send the drones out, let them fly (and wobble a bit), measure the results, and then adjust their rules. This is called "model-free learning."
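To make "learning by doing" concrete: a model-free learner can only run a candidate control rule and observe the cost it racks up. The sketch below assumes a hypothetical two-state, one-input toy system (the matrices A, B, Q, R and the helper name rollout_cost are illustrative, not from the paper) with the standard linear state-feedback rule u = -Kx. The learner is allowed to call rollout_cost, which plays the role of a test flight, but never reads A or B directly.

```python
import numpy as np

# Hypothetical toy system (not from the paper): 2 states, 1 input.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # penalty on deviating from the target state
R = np.array([[0.1]])  # penalty on control effort (battery use)


def rollout_cost(K, x0, horizon=200):
    """Fly the rule u = -K x from x0 and add up the quadratic cost.

    A finite horizon stands in for the infinite-horizon LQR cost.
    A model-free learner treats this as a black box: it can call it,
    but it never looks inside at A or B.
    """
    x = x0.copy()
    cost = 0.0
    for _ in range(horizon):
        u = -K @ x                           # control action from the rule
        cost += float(x @ Q @ x + u @ R @ u)  # stage cost
        x = A @ x + B @ u                    # the (unknown) physics step
    return cost


x0 = np.array([1.0, 0.0])
print(rollout_cost(np.array([[1.0, 1.5]]), x0))  # a reasonable rule
print(rollout_cost(np.zeros((1, 2)), x0))        # doing nothing
```

With no control at all, this toy drone never returns to the target, so the second rollout costs far more than the first.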

Here is the catch:

  1. The Fleet is Huge: You have hundreds of drones.
  2. The Data is Heavy: To improve the rule, each drone needs to send the central server a massive "instruction manual": a long list of numbers, one for every entry of its control rule.
  3. The Bandwidth is Tiny: Your radio connection is slow. Sending these huge manuals from hundreds of drones would clog the network instantly.
  4. The Cost is Real: Every time a drone flies a test pattern, it burns battery and risks crashing. You don't want to waste these expensive "test flights."

The Old Way: FedLQR

Previously, researchers tried to solve this by having every drone send its full, heavy instruction manual back to the server.

  • Pros: The server gets a very clear picture.
  • Cons: It chokes the network. If you have 100 drones, the server has to process 100 huge files. It's like trying to download a 4K movie from 100 different friends at the same time on a dial-up connection.
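A back-of-the-envelope count of what the server receives in one FedLQR-style round. This assumes hypothetical sizes (12 states, 4 control inputs, 100 drones), random placeholder vectors in place of real gradient estimates, and that each upload is a full d-dimensional vector, where d is the number of entries in the gain matrix K. The helper fedlqr_round is hypothetical, not FedLQR's actual code.

```python
import numpy as np


def fedlqr_round(local_grads):
    """Server side of a FedLQR-style round (sketch): each agent uploads its
    full update vector (one float per entry of its gain matrix K) and the
    server averages them. Returns the average and the floats received."""
    floats_received = sum(g.size for g in local_grads)
    return np.mean(local_grads, axis=0), floats_received


# Hypothetical sizes: 12 states, 4 control inputs, 100 drones.
n_states, n_inputs, n_agents = 12, 4, 100
d = n_states * n_inputs  # K is n_inputs x n_states, so d = 48 entries

rng = np.random.default_rng(0)
local_grads = [rng.standard_normal(d) for _ in range(n_agents)]  # placeholders
avg_grad, floats = fedlqr_round(local_grads)

print(d, floats)      # 48 4800: every drone sends all 48 numbers
print(n_agents * 1)   # 100: one scalar per drone instead
```

The gap grows with the drone's complexity: doubling the number of states doubles d and therefore doubles the FedLQR-style upload, while a one-scalar upload stays the same size.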

The New Way: ScalarFedLQR

The authors of this paper, Mohammadreza Rostami and colleagues, came up with a clever trick called ScalarFedLQR.

Here is how it works, using a simple analogy:

The Analogy: The "Blindfolded Hiker" and the "Compass"

Imagine each drone is a hiker trying to find the bottom of a valley (the best policy).

  • The Old Way: Every hiker sends a detailed map of the entire terrain back to the base camp. This takes forever to transmit.
  • The New Way: Instead of sending a map, each hiker is given its own random direction (like "North-East").
    1. The hiker takes a tiny step in that direction and feels the slope.
    2. They don't send a map. They send back one single number: "How steep is the ground along this direction?" (This is the scalar projection of the hiker's local slope onto that direction.)
    3. The random direction is tied to a tiny "seed" (like a password), so the base camp can regenerate exactly which direction each hiker used instead of receiving it as a long list of numbers.
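The hiker's steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's exact estimator: we assume a Gaussian direction regenerated from an integer seed with NumPy's default_rng, a two-point finite-difference probe as the "feel the slope" step, and a quadratic bowl standing in for one drone's LQR cost. The names direction_from_seed and agent_scalar are hypothetical.

```python
import numpy as np


def direction_from_seed(seed, d):
    """Regenerate the random probing direction from a seed (assumed scheme).

    Agent and server call this with the same seed and get the same vector,
    so the d numbers of the direction never travel over the radio."""
    return np.random.default_rng(seed).standard_normal(d)


def agent_scalar(local_cost, k, seed, delta=1e-3):
    """One agent's upload: probe the local cost a tiny step either side of
    the current (flattened) gain k along the seeded direction and return a
    single number, the estimated slope along that direction."""
    v = direction_from_seed(seed, k.size)
    return (local_cost(k + delta * v) - local_cost(k - delta * v)) / (2 * delta)


# Hypothetical stand-in for one drone's cost: a quadratic bowl in the gain.
H = np.diag([1.0, 2.0, 3.0, 4.0])


def local_cost(k):
    return 0.5 * k @ H @ k


k = np.array([0.5, -1.0, 0.2, 0.3])
s = agent_scalar(local_cost, k, seed=7)
print(s)  # one float: the only measurement this agent uploads
```

For a quadratic cost the two-point probe is exact, so s equals the true slope along the direction; for a real LQR cost it is an approximation whose accuracy depends on delta.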

Why is this magic?

  1. Tiny Messages: Instead of sending a 100-page map, the drone sends a single postcard with one number on it. This cuts each drone's upload from O(d) numbers, where d is the number of entries in its control rule, to O(1): a single number, no matter how complex the drone is.
  2. The Magic of Numbers: The server receives hundreds or thousands of these one-number messages. Because each direction is random but known (thanks to the seed), the server can reconstruct an estimate of the fleet's average slope: roughly speaking, it scales each direction by the number reported for it and averages the results. It's like listening to a thousand people each whispering how steep the hill is along their own random heading; put together, the whispers give a very accurate sense of which way the hill actually goes.
  3. The "More the Merrier" Effect: Here is the coolest part. Usually, in math, throwing away information makes things worse. But here, the more drones you have, the better it gets.
    • If you have 10 drones, the random noise might be a bit messy.
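    • If you have 1,000 drones, the random noise largely cancels out: with independent random directions, the typical error shrinks roughly like one over the square root of the number of drones. The server gets a much clearer picture of the best direction to go, even though it only received tiny, one-number messages.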
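The averaging effect can be checked numerically. The sketch below uses a standard unbiased reconstruction (scale each seeded direction by its scalar, then average), which may differ in detail from the paper's estimator. It assumes Gaussian directions (so each term equals the agent's gradient on average), a hypothetical seed scheme, and heterogeneous agents whose local gradients scatter around a shared mean. All names are hypothetical.

```python
import numpy as np


def direction_from_seed(seed, d):
    """Gaussian direction regenerated from a seed. With v ~ N(0, I),
    E[v v^T] = I, so E[(g . v) v] = g: each term is unbiased."""
    return np.random.default_rng(seed).standard_normal(d)


def server_estimate(scalars, seeds, d):
    """Rebuild an average-gradient estimate from one scalar per agent:
    scale each agent's seeded direction by its scalar, then average."""
    total = np.zeros(d)
    for s, seed in zip(scalars, seeds):
        total += s * direction_from_seed(seed, d)
    return total / len(scalars)


def relative_error(n_agents, d=48, rng_seed=0):
    """Simulate heterogeneous agents and compare the server's estimate
    with the true average of their local gradients."""
    rng = np.random.default_rng(rng_seed)
    g_mean = rng.standard_normal(d)
    local = g_mean + 0.5 * rng.standard_normal((n_agents, d))  # heterogeneity
    seeds = list(range(1000, 1000 + n_agents))  # hypothetical seed scheme
    scalars = [g @ direction_from_seed(s, d) for g, s in zip(local, seeds)]
    g_hat = server_estimate(scalars, seeds, d)
    g_true = local.mean(axis=0)
    return np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)


for n in (10, 100, 1000):
    print(n, round(relative_error(n), 3))
```

With only 10 agents the estimate is dominated by noise; by 1,000 agents the relative error drops by roughly a factor of ten, in line with the one-over-square-root trend.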

The Results

The paper proves two big things:

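  1. Safety: Even though the drones send tiny, incomplete messages, the analysis shows that, under the paper's assumptions, every control rule produced during training stays stabilizing. The drones remain in the "safe zone" where the controlled system settles down instead of spiraling out of control.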
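  2. Speed: The fleet still converges at a linear rate, meaning the gap to the best control rule shrinks by a constant factor every round, much like the old method, while using a fraction of the radio bandwidth.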
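In LQR policy optimization, the "safe zone" is usually the set of stabilizing gains: those for which the closed loop x_{t+1} = (A - BK) x_t settles instead of diverging. Assuming a known toy model (hypothetical A and B, and a hypothetical helper named is_stabilizing), membership can be tested by checking that the spectral radius of A - BK is below 1. A model-free learner cannot run this check itself, since it doesn't know A and B; the sketch only shows what the safety guarantee is about.

```python
import numpy as np


def is_stabilizing(A, B, K):
    """True if u = -K x keeps x_{t+1} = A x_t + B u_t from blowing up,
    i.e. every eigenvalue of the closed-loop matrix A - B K lies strictly
    inside the unit circle (spectral radius < 1)."""
    closed_loop = A - B @ K
    return bool(np.max(np.abs(np.linalg.eigvals(closed_loop))) < 1.0)


# Hypothetical toy drone model (not from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

print(is_stabilizing(A, B, np.array([[1.0, 1.5]])))   # True: inside the safe zone
print(is_stabilizing(A, B, np.array([[-1.0, 0.0]])))  # False: pushes the wrong way
```

For the first gain the closed-loop eigenvalues have magnitude about 0.93, so disturbances die out; for the second, one eigenvalue is 1.1 and disturbances grow every step.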

The Bottom Line

Think of ScalarFedLQR as a way to coordinate a massive army of robots without clogging the communication lines. Instead of shouting detailed battle plans, each soldier just whispers a single number into a walkie-talkie. When you combine thousands of those whispers, you get an accurate strategy while saving battery, time, and bandwidth.

It largely removes the communication bottleneck, allowing us to control huge fleets of robots (like drone swarms or self-driving cars) efficiently and safely.
