LQRS: Learned Query Re-optimization Framework for Spark SQL

The paper proposes LQRS, a learned query re-optimization framework for Spark SQL that leverages a curriculum reinforcement learning strategy and runtime observations to dynamically refine execution plans, achieving up to a 90% reduction in end-to-end execution time compared to existing methods.

Jiahao He, Yutao Cui, Cuiping Li, Jikang Jiang, Yuheng Hou, Hong Chen

Published 2026-03-05

Imagine you are the captain of a massive cargo ship (a database) trying to deliver goods (data) to a port. Your job is to figure out the fastest route to get there.

The Old Way: The Guessing Captain

Traditionally, the ship's captain (the Query Optimizer) looks at a map and tries to guess how much cargo is in each container and how fast the wind is blowing. Based on these guesses, they draw a single route on a piece of paper before the ship even leaves the dock.

The problem? The captain's guesses are often wrong. Maybe a container is actually empty, or a storm is brewing. But once the ship leaves the dock, the captain is stuck with that paper route. Even if they see a storm coming or realize a container is light, they can't change the course. They just have to power through, wasting fuel and time.

The "Smart" Way (Previous Attempts): The Pre-Departure Guru

Recently, engineers tried to replace the guessing captain with a Super-Computer trained on millions of past voyages. This computer is great at predicting the route before the ship leaves. It's much better than the old captain.

But it still suffers from the same flaw: it can't change the plan once the ship is moving. Its plan is only as good as its predictions, so if reality turns out different, the ship is stuck with a bad route until it crashes or finishes slowly.

The New Solution: LQRS (The "Pilot Who Can Reroute")

The paper introduces LQRS, a new system that acts like a smart pilot who can reroute the ship while it's sailing.

Here is how LQRS works, using simple analogies:

1. The "Checkpoints" (Query Stages)

In the world of big data (Spark SQL), the journey isn't one long straight line. It's broken into Checkpoints. Imagine the ship drops off a batch of cargo, checks the weather, and then decides what to do next.

  • Old systems only look at the map at the very beginning.
  • LQRS waits for the ship to reach a checkpoint. Once the ship arrives, it knows the exact truth: "Wow, that first container was actually empty!" or "That storm is much smaller than we thought!"
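To make the checkpoint idea concrete, here is a minimal sketch of a query stage that carries both the optimizer's guess and the value measured once the stage finishes. The `Stage` class, its fields, and the example numbers are illustrative assumptions, not structures taken from LQRS or Spark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """One query stage. Hypothetical structure for illustration only."""
    name: str
    estimated_rows: int                   # the optimizer's guess before execution
    observed_rows: Optional[int] = None   # filled in once the stage has finished

    def best_known_rows(self) -> int:
        # Prefer the measured value; fall back to the estimate if not finished.
        if self.observed_rows is None:
            return self.estimated_rows
        return self.observed_rows


def estimation_error(stage: Stage) -> float:
    """How far off the guess was, as a ratio that is always >= 1.0."""
    if stage.observed_rows is None:
        raise ValueError("stage has not finished yet")
    est = max(stage.estimated_rows, 1)
    obs = max(stage.observed_rows, 1)
    return max(est / obs, obs / est)


stages = [Stage("scan_orders", 1_000_000), Stage("scan_customers", 50_000)]
# The "empty container": the first stage turns out far smaller than guessed.
stages[0].observed_rows = 1_000
```

Once a stage has finished, anything planned for the rest of the query can read `best_known_rows()` and work from the measurement rather than the guess.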

2. The "Re-Optimization" (Changing the Course)

Because LQRS knows the real facts at the checkpoint, it doesn't just stick to the old paper plan. It says, "Okay, since that container is empty, let's skip the long detour and take the shortcut!"

  • It can swap the order of tasks (like deciding to load the light boxes before the heavy ones).
  • It can change the method of delivery (like switching from a slow truck to a fast drone).
  • Crucially: It does this while the ship is moving, not just before it leaves.
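The two kinds of changes above can be sketched with plain rules. This is only an illustration: LQRS makes these choices with a learned policy, whereas the sketch below swaps in simple hand-written heuristics. The function names are hypothetical, and the 10 MB threshold mirrors the default of Spark's `spark.sql.autoBroadcastJoinThreshold` setting.

```python
from typing import Dict, List

# Mirrors Spark's default broadcast threshold of 10 MB (assumed here as a constant).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def choose_join_strategy(left_bytes: int, right_bytes: int) -> str:
    """Pick a delivery method from the sizes actually observed at the checkpoint."""
    smaller = min(left_bytes, right_bytes)
    if smaller <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast_hash_join"   # one side is small enough to ship everywhere
    return "sort_merge_join"           # both sides are large


def order_joins(observed_bytes: Dict[str, int]) -> List[str]:
    """Greedy reordering: handle the lightest inputs first (ties broken by name)."""
    return sorted(observed_bytes, key=lambda name: (observed_bytes[name], name))


# Planned before departure with a 2 GB estimate, re-planned after observing 4 MB.
before = choose_join_strategy(2 * 1024**3, 5 * 1024**3)
after = choose_join_strategy(4 * 1024**2, 5 * 1024**3)
```

Here `before` and `after` differ only because the observed size replaced the estimate, which is the whole point of deciding at the checkpoint.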

3. The "Teacher and Student" (Reinforcement Learning)

LQRS learns using a method called Reinforcement Learning, which is like training a dog.

  • The Student (The Actor): This is the part of the system that makes the decisions (e.g., "Let's swap these two tables").
  • The Teacher (The Critic): This part watches the student and judges each move. If a move saves time, the student gets a treat (positive reward). If a move causes a delay or a crash, the student gets a "no" (negative reward).
  • Curriculum Learning: At first, the student is only allowed to make simple choices (like "start with this table"). As it gets smarter, the Teacher lets it try more complex moves (like "swap these two huge tables"). This prevents the student from getting overwhelmed.
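The training loop described above can be sketched in a few small pieces. Everything here is an assumption made for illustration: the action names, the relative-time reward, the promotion rule, and the running-average critic are not details confirmed by the source, and a real actor and critic would be neural networks rather than single numbers.

```python
from typing import List, Tuple

# Hypothetical curriculum: each level unlocks a larger set of moves.
CURRICULUM = [
    ["choose_first_table"],
    ["choose_first_table", "switch_join_method"],
    ["choose_first_table", "switch_join_method", "swap_join_order"],
]


def allowed_actions(level: int) -> List[str]:
    """Moves the student may try at this level (capped at the hardest level)."""
    return CURRICULUM[min(level, len(CURRICULUM) - 1)]


def reward(baseline_seconds: float, actual_seconds: float) -> float:
    """Relative time saved: positive if faster than the baseline, negative if slower."""
    return (baseline_seconds - actual_seconds) / baseline_seconds


def critic_update(value: float, r: float, lr: float = 0.1) -> Tuple[float, float]:
    """Nudge the teacher's expectation toward the observed reward.

    Returns the new expectation and the advantage (how much better or worse
    the move was than expected), which is what the student would learn from.
    """
    advantage = r - value
    return value + lr * advantage, advantage


def next_level(level: int, recent_rewards: List[float], promote_at: float = 0.0) -> int:
    """Promote the student once its average recent reward beats the threshold."""
    if not recent_rewards or level >= len(CURRICULUM) - 1:
        return level
    mean = sum(recent_rewards) / len(recent_rewards)
    return level + 1 if mean > promote_at else level
```

A query that finishes in 60 seconds against a 100-second baseline earns a positive reward, and a run of such rewards moves the student from level 0 to level 1, where it may also switch join methods.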

4. The "Plug-and-Play" Tool

One of the coolest things about LQRS is that it doesn't require rebuilding the whole ship. It's a plug-and-play extension. Think of it like adding a high-tech GPS navigation system to an old car. The car (Spark SQL) still runs the engine, but the GPS (LQRS) can tell the driver to turn left or right instantly based on real-time traffic, overriding the old map.
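The source does not say how LQRS is attached to Spark. One common way to add a plug-in to Spark SQL is through the `spark.sql.extensions` setting, so the sketch below assumes that route. The class name `org.example.lqrs.LqrsExtensions` is hypothetical, and turning on adaptive execution is likewise an assumption; only the two configuration keys and the `--conf key=value` flag of `spark-submit` are standard Spark.

```python
from typing import Dict, List

# Assumed configuration for loading a re-optimizer as a Spark SQL extension.
LQRS_CONF: Dict[str, str] = {
    # Standard Spark setting: lets plans be revised between query stages.
    "spark.sql.adaptive.enabled": "true",
    # Standard Spark setting; the class name on the right is hypothetical.
    "spark.sql.extensions": "org.example.lqrs.LqrsExtensions",
}


def to_submit_flags(conf: Dict[str, str]) -> List[str]:
    """Render settings as `--conf key=value` arguments for spark-submit."""
    flags: List[str] = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


flags = to_submit_flags(LQRS_CONF)
```

The appeal of this style is that the car's engine stays untouched: the same application can run with or without the extra settings.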

Why is this a Big Deal?

The researchers tested LQRS on four different "oceans" (datasets) with thousands of complex queries.

  • The Result: LQRS cut end-to-end execution time by up to 90% compared with existing methods.
  • The Analogy: In the best case, if the old smart system took 100 minutes to deliver the cargo, LQRS did it in 10 minutes.
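The arithmetic behind that analogy is worth spelling out, because a 90% cut in time is not the same thing as being 90% quicker in speed. A small sketch, with hypothetical helper names:

```python
def time_after_reduction(old_minutes: float, reduction: float) -> float:
    """Time remaining after cutting `reduction` (a fraction in [0, 1)) of the old time."""
    return old_minutes * (1.0 - reduction)


def speedup_factor(reduction: float) -> float:
    """How many times faster the trip is after the same cut."""
    return 1.0 / (1.0 - reduction)


new_minutes = time_after_reduction(100.0, 0.9)   # roughly 10 minutes
factor = speedup_factor(0.9)                     # roughly a 10x speedup
```

Keep in mind that "up to" describes the best case reported, so a typical query would see a smaller cut than this.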

Summary

LQRS is a system that stops guessing and starts observing. Instead of committing to a rigid plan based on guesses, it waits for real data to arrive at each checkpoint and uses it to change the plan on the fly. It combines the brainpower of AI with the flexibility of real-time adjustments, making database queries faster, smarter, and more efficient.