Learning to Cover: Online Learning and Optimization with Irreversible Decisions

This paper proposes an asymptotically optimal online learning and optimization algorithm for minimizing the number of irreversible facility openings needed to meet a coverage target, demonstrating that a policy balancing limited initial exploration with rapid subsequent exploitation achieves sub-linear regret that converges exponentially fast to its infinite-horizon limit.

Alexandre Jacquillat, Michael Lingzhi Li

Published 2026-03-06

Imagine you are the mayor of a city planning to build a network of vaccination clinics (or maybe coffee shops, or emergency shelters) to serve your entire population. You have a big goal: you need to successfully open 1,000 clinics that actually work and serve people.

But here's the catch: You don't know which locations will work.

Some spots are perfect. Others are disasters because of traffic, bad weather, or lack of demand. Opening a clinic costs a fortune, and once you build it, you can't easily take it down (it's an irreversible decision).

You have a limited amount of time (say, 5 rounds of construction). If you wait too long to gather data, you run out of time. If you rush and build everywhere at once, you might waste money on 500 failed clinics.

This paper is about finding the perfect middle ground between "guessing blindly" and "waiting forever." It's called "Learning to Cover."

The Core Problem: The "Blindfolded Archer"

Imagine you are an archer trying to hit a target 1,000 times.

  • The No-Learning Approach: You just shoot 10,000 arrows blindly. You'll hit 1,000 eventually, but you wasted 9,000 arrows.
  • The Perfect-Learning Approach: You spend 10 years studying the wind, the bow, and the target before shooting a single arrow. You hit 1,000 targets with exactly 1,000 arrows. But you missed your deadline.
  • The "Learning to Cover" Approach: You shoot a few arrows, see where they land, adjust your aim, and shoot again. You want to hit 1,000 targets using the fewest arrows possible, but you only have 5 rounds to do it.

How the Solution Works: The "Pilot Program"

The authors propose a strategy that sounds like a smart business plan: Start small, learn fast, then scale up.

  1. The Pilot Phase (Exploration): In the first round, you open a small number of facilities in diverse locations. You aren't trying to hit the target yet; you are trying to learn. You treat these like a "test drive."
    • Analogy: Think of a chef tasting a soup before serving a banquet. They don't cook 1,000 bowls yet; they cook one small bowl to see if it needs salt.
  2. The Update: After the first round, you see which locations worked and which failed. You feed this data into a computer model (a "classifier") that gets smarter. It starts predicting: "Ah, locations near parks work well; locations near highways fail."
  3. The Scale-Up (Exploitation): In the second and third rounds, you use that smart model to pick the best spots. You stop guessing and start opening facilities where you are almost certain they will succeed.
  4. The Result: By the final round, you are opening facilities so efficiently that you barely waste any money.
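The pilot-then-scale-up loop above can be sketched in a few lines of Python. This is a minimal toy illustration, not the paper's algorithm: the location "types", their hidden success probabilities, and the frequency-count "classifier" are all invented for the example.

```python
import random

random.seed(0)

# Hypothetical setup (not from the paper): two location types with
# success probabilities the planner does not know in advance.
TRUE_SUCCESS = {"near_park": 0.9, "near_highway": 0.3}
TARGET = 100          # successful clinics needed
PILOT_PER_TYPE = 10   # small, diverse pilot round

def open_clinic(loc_type):
    """One irreversible opening; succeeds with the type's hidden probability."""
    return random.random() < TRUE_SUCCESS[loc_type]

# Round 1 (exploration): a small pilot spread across all location types.
successes, opened = 0, 0
estimated_rate = {}
for t in TRUE_SUCCESS:
    wins = sum(open_clinic(t) for _ in range(PILOT_PER_TYPE))
    estimated_rate[t] = wins / PILOT_PER_TYPE
    successes += wins
    opened += PILOT_PER_TYPE

# Rounds 2+ (exploitation): commit to the best estimated type and scale up.
best = max(estimated_rate, key=estimated_rate.get)
while successes < TARGET:
    successes += open_clinic(best)
    opened += 1

print(f"opened {opened} clinics for {successes} successes")
```

Even with a pilot of only 20 clinics, the planner identifies the good location type and ends up opening roughly 120 clinics for 100 successes, far fewer than the ~165 a blind strategy mixing both types would need on average.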

The Big Discovery: "Sub-Linear" Magic

The paper's most exciting finding is about efficiency.

If you just guessed randomly, your wasted money (called "regret") would grow linearly. If you needed 1,000 clinics, you might waste money on 500 failures. If you needed 10,000, you'd waste on 5,000. It's a straight line of waste.

But with this "Learning to Cover" method, the waste grows sub-linearly.

  • The Metaphor: Imagine you are filling a bucket with a leaky hose.
    • Linear: The bigger the bucket, the more water you waste.
    • Sub-Linear: As the bucket gets bigger, your hose gets smarter. You waste a little bit more, but not that much more.
  • The Math: If you need 1,000 clinics, you might waste on about 30 failures. If you need 1,000,000 clinics, you might only waste on about 1,000 failures. The percentage of waste drops as the project gets bigger.
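A quick way to see what sub-linear growth means numerically: suppose, for illustration, that waste grows like the square root of the target (a common sub-linear rate; the paper's exact rate may differ).

```python
import math

# Illustrative only: waste modeled as sqrt(target), one example of
# sub-linear growth. The fraction wasted shrinks as the target grows.
for target in (1_000, 1_000_000):
    waste = round(math.sqrt(target))
    print(f"target={target:>9,}: ~{waste:,} wasted openings "
          f"({100 * waste / target:.1f}% of target)")
```

Under this model the 1,000-clinic project wastes about 3% of its openings, while the 1,000,000-clinic project wastes only about 0.1%: bigger projects get proportionally cheaper to learn.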

Why This Matters in Real Life

The authors show this works for:

  • Clinical Trials: Testing new drugs at different hospitals. You don't want to open 100 hospitals if 80 of them will fail to recruit patients. You test 10, learn, then open the rest.
  • Vaccination Campaigns: During a pandemic, you need clinics everywhere fast. You can't wait for perfect data, but you can't afford to open clinics in empty fields. You use this method to find the "sweet spot."
  • Venture Capital: Investing in startups. You don't bet the whole farm on one idea. You make small bets (pilot programs), see which ones work, and pour money into the winners.

The Takeaway

The paper proves that you don't need to be perfect to be efficient.

Even if you only have a few rounds to learn (a short planning horizon), taking a moment to run a small "pilot program" allows you to learn just enough to make the rest of your decisions incredibly smart. It turns a chaotic, expensive gamble into a streamlined, data-driven success story.

In short: Don't guess blindly, and don't wait forever. Test a little, learn fast, and then go big.