Optimizing Earth Observation Satellite Schedules under Unknown Operational Constraints: An Active Constraint Acquisition Approach

The Big Picture: The "Black Box" Satellite Problem

Imagine you are the mission control manager for a high-tech Earth Observation satellite. Your job is to take photos of specific cities, forests, or ships. Each photo has a "priority score" (a forest fire is high priority; a sunny beach is low priority). You want to take as many high-priority photos as possible.

The Catch: The satellite has strict physical rules.

It can't photograph two distant targets in quick succession (it needs time to rotate and stabilize between them).
It can't take too many photos in a row without recharging its battery.

The Problem: In the real world, engineers don't always have a perfect list of these rules written down in a math textbook. Instead, the rules are buried inside complex computer simulations or "engineering manuals." If you ask the simulation, "Can I take these photos?" it just says "YES" or "NO." It won't tell you why it said "No." It's like a Black Box that only gives binary answers.

The paper asks: How do we find the best schedule when we don't know the rules, and the only way to learn them is by asking "Yes/No" questions?

The Old Way vs. The New Way

The Old Way: "Guess and Check" (The Greedy Approach)

Imagine you are trying to pack a suitcase but you don't know the airline's weight limit.

Strategy: You just throw your most expensive items in first. If the bag is too heavy, you start taking things out randomly until it fits.
Result: You might miss a great combination of items because you didn't understand the rules. You either end up with a suitcase that is too light (wasted allowance) or have to unpack it many times.

The "Two-Phase" Way: "Learn Everything, Then Act" (FAO)

Imagine you decide to interview 100 airline employees to learn the exact weight limit before you even pack a single item.

Strategy: Ask 100 questions to figure out the rules. Once you think you know the rules, pack your suitcase.
Result: You might spend all your time asking questions and run out of time to actually pack. Or, you might ask the wrong questions and still get the rules slightly wrong.

The New Way: "Learn While You Do" (L&O with CCA)

This is what the paper proposes. Imagine you are packing your suitcase, but you have a smart assistant who learns the rules while you are packing.

You make a guess: "I'll put the camera and the laptop in."
The Black Box says: "NO." (It doesn't say why).
The Smart Assistant (CCA) steps in: Instead of panicking, the assistant asks a few quick, targeted questions to figure out which rule was broken.
- Assistant: "Is it because the camera and laptop are too heavy together?" (Box: No).
- Assistant: "Is it because the camera needs 5 minutes to cool down before the laptop?" (Box: Yes).
Update: The assistant writes down a new rule: "Camera needs 5 minutes before Laptop."
You try again: You immediately rearrange your suitcase based on this new rule and try again.

The Magic: You don't wait to learn all the rules before you start packing. You learn the specific rules that stop you from packing, fix your plan, and keep going. You stop as soon as you find a good enough suitcase, rather than waiting to learn every single rule in the universe.

Key Concepts Explained with Metaphors

1. The "Black Box" Oracle

Think of the satellite's engineering simulator as a strict bouncer at a club.

You show him your list of photos (your schedule).
He checks it against his hidden rulebook.
If it's bad, he just says "No entry." He doesn't tell you if it's because of the dress code, the ID, or the noise level.
The Paper's Innovation: The authors built a detective (CCA) that stands next to you. When the bouncer says "No," the detective asks clever follow-up questions to figure out exactly which rule was broken, so you can fix it for next time.

2. Conservative Constraint Acquisition (CCA)

This is the detective's strategy. It's called "Conservative" because it plays it safe.

The Scenario: You tried to take Photo A and Photo B, and the bouncer said "No."
The Detective's Logic: "Okay, maybe the rule is 'A and B need 3 minutes apart.' But wait, maybe the real rule is 'A and B need 4 minutes apart' because of a battery issue?"
The Strategy: The detective assumes the rule is stricter than it might actually be. It says, "Let's assume they need 4 minutes apart to be safe."
Why? It's better to be slightly too strict (and maybe miss one photo) than to be too loose and get rejected again. This "over-learning" actually helps the computer find a solution faster because it stops wasting time on impossible schedules.

3. The "Interleaved" Dance

The paper's method is like a dance between a Chef and a Food Critic.

Old Method: The Chef cooks a whole meal, the Critic tastes it and says "It's bad." The Chef throws it away, reads a cookbook for an hour, and tries again.
New Method (L&O): The Chef cooks a little bit. The Critic tastes it and says "Too salty." The Chef immediately adds a little sugar and tries again while the Critic is still tasting.
Result: The Chef finds a delicious dish much faster because they aren't waiting to read the whole cookbook before taking a bite.

What Did They Find? (The Results)

The researchers tested this on computer simulations with up to 50 different photo targets.

Speed: On the largest instances (50 targets), the new method (L&O) was about 5 times faster than the old "Learn-Everything-First" method.
Quality: It found better schedules (more high-priority photos) than the greedy "guess and check" baseline.
Efficiency: It didn't need to ask the "Bouncer" 100 questions. It usually found its best answer after asking only 5 to 21 questions.
The "Good Enough" Surprise: They discovered that the system doesn't need to learn all the rules perfectly. Even when it had pinned down only 4 to 10% of the hidden rules exactly, it could still find the best schedule or one close to it. It just needed to learn the few rules that were blocking the best options.

The Bottom Line

This paper solves a problem where we don't know the rules of the game, but we have a referee who can only say "Yes" or "No."

Instead of trying to write down the entire rulebook before playing, the authors created a system that learns the rules on the fly while playing the game. It's like learning to drive a car by listening to the engine make a "clunk" sound when you shift gears too fast, rather than reading a manual on how the transmission works before you ever turn the key.

In short: Don't wait to know everything to start. Start, get rejected, learn the specific reason, fix it, and keep going. You'll get to the finish line faster and with a better result.

1. Problem Definition

The paper addresses the Earth Observation (EO) Satellite Scheduling Problem (EOSP) under a novel and challenging condition: Unknown Operational Constraints.

Standard Context: EO scheduling involves selecting which ground targets to image and when, maximizing total priority while respecting visibility windows and operational limits (e.g., rotation time, power, thermal limits).
The Challenge: In real-world scenarios, the mathematical models for these constraints are often incomplete or non-existent. Constraints are frequently embedded in high-fidelity simulators, engineering margin documents, or firmware logic rather than explicit formulas.
The Oracle Model: The authors assume the constraint model is hidden behind a binary feasibility oracle. The system can propose a full schedule, and the oracle returns Yes (feasible) or No (infeasible) without specifying which constraint was violated or the nature of the violation.
Goal: The objective is to find a high-priority feasible schedule by interacting with this oracle, learning the hidden constraints on the fly, rather than requiring a pre-defined model.

Simplified Model Used:
To study this interaction, the authors restrict the problem to two dominant constraint families:

Pairwise Separation: A minimum time gap ($\delta$) required between two tasks $i$ and $j$ due to satellite rotation and stabilization.
Global Capacity: A limit on the number of tasks ($k$) that can be scheduled within a sliding time window ($w$), modeling power or bandwidth budgets.
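To make the oracle model concrete, here is a minimal sketch of a binary feasibility oracle over the two constraint families above. It is an illustration, not the authors' simulator: the class name `HiddenOracle`, the discrete time slots, and the schedule encoding (task id mapped to start slot) are all assumptions made for this example.

```python
from typing import Dict, Tuple

Schedule = Dict[int, int]  # task id -> start slot; unscheduled tasks are absent


class HiddenOracle:
    """Hypothetical yes/no oracle: it checks a schedule but never says why it failed."""

    def __init__(self, separations: Dict[Tuple[int, int], int], window: int, capacity: int):
        self._separations = separations  # (i, j) -> minimum gap delta
        self._window = window            # sliding window width w
        self._capacity = capacity        # at most k tasks per window
        self.queries = 0                 # number of questions asked so far

    def ask(self, schedule: Schedule) -> bool:
        self.queries += 1
        return self._separation_ok(schedule) and self._capacity_ok(schedule)

    def _separation_ok(self, schedule: Schedule) -> bool:
        for (i, j), delta in self._separations.items():
            if i in schedule and j in schedule:
                if abs(schedule[i] - schedule[j]) < delta:
                    return False
        return True

    def _capacity_ok(self, schedule: Schedule) -> bool:
        starts = sorted(schedule.values())
        # A fullest window [t, t + w) can always be shifted to begin at a task start,
        # so it is enough to test the windows anchored at each start.
        for idx, t in enumerate(starts):
            in_window = sum(1 for s in starts[idx:] if s < t + self._window)
            if in_window > self._capacity:
                return False
        return True


# Toy values chosen for illustration only.
oracle = HiddenOracle({(0, 1): 3}, window=4, capacity=2)
far_apart = oracle.ask({0: 0, 1: 3})            # gap of 3 meets delta = 3
too_close = oracle.ask({0: 0, 1: 2})            # gap of 2 violates delta = 3
too_dense = oracle.ask({0: 0, 1: 3, 2: 1})      # three tasks inside one window of width 4
```

The caller only ever sees the boolean: `too_close` and `too_dense` are rejected for different reasons, but the two answers are indistinguishable from outside.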

2. Methodology: Learn & Optimize (L&O) with CCA

The proposed solution integrates constraint acquisition directly into the optimization loop, rather than treating them as separate phases.

A. The Framework: Learn & Optimize

The algorithm (Algorithm 1) operates in an iterative loop:

Optimize: Solve the scheduling problem using a constraint solver (CP-SAT) based on the currently learned constraints ($L$).
Query: Submit the resulting schedule to the oracle.
Learn:
- If Yes: The schedule is feasible. The algorithm updates the best solution found and potentially prunes the candidate basis. It can terminate early if the current solution is accepted.
- If No: The schedule is infeasible. The algorithm invokes Conservative Constraint Acquisition (CCA) to refine the model $L$ by adding new constraints derived from the rejection.
Repeat: Continue until a feasible solution is found or the query budget is exhausted.
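The loop above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's Algorithm 1: a brute-force search stands in for CP-SAT, only separation constraints are learned, and `acquire` is a crude placeholder for CCA that tests each scheduled pair on its own and tightens the failing ones by one slot. All function names and the toy instance are hypothetical.

```python
from itertools import product

def optimize(priorities, horizon, learned):
    """Brute-force stand-in for the CP-SAT solve: best schedule satisfying `learned`."""
    tasks = sorted(priorities)
    best, best_value = {}, 0
    for slots in product([None, *range(horizon)], repeat=len(tasks)):
        sched = {t: s for t, s in zip(tasks, slots) if s is not None}
        if any(i in sched and j in sched and abs(sched[i] - sched[j]) < d
               for (i, j), d in learned.items()):
            continue  # violates a learned separation
        value = sum(priorities[t] for t in sched)
        if value > best_value:
            best, best_value = sched, value
    return best

def acquire(rejected, oracle, learned):
    """Placeholder for CCA: partial queries on each pair, tightening the rejected ones."""
    tasks = sorted(rejected)
    for a, i in enumerate(tasks):
        for j in tasks[a + 1:]:
            if not oracle({i: rejected[i], j: rejected[j]}):
                gap = abs(rejected[i] - rejected[j])
                learned[(i, j)] = max(learned.get((i, j), 0), gap + 1)

def learn_and_optimize(priorities, horizon, oracle, budget):
    learned = {}                      # the learned model L: (i, j) -> minimum gap
    for _ in range(budget):           # budget counts main (full-schedule) queries only
        candidate = optimize(priorities, horizon, learned)
        if oracle(candidate):
            return candidate, learned  # accepted: stop early
        acquire(candidate, oracle, learned)
    return None, learned               # budget exhausted without an accepted schedule

# Toy hidden model (hypothetical values), visible only to the oracle.
HIDDEN = {(0, 1): 2, (0, 2): 1, (1, 2): 3}

def toy_oracle(schedule):
    return all(abs(schedule[i] - schedule[j]) >= d
               for (i, j), d in HIDDEN.items() if i in schedule and j in schedule)

PRIORITIES = {0: 5, 1: 4, 2: 3}
schedule, learned = learn_and_optimize(PRIORITIES, horizon=4, oracle=toy_oracle, budget=10)
```

Because the optimizer always returns the best schedule under the current learned model, each rejection removes an over-optimistic candidate, and the first accepted candidate ends the loop.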

B. Core Innovation: Conservative Constraint Acquisition (CCA)

CCA is a domain-specific procedure designed for the separation/capacity structure of EO scheduling. Unlike generic acquisition algorithms (e.g., QuAcq), it does not use general "FindScope" subroutines but exploits the specific ordering of the constraints.

Mechanism: When a schedule is rejected, CCA attempts to identify the cause:
1. Pair Querying: For violated separation pairs, it performs a binary search over candidate separation values ($\delta$) using partial queries (scheduling only the two conflicting tasks). It identifies the strongest justified gap $\delta^*$ and adds $sep(i, j, \delta^*)$ to the learned model.
2. Capacity Fallback: If no separation is justified, it learns the weakest violated capacity constraint (smallest window width $w$ and largest capacity $k$).
Conservatism: CCA is "conservative" because it may learn over-tightened constraints. For example, if a schedule is rejected due to a power limit, but the binary search for separation fails to find a cause, CCA might infer a stricter separation constraint than actually exists. While this reduces the search space, it does not necessarily prevent finding the optimal feasible solution.
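The pair-querying step can be illustrated with a short binary search. This is a sketch under assumptions rather than the paper's procedure: `pair_query(i, j, gap)` is a hypothetical helper that asks the oracle about a schedule containing only tasks i and j placed `gap` slots apart, acceptance is assumed to be monotone in the gap, and the candidate list is supplied by the caller. The capacity fallback is omitted.

```python
from typing import Callable, Optional, Sequence

def strongest_justified_gap(
    i: int,
    j: int,
    observed_gap: int,
    candidates: Sequence[int],
    pair_query: Callable[[int, int, int], bool],
) -> Optional[int]:
    """Smallest candidate gap the oracle accepts for the pair, or None if nothing is learned."""
    # The pair alone is fine at the observed gap, so separation does not explain the rejection.
    if pair_query(i, j, observed_gap):
        return None
    larger = [d for d in sorted(candidates) if d > observed_gap]
    lo, hi = 0, len(larger)  # invariant: indices below lo are rejected, hi is accepted or past the end
    while lo < hi:
        mid = (lo + hi) // 2
        if pair_query(i, j, larger[mid]):
            hi = mid          # accepted: the answer is here or to the left
        else:
            lo = mid + 1      # rejected: the answer is strictly to the right
    # No candidate gap is accepted: nothing to learn from this pair.
    return larger[lo] if lo < len(larger) else None

# Toy hidden separation (hypothetical): tasks 0 and 1 need 3 slots apart.
TRUE_GAP = {(0, 1): 3}

def toy_pair_query(i: int, j: int, gap: int) -> bool:
    return gap >= TRUE_GAP.get((i, j), 0)

exact = strongest_justified_gap(0, 1, 1, range(1, 9), toy_pair_query)       # fine-grained candidates
coarse = strongest_justified_gap(0, 1, 1, [2, 4, 6, 8], toy_pair_query)     # coarse candidates
unexplained = strongest_justified_gap(0, 1, 5, range(1, 9), toy_pair_query) # pair is fine at gap 5
```

With the fine-grained candidates the search recovers the true gap of 3. With the coarse list it settles on 4, which shows one way (assumed here, not taken from the paper) that an acquired constraint can end up stricter than the real one.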

3. Key Contributions

Problem Formulation (EOSP-UC): The first formalization of EO scheduling where constraints are hidden behind a binary oracle, moving beyond the assumption of fully specified models.
CCA Procedure: A novel, domain-specific acquisition algorithm tailored to separation and capacity constraints, avoiding the overhead of generic acquisition methods.
Interleaved Optimization: Embedding CCA within the Learn & Optimize framework allows the system to improve solutions continuously. It terminates as soon as a high-quality feasible solution is found, rather than waiting for a full constraint model to be reconstructed.
Empirical Validation: Extensive evaluation on synthetic instances (up to 50 tasks) demonstrating that partial constraint knowledge is sufficient for high-quality scheduling.

4. Experimental Results

The authors compared their L&O approach against:

Priority Greedy (PG): A baseline with no constraint knowledge.
FAO (Full Acquire-then-Optimize): A two-phase approach that runs 100 queries to learn constraints, then solves.
CP-SAT Reference: Solving with the true hidden model (used as the ground truth for gap calculation).

Key Findings:

Performance vs. Greedy: L&O drastically outperforms the greedy baseline. For $n \le 30$, the average optimality gap drops from 65–68% (Greedy) to 17.7–35.8% (L&O).
Efficiency vs. FAO:
- Query Efficiency: L&O finds its best solution using significantly fewer main oracle queries (5–21 queries) compared to FAO's fixed 100 queries.
- Solution Quality: At $n=50$, L&O achieves a better average gap (17.9%) than FAO (20.3%).
- Speed: L&O is approximately 5× faster in wall-clock time at $n=50$ (130 s vs. 695 s) because it stops early once a feasible solution is accepted.
Partial Knowledge Sufficiency: The algorithm finds optimal or near-optimal solutions even when it has exactly identified only 4–10% of the hidden constraints. This suggests that learning the specific constraints that block high-value infeasible schedules is more important than reconstructing the entire model.
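The headline comparisons follow from simple arithmetic on the figures quoted above. The snippet below only recomputes them. The `optimality_gap` formula is the usual relative-shortfall definition and is an assumption here, since this summary does not spell out how the gap is calculated.

```python
def optimality_gap(reference_value: float, achieved_value: float) -> float:
    """Relative shortfall against the reference solution (assumed definition)."""
    return (reference_value - achieved_value) / reference_value

# Wall-clock times quoted for n = 50, in seconds.
lo_seconds, fao_seconds = 130, 695
speedup = fao_seconds / lo_seconds                  # about 5.3, i.e. "approximately 5x"

# Average gaps quoted for n = 50, in percent.
lo_gap, fao_gap = 17.9, 20.3
gap_advantage_points = round(fao_gap - lo_gap, 1)   # 2.4 percentage points in favor of L&O

# FAO's fixed 100 queries against L&O's reported range of 5 to 21 main queries.
query_ratio_range = (100 / 21, 100 / 5)             # roughly 4.8x to 20x fewer queries

# Hypothetical values, only to show how the formula behaves.
example_gap = optimality_gap(200, 150)              # 0.25, i.e. a 25% gap
```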

5. Significance and Limitations

Significance:

Practical Applicability: The approach addresses a critical gap in satellite operations where engineering constraints are often "black boxes." It enables automated scheduling without requiring engineers to manually extract complex mathematical models from simulators.
Anytime Property: The method provides a high-quality feasible solution quickly, making it suitable for time-critical operational environments.
Paradigm Shift: It demonstrates that "perfect" constraint recovery is not necessary for effective optimization; learning enough to prune the infeasible high-value regions is sufficient.

Limitations:

Over-Tightening: CCA may learn constraints stricter than reality (e.g., inferring a 4-slot separation when the true limit is 3), potentially excluding feasible solutions.
Model Scope: The current study is limited to separation and capacity constraints. Extending to other constraint types (e.g., thermal, downlink) would require adapting CCA or using generic algorithms.
Oracle Assumptions: The method assumes a perfect, stationary oracle. It does not currently handle noisy feedback or constraints that drift over time.

Conclusion

This paper presents a framework for optimizing Earth Observation satellite schedules when operational constraints are unknown. By interleaving active constraint acquisition with optimization, the L&O approach achieves better solution quality than the greedy baseline and, on the largest synthetic instances, better quality and roughly 5× faster execution than the "learn-then-solve" pipeline. The results indicate that partial constraint learning is a viable and efficient strategy for scheduling problems whose rules are only available through a yes/no oracle.