Imagine you are a chef trying to perfect a recipe. In the world of standard computer planning, the chef has a fixed list of ingredients: "Add 1 cup of flour," "Add 2 eggs," "Add 1 teaspoon of salt." The computer just checks if these specific amounts work.
But in the real world, cooking is more fluid. Sometimes you don't just add "1 cup" of flour; you add "a little bit more" or "a lot less" depending on how the dough feels. You are making a continuous decision. In computer science, these fluid, adjustable numbers are called Control Parameters.
The problem is that computers hate infinite choices. If you tell a computer, "Add any amount of flour between 0 and 10 cups," it gets overwhelmed. There are infinite possibilities (1 cup, 1.0001 cups, 1.0000001 cups...), and the computer can't check them all one by one.
This paper introduces a new way for computers to handle these infinite choices without getting stuck. Here is the breakdown using simple analogies:
1. The Old Way: The "Constraint" Trap
Previous methods treated these fluid numbers like rules in a math test. Instead of asking the computer to choose the amount of flour, they asked it to solve an equation to see if a specific amount fits the rules.
- The Analogy: Imagine you are trying to find a key in a giant, dark room. The old method says, "Don't look around; just calculate exactly where the key must be based on the shadows." It's smart, but if the math gets too hard, the computer gives up.
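Concretely, the constraint view looks something like this (a hypothetical one-equation example; real planners hand much harder, nonlinear systems to a numeric solver):

```python
# The "old way" in miniature (the equation is invented for illustration):
# the planner does not *choose* how much flour to add -- it solves a
# constraint to see whether any legal amount exists.

def solve_flour(eggs):
    # Suppose a recipe rule says: 2 * flour + eggs == 6.
    f = (6 - eggs) / 2                   # solve the equation exactly...
    return f if 0 <= f <= 10 else None   # ...then check the allowed range
```

Here `solve_flour(2)` returns `2.0`, while `solve_flour(20)` returns `None` because no legal amount fits. When there are many such equations and they stop being simple, this solving step is exactly where the computer "gives up."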
2. The New Way: The "Sampling" Explorer
The authors propose a new algorithm called S-BFS (Sampling Best-First Search). Instead of trying to solve the whole math problem at once, they let the computer explore the room.
- The Analogy: Imagine you are in a massive, infinite library looking for a specific book. You can't read every book (there are too many).
- The Strategy: You pick a shelf, but instead of reading every book on it, you sample just one: pull it out, check whether it looks promising, and keep it if it does. If it doesn't, you put it back and try another.
- The Twist (Delayed Partial Expansion): In the past, if a computer picked a shelf, it had to look at every book on that shelf before moving on. That's impossible here. So, this new method says: "Look at just one book from this shelf. If it looks good, come back later and look at another one from the same shelf." You don't finish the shelf all at once; you chip away at it over time.
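The library strategy above can be sketched in a few lines. This is a toy on a one-dimensional numeric state, not the paper's S-BFS implementation; the function names, the [-1, 1] sampling range, and the pop budget are invented for illustration.

```python
import heapq
import random

# Toy sketch of sampling + delayed partial expansion: each time a node is
# popped, we draw ONE sample of the continuous parameter, create one child,
# and push the parent back so it can yield further samples later -- we
# never try to enumerate the infinite branch all at once.

def sampled_best_first(start, heuristic, step, is_goal, max_pops=10_000):
    counter = 0  # tie-breaker so the heap never compares raw states
    frontier = [(heuristic(start), counter, start)]
    while frontier and max_pops > 0:
        max_pops -= 1
        h, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        # Sample one value of the continuous control parameter
        # ("a little more" or "a little less").
        theta = random.uniform(-1.0, 1.0)
        child = step(state, theta)
        counter += 1
        heapq.heappush(frontier, (heuristic(child), counter, child))
        # Delayed partial expansion: the parent goes back on the frontier,
        # so we can return later and draw another book from the same shelf.
        counter += 1
        heapq.heappush(frontier, (h, counter, state))
    return None  # budget exhausted
```

For example, with `start=0.0`, `step = lambda s, t: s + t`, heuristic `abs(3.0 - s)`, and goal `abs(s - 3.0) < 0.05`, the search walks toward 3.0 by repeatedly re-sampling from its most promising states.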
3. The "Re-Expansion" Trick
Here is the clever part: What if the computer picks a "bad" book (a bad plan) but later realizes it might have been useful?
- The Analogy: Imagine you are hiking up a mountain. You take a step, but then you realize, "Wait, maybe I should have taken a slightly different step."
- In this new algorithm, the computer is allowed to go back to a previous spot, try a different number (a different step size), and see if that leads to a better view. It doesn't throw the old path away; it just adds a new branch to the map.
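That branch-keeping idea can be shown in miniature (a hypothetical class, not the paper's data structures):

```python
# Sketch of re-expansion: a node keeps every branch it has tried, so
# re-expanding with a new parameter value adds a sibling branch instead of
# throwing the old path away.

class Node:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []   # every branch tried so far stays on the map
        self.expansions = 0  # how many times this node has been expanded

    def expand(self, theta):
        """Try one more step size; earlier branches are kept, not replaced."""
        child = Node(self.value + theta, parent=self)
        self.children.append(child)
        self.expansions += 1
        return child

trail = Node(0.0)
first = trail.expand(0.5)   # the original step
second = trail.expand(0.3)  # a re-expansion: a slightly different step
assert len(trail.children) == 2 and trail.expansions == 2
```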
4. The "Penalty" System (Rectification)
Since the computer can keep going back and trying new numbers forever, how do we stop it from spinning in circles?
- The Analogy: Imagine a game where every time you visit the same spot and try a new path, you have to pay a small "tax."
- At first, the tax is low, so you are free to explore. But if you keep visiting the same spot over and over, the tax gets higher and higher. Eventually, the tax becomes so high that the computer decides, "Okay, I've explored this enough; let's try a completely new area." This ensures the computer eventually finds the solution without getting stuck in an infinite loop.
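A toy version of the rising tax: the paper calls the general idea rectification, but the linear schedule and the `TAX_RATE` constant below are assumptions made purely for illustration.

```python
TAX_RATE = 0.1  # hypothetical: how fast the penalty grows per repeat visit

def priority(h, visits):
    # Lower is better. Every extra visit to the same node adds a growing
    # penalty, so a heavily re-expanded node eventually loses out to a
    # fresh, unexplored one.
    return h + TAX_RATE * visits

# A promising node (h = 1.0) beats a fresh node (h = 1.5) at first...
assert priority(1.0, 0) < priority(1.5, 0)
# ...but after six re-expansions its accumulated tax flips the order.
assert priority(1.0, 6) > priority(1.5, 0)
```

The crossover is the whole point: no node can monopolize the search forever, which is what guarantees the algorithm eventually moves on instead of spinning in circles.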
5. The Results: Does it Work?
The authors tested this "Sampling Explorer" against the old "Math Solver" methods.
- The Outcome: The new method was much better at solving complex problems where the numbers could be anything (infinite possibilities).
- The Trade-off: The "Math Solver" (like the NextFLAP planner) sometimes found slightly shorter, higher-quality plans on small problems. But the "Sampling Explorer" (S-BFS) solved far more problems that the others couldn't touch at all. It's like the difference between a surgeon who can only operate on small, simple cuts (perfect but limited) and a general practitioner who can handle almost any injury, even if the treatment isn't always the absolute shortest path.
Summary
This paper teaches computers how to make fluid, continuous decisions (like "how much gas to press") rather than just rigid, on/off choices.
- Old Way: Try to calculate the perfect answer instantly (hard/impossible for infinite choices).
- New Way: Take a guess, check if it's okay, and if not, try a slightly different guess later. Keep exploring until you find the way out.
It's a shift from solving a math problem to playing an intelligent game of exploration, allowing AI to handle real-world scenarios where things aren't just "black and white" but exist on a smooth, infinite spectrum.