The Big Problem: Editing Photos is Like a Complex Construction Project

Imagine you have a photo and a very complicated instruction: "Find the bench, paint it pink, get rid of the cat, and paint the wall yellow."

Doing this isn't just one click. It's a chain of events. You need to find the bench, cut it out, change its color, find the cat, erase it, find the wall, and change its color. In the world of AI, each of these steps requires a different specialized tool (like a digital scalpel, a paintbrush, or a detector).

The problem is that AI tools are expensive to run (they take time and computing power). If you try every possible combination of tools to see which one works best, it's like trying to build a house by randomly picking up bricks and hammers until you accidentally build a wall. It takes forever and costs a fortune.

The Old Way: The "Slow and Steady" Search

Previous methods (like the one called CoSTA∗) acted like a very careful, slow detective.

They broke the big task into small steps (find bench, paint bench, etc.).
For every single step, they ran a complex search algorithm (called A∗ search) to find the absolute best tool combination.
This search was accurate but slow and expensive. It was like hiring a team of architects to draw 50 different blueprints for a single brick wall just to make sure they picked the right one.

The New Way: FaSTA∗ (The "Fast-Slow" Agent)

The authors created FaSTA∗ (Fast-Slow Toolpath Agent). Think of this as a Master Contractor who has learned from years of experience.

FaSTA∗ uses a "Fast-Slow" strategy, similar to how your brain works:

Fast Thinking (Intuition): You see a familiar situation (like a spilled cup) and immediately know to grab a towel. You don't think about it.
Slow Thinking (Reasoning): You see a weird, new situation (like a leak in a strange pipe) and you have to stop, think, and figure out the solution step-by-step.

How FaSTA∗ Works:

1. The "Fast" Plan: Using a Library of Shortcuts
Instead of searching for a solution every time, FaSTA∗ keeps a Library of Shortcuts (called Subroutines).

The Analogy: Imagine you've painted a hundred walls before. You know that for "small white walls," you always use "Brush A." You don't need to research paint brushes every time; you just grab Brush A.
In the Paper: The AI looks at past successful tasks. It uses a Large Language Model (LLM) to find patterns. It turns these patterns into Symbolic Rules.
- Example Rule: "If the object is small and the background is simple, use Tool X → Tool Y → Tool Z."
When a new task comes in, FaSTA∗ first checks its library. If it finds a matching shortcut, it uses it immediately. This is the "Fast Plan." It skips the expensive search entirely.
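
The shortcut lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class, the `find_shortcut` function, and the feature names are invented here, and the tool names are the placeholders from the example rule. It is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One hypothetical shortcut: a subtask, the conditions it needs, and a tool sequence."""
    subtask: str
    conditions: dict[str, str]   # context feature -> required value
    toolpath: list[str]          # tools to run, in order


def find_shortcut(rules: list[Rule], subtask: str,
                  context: dict[str, str]) -> Optional[list[str]]:
    """Return the first rule whose subtask and conditions all match, else None."""
    for rule in rules:
        if rule.subtask != subtask:
            continue
        if all(context.get(key) == value for key, value in rule.conditions.items()):
            return rule.toolpath
    return None  # no shortcut: the caller would fall back to the slow plan


# Illustrative library holding the single example rule from the text.
LIBRARY = [
    Rule("object_removal",
         {"object_size": "small", "background": "simple"},
         ["Tool X", "Tool Y", "Tool Z"]),
]

hit = find_shortcut(LIBRARY, "object_removal",
                    {"object_size": "small", "background": "simple"})
miss = find_shortcut(LIBRARY, "object_removal",
                     {"object_size": "large", "background": "simple"})
```

Here `hit` is the stored tool sequence, while `miss` is `None` because the object size does not satisfy the rule's conditions.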

2. The "Slow" Plan: The Safety Net
What if the task is totally new, or the shortcut fails?

The Analogy: If you try to use your "Brush A" shortcut on a giant, textured mural and it fails, you don't give up. You stop, put down the brush, and call in the experts to figure out a custom solution.
In the Paper: If the "Fast Plan" (the shortcut) doesn't work or isn't available, FaSTA∗ switches to the "Slow Plan." It runs the expensive, careful A∗ search just for that specific difficult step to find the perfect tool path.

3. The "Learning" Loop: Getting Smarter Over Time
This is the magic part. FaSTA∗ doesn't just use the shortcuts; it learns new ones.

The Analogy: Every time the Master Contractor finishes a job, they write down what worked and what didn't. If they realize that "Brush A" fails on "red walls," they update their rulebook: "Brush A is for white walls; get Brush B for red walls."
In the Paper: After running many tasks, the AI analyzes its own "traces" (logs of what happened). It uses inductive reasoning to create new rules and add them to its library.
- Result: The more tasks it does, the more shortcuts it has, and the less it needs to use the expensive "Slow Plan."

The Results: Faster and Cheaper, Without Losing Quality

The paper tested FaSTA∗ against the old "Slow" method (CoSTA∗) and other AI editors.

Speed/Cost: FaSTA∗ was 49.3% cheaper and faster on average. It saved nearly half the cost because it used its "Fast" shortcuts for most tasks.
Quality: The final images were nearly as good as those from the slow method. The average human evaluation score dropped by only about 3.2% (0.91 versus 0.94), and the results remained visually competitive.
Reliability: When the shortcuts failed, the "Slow Plan" kicked in to save the day, ensuring the task still got done correctly.

Summary

FaSTA∗ is an AI that learns to be efficient. Instead of reinventing the wheel for every photo edit, it builds a library of "tried and true" recipes (subroutines). In the final evaluation, these recipes handled 91% of the editing steps quickly. Only when it encounters a truly unique or tricky problem does it slow down to do the heavy lifting. This makes complex image editing much more practical and affordable.

Technical Summary: FaSTA∗: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-Turn Image Editing

Problem Statement

Multi-turn image editing requires applying a sequence of heterogeneous operations (e.g., detection, segmentation, inpainting, recoloring) to a single image based on complex natural language instructions. While existing Large Language Model (LLM) agents excel at high-level planning, they often misestimate the cost and quality of specific AI tools, leading to suboptimal toolpaths or hallucinations (e.g., selecting an expensive diffusion model when a simple filter suffices). Conversely, classical $A^*$ search can find optimal, verifiable toolpaths but is computationally expensive and cannot learn from past experience to accelerate future planning. Previous work, such as CoSTA∗, combined LLM planning with $A^*$ search, but the search remained a computational bottleneck because it repeatedly explored the same subtasks at high cost without reusing learned knowledge.

Methodology: FaSTA∗

FaSTA∗ is a neurosymbolic agent designed to address these inefficiencies by integrating online subroutine mining with an adaptive fast-slow planning framework. The system operates on the premise that humans learn reusable actions from past experiences; FaSTA∗ mimics this by continuously extracting and refining symbolic subroutines from successful toolpaths.

1. Online Inductive Reasoning and Subroutine Mining

Unlike test-time-only approaches, FaSTA∗ employs an online learning loop to build a Subroutine Rule Table.

Data Logging: The agent records detailed execution traces ($\tau$) for each subtask, including the tool path taken, context features (e.g., object size from YOLO, mask properties from SAM, background complexity inferred by LLM), execution costs, and quality outcomes.
Periodic Refinement: Every $K$ tasks (e.g., $K=20$), the system triggers a refinement cycle. An LLM performs inductive reasoning on the recent batch of traces to identify recurring, successful tool sequences (subroutines) and the specific contextual conditions under which they perform well.
Symbolic Rule Generation: The LLM synthesizes compact, symbolic rules mapping subtasks and context features to cost-effective subroutines. For example: If object_size is "Not Too Small" and background is "Simple Texture," then use YOLO $\to$ SAM $\to$ SD Inpaint for Object Removal.
Verification: Proposed rules undergo rigorous verification on specialized test datasets. A "Net Benefit" score balances cost reduction against quality degradation. Only rules that improve the cost-quality trade-off are integrated into the rule table, with a retry mechanism for refinement if initial proposals fail.
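
The refinement trigger and the verification gate can be sketched as follows. The summary above does not give the "Net Benefit" formula, so the version here (relative cost saving minus a weighted relative quality drop) is an assumption chosen for illustration, as are the names `Trace`, `should_refine`, `net_benefit`, and the `quality_weight` parameter.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trace:
    """Simplified execution trace for one subtask (fields are illustrative)."""
    subtask: str
    toolpath: tuple[str, ...]
    cost: float      # execution cost, e.g. seconds
    quality: float   # quality outcome in [0, 1]


def should_refine(tasks_done: int, k: int = 20) -> bool:
    """Trigger a refinement cycle after every k completed tasks."""
    return tasks_done > 0 and tasks_done % k == 0


def net_benefit(baseline: list[Trace], candidate: list[Trace],
                quality_weight: float = 2.0) -> float:
    """ASSUMED score: relative cost saving minus weighted relative quality drop."""
    base_cost = mean(t.cost for t in baseline)
    base_quality = mean(t.quality for t in baseline)
    cost_saving = (base_cost - mean(t.cost for t in candidate)) / base_cost
    quality_drop = (base_quality - mean(t.quality for t in candidate)) / base_quality
    return cost_saving - quality_weight * quality_drop


# Made-up verification data: baseline search paths versus two proposed rules.
baseline = [Trace("object_removal", ("search",), 10.0, 0.90),
            Trace("object_removal", ("search",), 10.0, 0.90)]
good_rule = [Trace("object_removal", ("YOLO", "SAM", "SD Inpaint"), 6.0, 0.88),
             Trace("object_removal", ("YOLO", "SAM", "SD Inpaint"), 6.0, 0.88)]
bad_rule = [Trace("object_removal", ("YOLO", "SD Inpaint"), 9.0, 0.70),
            Trace("object_removal", ("YOLO", "SD Inpaint"), 9.0, 0.70)]

# Only rules with a positive score would be added to the rule table.
accepted = [name for name, traces in [("good_rule", good_rule), ("bad_rule", bad_rule)]
            if net_benefit(baseline, traces) > 0]
```

With these invented numbers, the first rule saves a large share of the cost for a small quality loss and is accepted, while the second saves little and degrades quality noticeably, so it is rejected.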

2. Adaptive Fast-Slow Planning

FaSTA∗ executes tasks using a two-tiered planning strategy:

Fast Planning (Default): For each subtask in a high-level plan generated by an LLM, the agent first attempts to select a pre-learned subroutine from the Subroutine Rule Table based on the current image context. This selection is immediate and avoids search.
Slow Planning (Fallback): If no applicable subroutine exists, or if the selected subroutine's output fails a Visual Language Model (VLM) quality check, the agent triggers a "slow planning" phase. This involves a localized $A^*$ search on the low-level tool subgraph for that specific subtask to find an optimal path.
Execution Flow: The system prioritizes the fast plan. The expensive $A^*$ search is only invoked lazily when the fast plan fails or is unavailable, significantly reducing average execution time.
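
A minimal sketch of this control flow is shown below, assuming a toy tool subgraph. The graph, its costs, the tool names "Detector B" and "Inpainter B", and the `quality_check` callable (a stand-in for the VLM check, which in practice would inspect the output image) are all hypothetical. The heuristic is zero, which is admissible but reduces $A^*$ to uniform-cost search; a real system would presumably use an informed heuristic.

```python
import heapq
from typing import Callable, Optional

# Hypothetical tool subgraph: state -> [(tool, next_state, cost)]
TOOL_GRAPH = {
    "start": [("YOLO", "detected", 1.0), ("Detector B", "detected", 3.0)],
    "detected": [("SAM", "masked", 2.0)],
    "masked": [("SD Inpaint", "done", 8.0), ("Inpainter B", "done", 4.0)],
    "done": [],
}


def a_star(graph, start: str, goal: str,
           heuristic: Callable[[str], float]) -> Optional[tuple[list[str], float]]:
    """Return (cheapest tool sequence, total cost) from start to goal, or None."""
    frontier = [(heuristic(start), 0.0, start, [])]  # (f, g, state, path)
    best = {start: 0.0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, cost
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry
        for tool, nxt, step in graph[state]:
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + heuristic(nxt), new_cost, nxt, path + [tool]))
    return None


def run_subtask(subtask: str, context: str, rule_table: dict,
                quality_check: Callable[[list[str]], bool]) -> tuple[str, list[str]]:
    """Try the fast plan first; invoke the search only if it is missing or fails."""
    fast_path = rule_table.get((subtask, context))
    if fast_path is not None and quality_check(fast_path):
        return "fast", fast_path
    result = a_star(TOOL_GRAPH, "start", "done", lambda state: 0.0)
    if result is None:
        raise RuntimeError("no toolpath found for subtask")
    return "slow", result[0]


rules = {("object_removal", "simple_background"): ["YOLO", "SAM", "SD Inpaint"]}
fast = run_subtask("object_removal", "simple_background", rules, lambda p: True)
slow = run_subtask("object_removal", "simple_background", rules, lambda p: False)
```

In this toy setup `fast` reuses the stored subroutine without any search, while `slow` simulates a failed quality check and falls back to the search, which returns the cheapest path through the hypothetical graph.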

Key Contributions

Symbolic Subroutine Memory: The paper introduces a method for LLMs to learn reusable subroutines as symbolic rules from execution traces. This transforms the agent from a stateless planner into one with a learnable memory, allowing it to generalize successful patterns across diverse tasks.
Adaptive Fast-Slow Planning: FaSTA∗ is a neurosymbolic framework that prioritizes efficient, rule-based "fast planning" while retaining the robustness of "slow planning" ($A^*$ search) as a fallback. This architecture mimics human cognitive efficiency, handling common cases rapidly while reserving deep search for novel or challenging scenarios.
Cost-Quality Optimization: The system reaches a better cost-quality Pareto frontier than state-of-the-art baselines, significantly reducing computational cost while maintaining competitive success rates.

Experimental Results

Experiments were conducted on the CoSTA∗ benchmark (121 image-prompt pairs, 550 total manipulations) and the Complex-Edit benchmark.

Efficiency: FaSTA∗ reduces the average execution cost by 49.3% compared to CoSTA∗ (from ~58s to ~29s per image). On the Complex-Edit benchmark, it achieves a ~30% cost reduction.
Quality: The method maintains high output quality, with only a 3.2% degradation in accuracy compared to CoSTA∗ (0.91 vs. 0.94 average human evaluation score).
Subroutine Reuse: As the agent explores more tasks, the reuse rate of learned subroutines increases exponentially. In the final evaluation, 91% of subtasks were handled entirely by the "fast plan" (subroutines), with only 9% requiring the fallback to $A^*$ search.
Ablation Studies:
- Removing the subroutine verification step increased the fallback rate to 28%, demonstrating the necessity of validating rules before deployment.
- Removing the slow planning fallback (Fast Plan Only) caused a significant drop in quality (0.84), confirming the critical role of $A^*$ search for robustness.
- Using a smaller LLM for subroutine selection increased fallback rates and total cost, highlighting the need for a capable reasoning model at this step.
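
As a quick consistency check on the headline numbers, the short calculation below recomputes them from the reported values. It suggests that the 3.2% quality figure is a relative drop (the absolute difference is 0.03), and that a 49.3% reduction from roughly 58s does land near 29s. The variable names are illustrative only.

```python
# Values as reported in the results above.
costa_cost_s = 58.0        # approximate CoSTA* cost per image, in seconds
cost_reduction = 0.493     # reported average cost reduction
costa_quality = 0.94       # average human evaluation score, CoSTA*
fasta_quality = 0.91       # average human evaluation score, FaSTA*

# Cost implied by the reported reduction.
fasta_cost_s = costa_cost_s * (1 - cost_reduction)

# Quality change, expressed both ways.
absolute_drop = costa_quality - fasta_quality
relative_drop = absolute_drop / costa_quality

print(f"implied cost: {fasta_cost_s:.1f}s")
print(f"relative quality drop: {relative_drop:.1%}")
```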

Significance and Claims

The authors claim that FaSTA∗ addresses a key bottleneck in prior multi-turn image editing methods: the computational expense of repeated, exhaustive search without knowledge reuse. By combining the planning efficiency of LLMs with the accuracy of $A^*$ search and the adaptability of inductive reasoning, FaSTA∗ offers a scalable, cost-sensitive solution.

The paper emphasizes that this approach moves beyond simple caching or memorization by learning generalized, symbolic rules that apply to varying contexts. While the system requires a "warm-up" phase to accumulate sufficient experience for effective subroutine mining, it eventually converges toward an optimal cost-quality frontier. The authors position FaSTA∗ as a promising direction for developing agile, continually improving AI agents capable of handling complex, long-horizon tasks efficiently.
