Reasoning Knowledge-Gap in Drone Planning via LLM-based Active Elicitation

This paper introduces MINT, a novel framework that enhances human-AI drone collaboration by using large language models to actively elicit minimal, targeted information from operators to resolve environmental uncertainties, thereby significantly improving task success rates while reducing the need for frequent human intervention.

Zeyu Fang, Beomyeol Yu, Cheng Liu, Zeyuan Yang, Rongqian Chen, Yuxin Lin, Mahdi Imani, Tian Lan

Published Tue, 10 Ma

Imagine you are flying a drone to find a lost hiker in a dense, smoky forest. You have a super-smart AI brain inside the drone, but it's not perfect. Sometimes, the smoke is just harmless fog, and sometimes it's a deadly fire barrier. The drone can't tell the difference just by looking.

The Old Way: "Stop and Ask"
Traditionally, when the drone gets confused, it hits the brakes, hovers in mid-air, and screams, "Human! I don't know what to do! Take the controls!"
This is like a student who doesn't know the answer to a math problem immediately raising their hand and asking the teacher to solve the whole equation for them. It's slow, annoying for the teacher, and the drone stops moving.

The New Way: "The Smart Detective"
This paper introduces a new way for drones and humans to work together. Instead of handing over the steering wheel, the drone acts like a detective who knows exactly what questions to ask to solve the mystery quickly.

Here is how their new system, called MINT (Minimal Information Neuro-Symbolic Tree), works, using a simple analogy:

1. The "What If" Tree (MINT)

Imagine the drone's brain is drawing a giant family tree, but instead of people, the branches are "What If" scenarios.

  • The Root: The drone sees smoke.
  • The Branches:
    • Branch A: "What if this smoke is safe?" -> The drone plans a shortcut through the smoke.
    • Branch B: "What if this smoke is dangerous?" -> The drone plans a long, safe detour around it.

The drone calculates: "If I go the wrong way, will I crash or waste 10 minutes?"

  • If the smoke is far away from the path, the branches look the same. The drone thinks, "No big deal," and keeps flying.
  • If the smoke is blocking the only path, the branches look very different. The drone thinks, "This is critical! I need to know the answer!"
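The branching idea above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's actual MINT implementation: the cost numbers and the `should_ask` threshold are made-up assumptions, standing in for whatever cost model the real planner uses.

```python
# Toy "what if" tree: one uncertain fact (is the smoke safe?), two branch plans.
# Costs are invented mission times in minutes, not values from the paper.

def plan_cost(smoke_is_safe: bool) -> float:
    """Best achievable plan cost under each assumed answer."""
    return 4.0 if smoke_is_safe else 12.0  # shortcut vs. long detour

def should_ask(threshold: float = 2.0) -> bool:
    """Ask the human only if the two branches diverge enough to matter."""
    divergence = abs(plan_cost(True) - plan_cost(False))
    return divergence > threshold

print(should_ask())  # branches differ by 8 minutes, so this prints True
```

If the smoke were far from the path, both branches would cost roughly the same, the divergence would fall below the threshold, and the drone would keep flying without asking.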

2. The "Yes or No" Question (Active Elicitation)

Once the drone knows a question is important, it doesn't ask a vague question like, "What's going on out there?" (That's like asking a human, "Tell me everything about the forest," which is too much work).

Instead, it uses a Large Language Model (LLM), essentially a super-charged chatbot, to turn the complex math into a simple, binary question:

  • "Is the smoke ahead safe to fly through?" (Yes/No)

This is like a detective saying, "Did the butler do it? Yes or No?" It's the most efficient way to get the answer.
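In the real system this step is handled by an LLM; as a stand-in, here is a hypothetical template that collapses an uncertain variable into a yes/no prompt. The function and argument names are invented for illustration and are not from the paper.

```python
def to_binary_question(obstacle: str, action: str) -> str:
    """Collapse an environmental uncertainty into one yes/no question."""
    return f"Is the {obstacle} ahead {action}? (Yes/No)"

print(to_binary_question("smoke", "safe to fly through"))
# -> Is the smoke ahead safe to fly through? (Yes/No)
```

The point of the template is the shape of the output: one concrete obstacle, one concrete action, and a forced binary answer, rather than an open-ended "tell me about the forest."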

3. The Human's Role

The human operator (maybe a firefighter or a rescue worker) just listens and says "Yes" or "No" via voice. They don't need to be a pilot. They just need to know the context of the environment.

  • If the answer is "Yes": The drone instantly cuts off the "danger" branch of its tree, locks in the shortcut plan, and zooms forward.
  • If the answer is "No": It cuts off the "safe" branch, locks in the detour, and flies around.

Why is this a big deal?

The researchers tested this in a high-tech video game simulation (NVIDIA Isaac) and in real life with a real drone.

  • The Old Way (Pure AI): The drone was too scared to fly through smoke, so it took long, slow detours. It succeeded only 77% of the time.
  • The "Ask Everything" Way: The drone asked a human about every little thing it saw. It succeeded 100% of the time, but it interrupted the human 2 times per mission.
  • The New Way (MINT): The drone only asked when it really mattered. It succeeded 100% of the time and only bothered the human 1.4 times on average.

The Bottom Line

This paper teaches drones to be smart about when to ask for help. Instead of panicking and handing over control, they analyze the situation, figure out the one specific piece of information they are missing, and ask a simple "Yes or No" question.

It turns the human from a "pilot" into a "consultant," making rescue missions faster, safer, and much less stressful for everyone involved.