ConEQsA: Concurrent and Asynchronous Embodied Questions Scheduling and Answering

Imagine you are a personal assistant for a busy family. In the old days, your job was simple: someone would ask, "Where are the car keys?" You would walk around the house, find them, and bring them back. Then you would wait for the next question. This is how most AI robots work today: they answer one question at a time.

But real life doesn't wait. While you are looking for the keys, someone might yell, "Is the stove on?" and another person might ask, "Where is the dog?" These questions arrive at different times, and some are much more urgent than others. If you keep handling them one by one, the person worried about the stove might have to wait too long, and you might waste time walking back and forth to the kitchen twice.

This paper introduces a new way for robots to handle this chaos. The authors call it ConEQsA (Concurrent and Asynchronous Embodied Questions Scheduling and Answering).

Here is a breakdown of the authors' solution using simple analogies:

1. The Problem: The "One-Task-at-a-Time" Robot

Current robots are like a single-lane road. If a car (a question) is on the road, the next car has to wait in line. Even if the second car is an ambulance (an urgent question), it still has to wait for the first car to finish its slow journey. This is inefficient, and in real-world scenarios it can be dangerous.

2. The Solution: The "Super-Organized Chef"

The authors propose a new system called ConEQsA. Imagine a master chef in a busy kitchen.

The Menu (The Questions): Instead of cooking one dish, finishing it, and then starting the next, the chef receives a list of orders that keep coming in.
The Shared Memory (The Pantry): The chef doesn't just look at one order in isolation. They remember everything they've seen in the pantry. If they walk to the fridge to get milk for a smoothie, they also notice the eggs are missing. They remember that for the next order that needs eggs. They don't need to walk to the fridge again later; they already know the eggs are gone.
The Priority System (The Urgency Bell): The chef has a smart system that rings a bell for urgent orders. If a customer says, "My food is on fire!" (High Urgency), the chef drops everything to handle that, even if they were halfway through chopping vegetables for a salad.

3. How It Works (The Magic Ingredients)

The paper describes three main tricks the robot uses to be this efficient:

Shared Group Memory: Instead of forgetting everything after answering one question, the robot keeps a "notebook" of everything it sees. If Question A asks "Is the TV on?" and the robot walks past the TV to answer it, it writes down "TV is off" in the notebook. If Question B later asks "Is the TV on?", the robot checks the notebook and answers immediately without walking anywhere!
Smart Scheduling (The Priority Planner): The robot doesn't just pick questions randomly. It calculates a "score" for every question based on:
- Urgency: Is this a fire alarm or a casual chat?
- Scope: Is the answer nearby (local) or far away (global)?
- Rewards: If I go to the kitchen, will I find answers for three different questions at once?
- Dependencies: Do I need to find the key before I can open the door?
Targeted Exploration: The robot doesn't just wander aimlessly. It plans a path that is most likely to answer the most important questions while gathering info for the other questions along the way.
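
The "notebook" idea can be illustrated with a minimal sketch. Everything here is a simplifying assumption made for illustration: the `Notebook` class, the `answer` function, and the key `"tv_power"` are hypothetical names, and the real system stores images and structured records rather than plain strings.

```python
from typing import Callable, Dict, Optional, Tuple


class Notebook:
    """Hypothetical shared memory: maps a fact key to the value last observed."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def record(self, key: str, value: str) -> None:
        self._facts[key] = value

    def lookup(self, key: str) -> Optional[str]:
        return self._facts.get(key)


def answer(key: str, notebook: Notebook,
           explore: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (answer, explored). Exploration runs only on a notebook miss."""
    cached = notebook.lookup(key)
    if cached is not None:
        return cached, False          # answered from memory, no walking
    observed = explore(key)           # walk and look (stubbed out below)
    notebook.record(key, observed)    # write it down for later questions
    return observed, True


notebook = Notebook()
walks = []                            # records every time the robot had to walk


def fake_explore(key: str) -> str:
    # Stand-in for real navigation and perception (assumption).
    walks.append(key)
    return "off"


first = answer("tv_power", notebook, fake_explore)    # Question A: must explore
second = answer("tv_power", notebook, fake_explore)   # Question B: notebook hit
```

In this toy run the robot walks once for Question A, and Question B is served from the notebook without any new exploration.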

4. The New Test: The "CAEQs" Benchmark

To prove this works, the authors created a new test called CAEQs.

Imagine a video game with 40 different scenes (like a house, an office, a factory).
In each scene, the robot gets 5 questions.
Some questions appear at the start, and others pop up later (asynchronously).
Some questions are marked "High Priority" (like a safety alert), and others are "Low Priority" (like "what color is the sofa?").
The robot is graded not just on getting the right answer, but on how fast it answers the urgent ones and how little walking it does to get all the answers.

5. The Results

When they tested their new "Super-Organized Chef" robot against the old "One-Task-at-a-Time" robots:

Faster: It answered urgent questions much faster.
Smarter: It wasted less time walking around because it reused information it already found.
More Efficient: It answered about 9% of questions just by looking at its memory, without taking a single step!

The Bottom Line

This paper is about teaching robots to stop being rigid and start being flexible. Instead of treating every question as a separate, isolated task, the authors teach the robot to see the "big picture," prioritize what matters most, and use its memory to save time and energy. It's the difference between a slow, single-lane worker and a fast, efficient, and responsive team leader.

1. Problem Formulation: Embodied Questions Answering (EQsA)

The paper identifies a critical gap in existing Embodied Question Answering (EQA) research. Traditional EQA focuses on an agent answering a single, isolated question by exploring a 3D environment. However, real-world deployments (e.g., rescue robots, factory assistants) involve dynamic, multi-user interactions where:

Multiple questions arrive asynchronously.
Questions have varying degrees of urgency.
Questions often have dependencies (e.g., "Where is the key?" followed by "Is the door locked?").

To address this, the authors formalize a new problem setting called Embodied Questions Answering (EQsA). In EQsA, an agent must manage a dynamic set of initial and follow-up questions, scheduling exploration to answer them concurrently while minimizing latency weighted by urgency.
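
To make the dependency aspect concrete, here is a small sketch of how prerequisite relations between questions could gate which ones are eligible to be answered. It uses Python's standard `graphlib.TopologicalSorter`; the question identifiers and the `deps` mapping are hypothetical, and this is not claimed to be how the paper implements dependency handling.

```python
from graphlib import TopologicalSorter

# Hypothetical question ids. Each key maps to the set of questions that
# must be answered before it (its prerequisites).
deps = {
    "is_door_locked": {"where_is_key"},
    "where_is_key": set(),
    "is_stove_on": set(),
}

sorter = TopologicalSorter(deps)
sorter.prepare()

# Questions with no unmet prerequisites can be scheduled right away.
ready_first = set(sorter.get_ready())

# Once the key has been located, the dependent question becomes eligible.
sorter.done("where_is_key")
ready_after = set(sorter.get_ready())
```

Here `ready_first` contains the two independent questions, and `"is_door_locked"` only appears in `ready_after`, once its prerequisite has been marked as done.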

2. Methodology: The ConEQsA Framework

The authors propose ConEQsA, an agentic framework designed to handle concurrent, urgency-aware scheduling and answering. The system operates as distributed microservices communicating via Redis streams.

Key Components:

Group Memory: A shared, persistent memory store coupling observation images with structured semantic records (e.g., object bounding boxes, captions, 3D positions). This allows information gathered for one question to be reused for others, reducing redundant exploration.
Question Pool & Scheduler: Active questions are buffered and prioritized using a Priority Planning mechanism. Rather than processing questions in First-In-First-Out (FIFO) order, the scheduler selects the next target based on a weighted score $P(q_i)$.
Priority Score Calculation:
$P(q_i) = w_u \cdot \text{Urgency} + w_s \cdot \text{Scope} + w_r \cdot \text{Reward} + w_d \cdot \text{Dependency}$
- Urgency: A convex transform of the urgency label, $-\ln(1-u_i)$, to prioritize critical tasks.
- Scope: Prioritizes "local" questions (answerable nearby) over "global" ones.
- Reward: Estimates how many other pending questions would benefit from the exploration required for the current question (information overlap).
- Dependency: Ensures prerequisites are met (e.g., locating an object before checking its state) via a Directed Acyclic Graph (DAG).
Targeted Exploration: Once a question is selected, the agent performs question-conditioned frontier exploration. It uses a semantic map and relevance signals (from VLMs like Qwen2.5-VL) to navigate toward regions most likely to answer the specific question.
Finishing Module: Before initiating exploration, the system queries the Group Memory. If sufficient evidence exists, the question is answered directly (Zero Exploration), significantly boosting efficiency.
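
The priority score described above can be sketched as follows. Only the overall form (a weighted sum of four terms) and the urgency transform $-\ln(1-u_i)$ come from the summary. The weight values, the numeric urgency values, the 0/1 encodings of scope and dependency, and the `Question` fields are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    urgency: float    # u_i, assumed to lie in [0, 1)
    is_local: bool    # assumed flag: answerable nearby
    reward: float     # assumed overlap estimate with other pending questions
    deps_met: bool    # assumed flag: all prerequisites already answered


# Illustrative weights (assumption, not values from the paper).
WEIGHTS = {"urgency": 1.0, "scope": 0.5, "reward": 0.5, "dependency": 1.0}


def priority(q: Question) -> float:
    u = min(max(q.urgency, 0.0), 0.999)        # clamp so the log stays finite
    urgency_term = -math.log(1.0 - u)          # convex transform from the text
    scope_term = 1.0 if q.is_local else 0.0    # assumed encoding
    dependency_term = 1.0 if q.deps_met else 0.0  # assumed encoding
    return (WEIGHTS["urgency"] * urgency_term
            + WEIGHTS["scope"] * scope_term
            + WEIGHTS["reward"] * q.reward
            + WEIGHTS["dependency"] * dependency_term)


q_sofa = Question("What color is the sofa?", 0.2, True, 0.0, True)
q_stove = Question("Is the stove on?", 0.9, False, 0.5, True)
q_door = Question("Is the door locked?", 0.5, True, 0.0, False)

pool = [q_sofa, q_stove, q_door]
best = max(pool, key=priority)   # the scheduler's next target in this sketch
```

With these made-up numbers the stove question wins even though it is not local, because the logarithmic urgency term dominates as $u_i$ approaches 1.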

3. Key Contributions

Problem Formulation (EQsA): Defined a novel task requiring agents to handle asynchronous, multi-question workloads with urgency constraints, moving beyond single-question paradigms.
ConEQsA Framework: Developed a concurrent agentic system featuring:
- Shared group memory for knowledge reuse.
- A priority-planning scheduler that balances urgency, scope, reward, and dependencies.
- A "Finishing Module" to enable direct answering from memory.
CAEQs Benchmark: Introduced the Concurrent Asynchronous Embodied Questions dataset:
- Based on 40 scenes from HM3D.
- Contains 200 questions (5 per scene: 3 initial, 2 asynchronous follow-ups).
- Includes human-annotated urgency labels (Low, Medium, High) and dependency structures.
New Evaluation Metrics: Proposed metrics specifically for EQsA:
- Direct Answer Rate (DAR): Measures the ability to answer without new exploration.
- Normalized Urgency-Weighted Latency (NUWL): The primary metric measuring the total delay of answers, weighted by their urgency.
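
As a rough illustration of the two metrics, the sketch below computes a Direct Answer Rate and one possible urgency-weighted latency. The summary does not state the exact NUWL formula, so the normalization used here (dividing by the total urgency weight times a maximum latency) is an assumption, as are the `Outcome` fields and the sample numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    urgency: float    # weight of the question (assumed numeric)
    latency: float    # steps between the question's arrival and its answer
    explored: bool    # False if answered directly from memory


def direct_answer_rate(outcomes: List[Outcome]) -> float:
    """Fraction of questions answered without any new exploration."""
    return sum(not o.explored for o in outcomes) / len(outcomes)


def urgency_weighted_latency(outcomes: List[Outcome], max_latency: float) -> float:
    """Assumed normalization: weighted mean latency divided by max_latency."""
    weighted = sum(o.urgency * o.latency for o in outcomes)
    total_weight = sum(o.urgency for o in outcomes)
    return weighted / (total_weight * max_latency)


# Made-up outcomes for four questions (not data from the paper).
outcomes = [
    Outcome(urgency=0.9, latency=10, explored=True),
    Outcome(urgency=0.5, latency=0, explored=False),
    Outcome(urgency=0.1, latency=40, explored=True),
    Outcome(urgency=0.5, latency=20, explored=True),
]

dar = direct_answer_rate(outcomes)
nuwl = urgency_weighted_latency(outcomes, max_latency=100)
```

Under this reading, a long delay on a low-urgency question costs little, while the same delay on a high-urgency question raises the score substantially.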

4. Experimental Results

The authors evaluated ConEQsA against strong sequential baselines (Explore-EQA and Memory-EQA) on the CAEQs dataset.

Performance Gains:
- Efficiency (NS): ConEQsA reduced Normalized Steps to 0.321, significantly outperforming Memory-EQA (0.410) and Explore-EQA (0.472).
- Responsiveness (NUWL): ConEQsA achieved an NUWL of 0.204, representing a 57% reduction in latency compared to Memory-EQA and 63% compared to Explore-EQA.
- Knowledge Reuse (DAR): ConEQsA achieved a 9.0% Direct Answer Rate, whereas sequential baselines scored 0% (as they always initiate new exploration).
- Accuracy: ConEQsA maintained accuracy (0.65) comparable to the state-of-the-art Memory-EQA (0.64).
Ablation Study: Removing any component of the priority planner (Urgency, Scope, Reward, or Dependency) led to performance degradation, confirming that the multi-faceted scheduling strategy is essential for handling complex workloads.
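
For readers who want to relate the Normalized Steps figures above to percentage savings, the following short calculation derives the relative reductions from the three reported values. The helper name `relative_reduction` is ours.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


NS_CONEQSA = 0.321       # reported Normalized Steps for ConEQsA
NS_MEMORY_EQA = 0.410    # reported for Memory-EQA
NS_EXPLORE_EQA = 0.472   # reported for Explore-EQA

vs_memory = relative_reduction(NS_MEMORY_EQA, NS_CONEQSA)
vs_explore = relative_reduction(NS_EXPLORE_EQA, NS_CONEQSA)

print(f"vs Memory-EQA:  {vs_memory:.1%}")
print(f"vs Explore-EQA: {vs_explore:.1%}")
```

This works out to roughly 22% fewer normalized steps than Memory-EQA and roughly 32% fewer than Explore-EQA.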

5. Significance and Future Work

Real-World Applicability: The paper argues that reducing physical interaction steps (navigation) is crucial for real-world robotics due to energy and time constraints. ConEQsA demonstrates that concurrent scheduling and memory reuse are key to making embodied agents responsive in dynamic environments.
Standardization: The CAEQs benchmark and NUWL metric provide a fair, standardized protocol for evaluating future multi-question EQA methods.
Limitations & Future Directions: The current system relies on the accuracy of underlying single-question EQA components. Future work aims to extend the framework to multi-agent scenarios, where multiple robots collaboratively solve EQsA tasks, introducing challenges in decentralized coordination.

In summary, ConEQsA represents a paradigm shift from treating EQA as a series of isolated tasks to a holistic, concurrent management problem, significantly improving efficiency and responsiveness in embodied AI.
