Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems in Minecraft

Imagine you are playing a video game like Minecraft with a team of friends. In a normal game, you all talk, decide what to do, and then act. But in the world of AI, most "smart" agents (robots powered by Large Language Models) work like a very slow, rigid team: one person thinks, then everyone stops and waits for them to finish talking before anyone can move.

If the game world changes while they are thinking (like a monster appearing or a bridge collapsing), the agent is stuck. It's like a car that can only move while the driver is not looking at the map: every time the driver checks the route, the car has to stop.

This paper introduces a new way to make AI teams work: Parallelized Planning-Acting. Here is the breakdown in simple terms:

1. The Problem: The "Stop-and-Go" Traffic Jam

Current AI systems work in Serial Mode.

Think: The AI stops everything to plan its next move.
Act: It executes that move.
Repeat: It stops again to plan the next move.

In a chaotic game like Minecraft, this is terrible. If a dragon attacks while the AI is planning, the AI can't react until it finishes its thought process. It's like a chess player who has to freeze their hand for 10 seconds every time they think, while their opponent keeps moving pieces.

2. The Solution: The "Thinking While Running" Team

The authors propose a Dual-Thread Architecture. Imagine a human runner who can also talk on the phone while running.

Thread 1 (The Runner/Acting): This part of the AI is constantly moving, fighting, and gathering resources. It doesn't wait for permission; it just keeps going.
Thread 2 (The Navigator/Planning): This part is constantly looking at the map, reading the chat, and thinking about the next best move.

The Magic Trick: These two threads run at the same time. The "Runner" keeps moving while the "Navigator" figures out the next step.

3. The "Interrupt" Button

This is the most important feature. In the old system, once the AI started digging for gold, it had to finish digging even if a monster attacked.

In this new system, the "Navigator" can hit an Interrupt Button.

Scenario: The AI is mining a diamond. Suddenly, the Navigator sees a zombie coming.
Action: The Navigator instantly yells, "Stop mining! Fight the zombie!"
Result: The "Runner" drops the pickaxe immediately and switches to a sword. No waiting, no finishing the current task first.

4. The "Central Brain" (Shared Memory)

In many AI teams, if Agent A sees a monster, Agent B doesn't know about it until Agent A finishes its whole task and tells them. This is like playing a game of "Telephone" where the message takes too long to get across.

This system uses a Centralized Memory System.

Think of it as a Live Google Doc that everyone is editing in real-time.
As soon as Agent A sees a monster, it updates the Doc.
Agent B reads the Doc instantly and knows to run away or help.
This ensures the whole team is always on the same page, using the freshest information.

5. The "Recipe Book" (Recursive Skill Library)

Minecraft is complex. To make a Diamond Sword, you need diamonds. To get diamonds, you need a pickaxe. To get a pickaxe, you need iron. To get iron, you need coal... and so on.

Old AI agents often got stuck because they didn't know the steps to get the tools they needed.

This paper gives the AI a Massive, Smart Recipe Book.
If you ask the AI to "Get a Diamond Sword," it doesn't just say "I don't know how." It automatically breaks it down: Need sword -> Need diamonds -> Need pickaxe -> Need iron -> Need coal.
It solves the whole chain of tasks automatically, like a chef who knows exactly how to get from "raw ingredients" to "gourmet meal" without being told every single step.

The Result?

The researchers tested this in Minecraft with teams of AI agents fighting bosses (like the Ender Dragon) and gathering resources.

Faster: They finished tasks much quicker because they no longer had to stop moving while they were thinking.
Smarter: They reacted instantly to danger because of the "Interrupt" button.
Better Teamwork: They shared information instantly via the "Live Google Doc."

In a nutshell: This paper teaches AI agents how to be like a professional sports team: they are constantly moving and reacting to the game in real-time, while a coach (the planner) is shouting instructions and adjusting the strategy on the fly, all without the players ever stopping to catch their breath.

Here is a detailed technical summary of the paper "Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems in Minecraft".

1. Problem Statement

Existing Large Language Model (LLM)-based Multi-Agent Systems (MAS) primarily rely on serialized execution paradigms. In these systems, an agent must complete a full LLM planning cycle (reasoning, decision-making) before executing any action. This creates significant bottlenecks in dynamic, real-time environments like Minecraft for three main reasons:

Inflexible Scheduling: Agents cannot react to sudden environmental changes or urgent events while waiting for an LLM response.
Limited Replanning: Once an action sequence begins, it often runs to completion without interruption, preventing agents from adapting to new information (e.g., a boss changing attack patterns).
Memory Sharing Delays: In many frameworks, memory updates (observations, chat logs) only occur after an action is finished. This leads to agents operating on outdated information, hindering team coordination.

The authors argue that real-time interaction in open-world scenarios requires decoupling planning (reasoning) from acting (execution), so that the two can run concurrently.
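
A back-of-envelope latency model can make the cost of serialization concrete. The symbols below are assumptions introduced here for illustration, not quantities reported in the paper: $T_{plan}$ is the duration of one LLM planning call, $T_{act}$ is the duration of the action being executed, and $\delta$ is the interval at which an acting loop checks for new instructions. Suppose an event occurs just after the planner has taken its observation snapshot, so the plan in progress cannot account for it. A serialized agent then finishes that plan, executes the stale action, and only afterwards plans again. A concurrent planner with interruption finishes the current plan, produces the next one, and has it picked up within one polling interval:

$$
D_{serial} \le 2\,T_{plan} + T_{act}, \qquad D_{parallel} \le 2\,T_{plan} + \delta
$$

Under these simplifying assumptions the worst-case reaction delay differs by $T_{act} - \delta$, so the gap is largest when actions are long (mining, travelling) and the polling interval is short.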

2. Methodology

The paper proposes a Parallelized Planning-Acting Framework featuring a dual-thread architecture with an interruptible execution mechanism. The system is designed to decouple LLM reasoning from physical action execution.

A. Dual-Thread Architecture

Each agent operates with two independent, asynchronous threads communicating via a shared Action Buffer (a single-slot queue):

Planning Thread:
- Driven by the LLM and a Centralized Memory System.
- Continuously monitors the environment, team chat logs, and current actions.
- Generates new action proposals ( $A_{new}$ ) and an interruption flag ( $flag_{intr}$ ).
- If $flag_{intr}$ is true (indicating the new plan is more urgent or the current action is obsolete), it sends a restart signal to the acting thread.
- Writes the latest action to the shared buffer, overwriting previous entries to ensure the most up-to-date plan is always available.
Acting Thread:
- Responsible for executing skills from a Comprehensive Skill Library.
- Periodically checks the Action Buffer.
- If a new action is present and the interrupt flag is set, it immediately aborts the current skill and switches to the new action.
- If no interrupt is triggered, it continues the current action while the planner works in the background.
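
The interplay between the two threads can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ActionBuffer`, `run_skill`, and the scripted `plans` list are hypothetical names, the LLM is replaced by a timed script, and a skill is modeled as a loop of short sleeps so that it can be aborted between steps.

```python
import threading
import time


class ActionBuffer:
    """Single-slot buffer: a new write overwrites any unread action."""

    def __init__(self):
        self._lock = threading.Lock()
        self._action = None
        self.interrupt = threading.Event()  # plays the role of flag_intr

    def put(self, action, interrupt=False):
        with self._lock:
            self._action = action  # overwrite, never queue
            if interrupt:
                self.interrupt.set()

    def take(self):
        with self._lock:
            action, self._action = self._action, None
            self.interrupt.clear()
            return action


def run_skill(action, buffer, log, steps=40, step_time=0.01):
    """Hypothetical skill: works in small steps so it can be aborted."""
    for _ in range(steps):
        if buffer.interrupt.is_set():
            log.append(("aborted", action))
            return
        time.sleep(step_time)
    log.append(("finished", action))


def acting_thread(buffer, log, stop):
    while not stop.is_set():
        action = buffer.take()
        if action is None:
            time.sleep(0.005)  # poll the buffer periodically
            continue
        run_skill(action, buffer, log)


def planning_thread(buffer, plans):
    """Stand-in for the LLM planner: (latency, action, flag_intr) tuples."""
    for latency, action, flag_intr in plans:
        time.sleep(latency)  # simulated planning latency
        buffer.put(action, interrupt=flag_intr)


def demo():
    buffer, log, stop = ActionBuffer(), [], threading.Event()
    plans = [(0.0, "mine_diamond", False), (0.1, "fight_zombie", True)]
    actor = threading.Thread(target=acting_thread, args=(buffer, log, stop))
    planner = threading.Thread(target=planning_thread, args=(buffer, plans))
    actor.start()
    planner.start()
    planner.join()
    deadline = time.time() + 5.0
    while len(log) < 2 and time.time() < deadline:
        time.sleep(0.01)  # wait for the last skill to finish
    stop.set()
    actor.join()
    return log


if __name__ == "__main__":
    print(demo())
```

In this sketch the mining skill is cut short as soon as the planner writes an interrupting action, while a non-interrupting write would simply wait in the buffer until the current skill ends. Setting the flag and writing the action under the same lock keeps the two from getting out of step.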

B. Centralized Memory System

To solve memory sharing delays, the authors implement a unified memory repository updated in real-time:

Observation Records: Continuously polled (e.g., every second) to reflect the latest agent status and environmental state.
Chat Logs: Support both Passive Communication (LLM-generated summaries of observations) and Active Communication (agents explicitly sending messages via skills), so that all agents have access to the latest team context.
Action History: Records execution history to help the planner decide whether to interrupt or continue current tasks.
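
A shared store with these three record types might look like the following sketch. The class name `CentralMemory`, its method names, and the record layouts are assumptions made for illustration; the summary does not describe the actual data structures.

```python
import threading
import time


class CentralMemory:
    """Hypothetical shared store that every agent reads from and writes to."""

    def __init__(self):
        self._lock = threading.Lock()
        self.observations = {}    # agent name -> latest observation record
        self.chat_log = []        # (agent, kind, message); kind: "passive" or "active"
        self.action_history = []  # (agent, action, status)

    def update_observation(self, agent, **state):
        # Called by a polling loop; only the newest record per agent is kept.
        with self._lock:
            self.observations[agent] = {"timestamp": time.time(), **state}

    def post_chat(self, agent, message, kind="active"):
        with self._lock:
            self.chat_log.append((agent, kind, message))

    def record_action(self, agent, action, status):
        with self._lock:
            self.action_history.append((agent, action, status))

    def snapshot(self, last_n=5):
        """Copy of the current state, e.g. to build a planner prompt."""
        with self._lock:
            return {
                "observations": {a: dict(o) for a, o in self.observations.items()},
                "chat": self.chat_log[-last_n:],
                "actions": self.action_history[-last_n:],
            }


memory = CentralMemory()
memory.update_observation("agent_a", health=18, nearby=["zombie"])
memory.post_chat("agent_a", "Zombie near the mine entrance", kind="passive")
memory.record_action("agent_b", "mine_diamond", "running")
view_for_b = memory.snapshot()  # agent_b's planner sees agent_a's report
```

Because writes happen as soon as something is observed rather than when an action ends, any planner that takes a snapshot afterwards works from the newest team state.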

C. Comprehensive Skill Library & Recursive Task Decomposition

The acting thread utilizes a skill library based on Mineflayer that automates complex workflows.

Recursive Task Decomposition: Instead of asking the LLM to plan every step of a complex task (e.g., "Craft Diamond Armor"), the system uses a Directed Acyclic Graph (DAG) to model task dependencies.
Mechanism: When a high-level task is requested, the system automatically resolves prerequisites (e.g., mining ore $\to$ smelting $\to$ crafting tools) recursively.
Benefit: This reduces the number of LLM calls significantly (from $L$ calls for a path of length $L$ to just 1 call for the high-level goal) and offloads detailed execution to the deterministic skill library.
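
Prerequisite resolution over a dependency DAG can be sketched as a depth-first traversal that emits each prerequisite before the task that needs it. The `PREREQS` table below is a deliberately simplified, hypothetical stand-in for real Minecraft recipes, and `resolve` is an illustrative helper rather than the library's actual API.

```python
# Hypothetical, simplified dependency table (not the paper's recipe data).
PREREQS = {
    "diamond_sword": ["diamond", "stick"],
    "diamond": ["iron_pickaxe"],
    "iron_pickaxe": ["iron_ingot", "stick"],
    "iron_ingot": ["iron_ore", "coal"],
    "iron_ore": [],
    "coal": [],
    "stick": ["planks"],
    "planks": ["log"],
    "log": [],
}


def resolve(task, prereqs, done=None, visiting=None):
    """Return subtasks ordered so that every prerequisite comes first."""
    done = [] if done is None else done
    visiting = set() if visiting is None else visiting
    if task in done:
        return done  # already scheduled via another branch of the DAG
    if task in visiting:
        raise ValueError(f"cyclic dependency at {task!r}")
    visiting.add(task)
    for dep in prereqs.get(task, []):
        resolve(dep, prereqs, done, visiting)
    visiting.discard(task)
    done.append(task)
    return done


plan = resolve("diamond_sword", PREREQS)
for step in plan:
    print(step)
```

In this sketch the planner would name only the top-level goal (`"diamond_sword"`), and the whole ordered chain is derived without further LLM calls. Shared prerequisites such as `"stick"` are scheduled once even though two tasks depend on them.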

3. Key Contributions

Parallelized Planning-Acting Framework: A novel architecture that decouples planning and acting into dual threads with interruptible execution, enabling real-time adaptation in dynamic environments.
Centralized Memory System: A real-time updated memory module that minimizes information latency, ensuring agents coordinate based on the latest environmental and team data.
Comprehensive Skill Library with Recursive Decomposition: A DAG-based mechanism that automates prerequisite resolution for complex tasks, drastically reducing LLM overhead and improving execution efficiency.
Empirical Validation: Experiments in Minecraft covering resource collection, boss combat, and adversarial PVP, including comparisons of the parallelized framework against serialized and single-agent baselines.

4. Experimental Results

The framework was evaluated on three benchmark task categories in Minecraft:

Resource Collection:
- Compared multi-agent (3 agents) vs. single-agent baselines.
- Result: The parallelized multi-agent system significantly reduced completion times (e.g., Diamond Armor in 13.7 min vs. 28.3 min for a single agent).
Boss Combat (Elder Guardian, Wither, Ender Dragon):
- Evaluated success rates and health ratios across different team sizes (3 to 20 agents).
- Result: High success rates (up to 100% with 10-20 agents). The system successfully adapted to dynamic boss mechanics (e.g., switching to melee when the Wither becomes immune to ranged attacks).
Adversarial PVP:
- Direct comparison between the Parallelized framework and a Serialized baseline in 2v2 and 3v3 battles.
- Result: The parallelized framework achieved a significantly higher victory rate. The interruption mechanism was identified as the key factor, allowing agents to instantly switch targets or prioritize healing when under attack, whereas serialized agents were often "stuck" in previous actions.
Ablation Studies:
- Removing the recursive task decomposition caused complex tasks to fail.
- Removing the parallelized framework (reverting to serialization) drastically reduced success rates in combat scenarios.
- Removing the centralized memory system led to poor coordination and lower efficiency.
Scalability:
- Tests with 5–50 agents showed that inference time stabilizes rather than growing linearly, while token costs grow approximately linearly, which the authors take as evidence that the system scales.

5. Significance

This work addresses a critical limitation in current LLM-based MAS: the inability to operate effectively in non-paused, dynamic environments. By introducing a parallelized architecture with interruptible execution, the authors enable agents to:

React to environmental changes in real-time without waiting for a full planning cycle.
Coordinate more effectively through up-to-date shared memory.
Execute complex, multi-step tasks efficiently by offloading dependency resolution to a structured skill library.

The framework targets embodied AI in open-world games, and the authors present it as a generalizable approach for deploying multi-agent systems in dynamic, real-time domains where responsiveness is crucial.