"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation

Imagine you are teaching a robot dog how to clean your house. You give it a simple command: "Go clean the living room, but don't get too close to the fireplace because it's hot, and don't go into the kitchen if the cat is sleeping there."

If you ask a standard, super-smart AI (like a Large Language Model or LLM) to just "figure out the path," it might get confused. It might hallucinate, thinking the fireplace is a decoration it can walk through, or it might forget the cat and walk right into the kitchen, waking the pet. It's like asking a brilliant but dreamy poet to drive a car; they might write a beautiful poem about the journey, but they'll crash the car because they can't do the math of steering.

This paper introduces a new system called STPR (Safe Trajectory Planning with Restrictions) to solve this problem. Think of STPR not as a driver, but as a translator and a safety inspector.

Here is how it works, broken down into simple steps:

1. The Translator (The LLM's New Job)

Instead of asking the AI to "drive the car," STPR asks the AI to write the rulebook.

The Old Way: You tell the AI, "Don't go near the fire." The AI tries to guess the path.
The STPR Way: You tell the AI, "Don't go near the fire." The AI translates that sentence into a strict mathematical rule, written as a Python function.
- Analogy: Imagine the AI is a translator who doesn't speak "Robot" but speaks "Math." You say, "No fire!" and the translator writes a sign that says: IF distance_to_fire < 2 meters THEN STOP.
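Written out as actual Python, that sign could look like the short sketch below. The fireplace coordinates and the 2-meter radius are invented for illustration. The only detail taken from the paper is the convention that the function returns True for a forbidden point.

```python
import math

# Hypothetical fireplace location (x, y, z) in meters; illustrative only.
FIREPLACE = (4.0, 1.5, 0.0)
SAFE_DISTANCE = 2.0  # meters, the "2 meters" from the sign

def is_in_constraints(x, y, z):
    """Return True if the point (x, y, z) is too close to the fireplace."""
    return math.dist((x, y, z), FIREPLACE) < SAFE_DISTANCE

print(is_in_constraints(4.5, 1.5, 0.0))  # True: half a meter from the fire
print(is_in_constraints(0.0, 0.0, 0.0))  # False: across the room
```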

2. The Safety Inspector (The Point Cloud)

Once the AI writes that code, STPR takes it and uses it to paint invisible "danger zones" in the robot's 3D map.

The Metaphor: Imagine the robot is walking through a field of invisible fog. The code generated by the AI turns the area around the fireplace into a zone of "sticky tar." If the robot tries to step into the tar, the system says, "Nope, that's a collision."
The system samples thousands of points in the air (like a digital spray of paint) to create a 3D map of where the robot can and cannot go, based entirely on the rules the AI wrote.

3. The Driver (The Classic Algorithm)

Now, the robot doesn't need to "think" or "guess" anymore. It just uses a very old-school, super-reliable navigation tool (like A* or RRT*).

The Metaphor: This is like a GPS that only knows how to find the shortest path on a map that has already been marked with "Road Closed" signs. The GPS doesn't need to understand why the road is closed (it doesn't know about fire or cats); it just sees the red line and finds a way around it.
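Because the "road closed" signs were written by the smart AI, and the driving is done by the math-perfect GPS, the robot is guaranteed to obey every sign. (The guarantee is only as good as the signs themselves, which is why they are written as readable code that a person can check.)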

Why is this a big deal?

The paper tested this in four tricky scenarios:

The Security Camera: The robot had to avoid a camera's "field of view" (an invisible cone of sight).
The Hidden Hole: The robot had to avoid a pit trap covered by a rug (something sensors might miss).
The Sleeping Cat: The robot had to avoid the kitchen only if a cat was there (a conditional rule).
The Hot Fireplace: The robot had to calculate heat radiation and stay at a safe distance.

The Results:

Standard AI Planners: Failed miserably. They walked into the fire, ignored the cat, or crashed into walls because they were "hallucinating" (making things up).
Vision Models (Looking at pictures): Also failed. They couldn't understand the logic of "if the cat is here, don't go there."
STPR: 100% success. It followed every rule perfectly. When no safe path existed (for example, the cat blocking the only door), it correctly reported "I can't get there" instead of trying to crash through the wall.

The Best Part: It's Cheap and Fast

Usually, getting an AI to handle this kind of reasoning means using a massive, expensive model. But the authors found that smaller, cheaper models (especially ones trained on code) work about as well for this task.

Analogy: You don't need a Nobel Prize-winning physicist to write a simple "Do Not Enter" sign. You just need a competent clerk who knows how to write the sign correctly. STPR uses the "clerk" (the coding AI) to write the rules, and the "math genius" (the search algorithm) to follow them.

In summary: STPR is a way to let robots understand human instructions like "Don't touch the hot stove" by turning those words into strict math rules, and then letting a reliable, old-school calculator do the driving. It combines the best of both worlds: the creativity of human language and the safety of hard math.

1. Problem Statement

Robotic navigation in real-world environments requires adhering to complex, non-standardized constraints provided by human operators (e.g., "avoid the hot fireplace," "do not enter the kitchen if a pet is present"). These constraints are often:

Informal and Vague: Expressed in natural language rather than formal logic.
Context-Dependent: Relying on semantic understanding (e.g., heat dissipation, social norms) rather than just sensor data.
Difficult to Translate: Converting these instructions into mathematical constraints for path planners is challenging.

Limitations of Existing Approaches:

Pure LLM Planning: Directly asking an LLM to generate a path leads to hallucinations, lack of physical consistency, and partial compliance with constraints.
Traditional Planning: Algorithms like A* or RRT* are mathematically sound but cannot interpret natural language constraints.
Vision-Language Models (VLMs): While they can interpret images, they often fail to generate valid, constraint-compliant paths and suffer from hallucinations regarding static objects or conditional logic.

2. Methodology: STPR Framework

The authors propose STPR (Safe Trajectory Planning with Restrictions), a neuro-symbolic framework that bridges natural language and classical path planning. Instead of using LLMs to generate the plan, STPR uses them to generate constraints in the form of executable code.

Core Workflow

Constraint Generation (LLM):
- The system prompts an LLM with a structured template containing:
  - System Instruction: Defines the robot's role.
  - Environment Block: Symbolic representation of the scene (e.g., object coordinates).
  - Constraint Block: Natural language instructions (e.g., "Avoid heat zone").
  - Python Signature: A strict function signature (def is_in_constraints(x, y, z): ...) to force the LLM to output valid Python code rather than text.
- The LLM generates a Python boolean function $f(x, y, z)$ that returns True if a point violates the constraint and False otherwise. This leverages the LLM's pre-training on code to encourage syntactically valid, logically precise output.
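A minimal sketch of this step, assuming a plain-string prompt and a hard-coded stand-in for the model's reply (no real LLM API is called). The section wording, the coordinates, and the helper names `build_prompt` and `load_constraint` are hypothetical; only the `is_in_constraints` signature and the True-means-violation convention come from the description above.

```python
import math
import textwrap

# Hypothetical contents for the four prompt blocks; not the authors' wording.
SYSTEM = "You are a planning assistant for a mobile robot."
ENVIRONMENT = "fireplace at (4.0, 1.5, 0.0); kitchen door at (9.0, 2.0, 0.0)"
CONSTRAINT = "Avoid the heat zone: stay at least 2 m from the fireplace."
SIGNATURE = "def is_in_constraints(x, y, z):"

def build_prompt(system, environment, constraint, signature):
    """Assemble the structured prompt, ending with the function signature."""
    return (
        f"{system}\n\n"
        f"Environment:\n{environment}\n\n"
        f"Constraint:\n{constraint}\n\n"
        "Complete the following Python function. Return True if the point "
        "violates the constraint and False otherwise. Output only code.\n"
        f"{signature}\n"
    )

def load_constraint(source):
    """Compile model-written code and return its is_in_constraints function."""
    namespace = {"math": math}
    exec(source, namespace)  # sketch only: untrusted code should be sandboxed
    fn = namespace.get("is_in_constraints")
    if not callable(fn):
        raise ValueError("response does not define is_in_constraints")
    return fn

# Stand-in for what a model might return for the prompt above.
llm_response = textwrap.dedent("""
    def is_in_constraints(x, y, z):
        return math.dist((x, y, z), (4.0, 1.5, 0.0)) < 2.0
""")

prompt = build_prompt(SYSTEM, ENVIRONMENT, CONSTRAINT, SIGNATURE)
f = load_constraint(llm_response)
print(f(4.5, 1.5, 0.0), f(0.0, 0.0, 0.0))  # True False
```

Ending the prompt with the bare signature nudges the model to continue with a function body rather than prose. Running model-written code through `exec` is convenient for a sketch; a deployed system would presumably sandbox or statically check the function first.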
Point Cloud Sampling (Rejection Sampling):
- The generated Python function is used to create a "virtual obstacle" layer in the 3D environment.
- Sampling: The system samples points within the environment.
- Rejection: Each sampled point is tested against the LLM-generated function. Points for which the function returns True (inside the forbidden region) are kept as virtual obstacle points; points for which it returns False are rejected.
- Optimization: To avoid inefficient uniform sampling, the system queries the LLM for an over-approximating bounding box for the constraint region, samples within that box, and then applies the rejection test.
- The accepted points form a constraint point cloud, stored in a kd-tree for fast nearest-neighbor queries.
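The sampling and indexing steps can be sketched in pure Python as follows. The heat-zone function (modeled here as a vertical column), the bounding box, and the sample count are hypothetical. The hand-rolled kd-tree only keeps the example dependency-free; in practice a library structure such as `scipy.spatial.cKDTree` would be the natural choice.

```python
import math
import random

def is_in_constraints(x, y, z):
    # Hypothetical LLM output: a 2 m heat column around a fireplace at (4, 1.5).
    return math.hypot(x - 4.0, y - 1.5) < 2.0

# Hypothetical over-approximating box: ((min_x, max_x), (min_y, max_y), (min_z, max_z)).
BOX = ((2.0, 6.0), (-0.5, 3.5), (0.0, 2.0))

def sample_constraint_points(fn, box, n_samples, seed=0):
    """Rejection sampling: keep only samples that fn marks as forbidden."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        p = tuple(rng.uniform(lo, hi) for lo, hi in box)
        if fn(*p):  # inside the forbidden region -> virtual obstacle point
            kept.append(p)
    return kept

def build_kdtree(points, depth=0):
    """Build a 3D kd-tree as nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest_distance(node, target, best=math.inf):
    """Distance from target to the closest point stored in the kd-tree."""
    if node is None:
        return best
    point, axis, left, right = node
    best = min(best, math.dist(point, target))
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_distance(near, target, best)
    if abs(diff) < best:  # the far side may still hold a closer point
        best = nearest_distance(far, target, best)
    return best

cloud = sample_constraint_points(is_in_constraints, BOX, 5000)
tree = build_kdtree(cloud)
print(len(cloud), "obstacle points")
print(nearest_distance(tree, (4.0, 1.5, 0.5)))  # small: query is inside the zone
```

The bounding box matters for efficiency. The forbidden disc has area $4\pi \approx 12.6\,\text{m}^2$, so roughly 79% of samples drawn inside the 4 m by 4 m box are accepted. In a hypothetical 20 m by 20 m room, uniform sampling would accept only about 3% ($12.6 / 400$).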
Constrained Path Planning (Classical Algorithm):
- A traditional search algorithm (e.g., A* or RRT*) searches the robot's state space, treating the constraint point cloud as an additional layer of obstacles.
- During the search, every candidate state is checked against the nearest neighbor in the kd-tree. If the distance is within the robot's radius, the state is pruned as a collision.
- This guarantees that the final path is collision-free and satisfies the generated constraints (the guarantee holds relative to the constraint function the LLM produced).
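A minimal sketch of the collision-pruned search, assuming a 2D 4-connected grid, a hand-placed "virtual wall" standing in for an LLM-generated point cloud, and a brute-force nearest-neighbor scan in place of the kd-tree. `RESOLUTION`, `ROBOT_RADIUS`, and the wall coordinates are hypothetical. Returning `None` once the open set is exhausted is how a complete planner such as A* reports that no constraint-compliant path exists.

```python
import heapq
import math

RESOLUTION = 0.5    # meters per grid cell (hypothetical)
ROBOT_RADIUS = 0.4  # meters (hypothetical)

def make_collision_check(cloud, resolution, radius):
    """Prune a cell if any constraint point lies within the robot radius.
    A linear scan stands in for the kd-tree nearest-neighbor query."""
    def is_collision(cell):
        p = (cell[0] * resolution, cell[1] * resolution)
        return min(math.dist(p, q) for q in cloud) <= radius
    return is_collision

def astar(start, goal, width, height, is_collision):
    """4-connected grid A* with unit step cost; returns a list of cells or None."""
    def h(c):  # Manhattan distance: admissible and consistent on this grid
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    g = {start: 0}
    came_from = {start: None}
    open_heap = [(h(start), 0, start)]
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > g[cell]:
            continue  # stale heap entry
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if is_collision(nxt):
                continue  # pruned: too close to a constraint point
            if cost + 1 < g.get(nxt, math.inf):
                g[nxt] = cost + 1
                came_from[nxt] = cell
                heapq.heappush(open_heap, (cost + 1 + h(nxt), cost + 1, nxt))
    return None  # open set exhausted: no constraint-compliant path exists

# Hypothetical virtual wall at x = 2.5 m covering y = 0 to 3.5 m (gap above it).
wall_with_gap = [(2.5, 0.1 * k) for k in range(36)]
# The same wall extended to y = 5.0 m, sealing off the 10 x 10 grid.
sealed_wall = [(2.5, 0.1 * k) for k in range(51)]

check = make_collision_check(wall_with_gap, RESOLUTION, ROBOT_RADIUS)
path = astar((0, 0), (9, 0), 10, 10, check)
print(len(path) - 1, "moves")  # 25 moves: detour through the gap at the top

blocked = astar((0, 0), (9, 0), 10, 10,
                make_collision_check(sealed_wall, RESOLUTION, ROBOT_RADIUS))
print(blocked)  # None: the planner reports that no valid path exists
```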

3. Key Contributions

Neuro-Symbolic Architecture: STPR decouples the problem: LLMs handle the translation of natural language to code, while classical algorithms handle the decision-making (planning). This avoids LLM hallucinations in path generation.
Executable Constraint Generation: By forcing the LLM to output Python functions rather than plans, the system leverages the LLM's coding capabilities to create precise, interpretable, and auditable mathematical constraints.
Compatibility with Compact Models: The framework works effectively with smaller, cost-efficient LLMs, including code-specialized ones (e.g., Granite-34B-Code, Llama-3.1-70B), reducing the need for massive, expensive reasoning models.
Integration with Existing Pipelines: The use of point clouds allows STPR to augment existing SLAM pipelines (e.g., ROS SLAM Toolbox) without requiring sensor hardware upgrades.

4. Experimental Results

The authors evaluated STPR in a simulated Gazebo environment (ROS) across four scenarios:

S1 (Security Camera): Avoiding a camera's Field of View (3D projection constraints).
S2 (Hidden Hole): Avoiding a pit trap not visible to standard sensors.
S3 (Conditional Logic): Avoiding the kitchen if an animal is present.
S4 (Physics): Maintaining a safe distance from a fireplace based on heat dissipation models.
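To make the scenario list concrete, here are hedged sketches of the geometric tests such constraint functions might contain; a model would wrap them in the `is_in_constraints(x, y, z)` signature. All coordinates, the 30° half-angle, the `cat_in_kitchen` flag, and the heat model (an isotropic point source whose flux falls off as $P/(4\pi r^2)$) are illustrative assumptions, and the paper's own prompts and heat-dissipation model may differ. S2 (the hidden hole) would be a plain region test much like S3 without the condition.

```python
import math

def in_camera_fov(p, cam_pos, cam_dir, half_angle_deg, max_range):
    """S1 (hypothetical geometry): True if p lies inside the camera's view cone."""
    v = [a - b for a, b in zip(p, cam_pos)]
    dist = math.hypot(*v)
    if dist == 0.0:
        return True
    if dist > max_range:
        return False
    cos_angle = sum(a * b for a, b in zip(v, cam_dir)) / (dist * math.hypot(*cam_dir))
    return cos_angle >= math.cos(math.radians(half_angle_deg))

def in_kitchen_if_cat(p, cat_in_kitchen, kitchen=((6.0, 10.0), (0.0, 4.0))):
    """S3: the kitchen rectangle is forbidden only while the cat is there."""
    (x0, x1), (y0, y1) = kitchen
    return cat_in_kitchen and x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def heat_safe_radius(power_w, max_flux_w_m2):
    """S4: radius where P / (4 pi r^2) drops to the allowed flux."""
    return math.sqrt(power_w / (4 * math.pi * max_flux_w_m2))

def in_heat_zone(p, source, power_w, max_flux_w_m2):
    """S4: True if p is closer to the source than the safe radius."""
    return math.dist(p, source) < heat_safe_radius(power_w, max_flux_w_m2)

print(in_camera_fov((4.0, 0.0, 0.5), (0.0, 0.0, 2.5), (1.0, 0.0, -0.5), 30, 8.0))  # True
print(in_kitchen_if_cat((7.0, 1.0), cat_in_kitchen=False))  # False: rule inactive
print(round(heat_safe_radius(20000, 1000), 2))  # 1.26
```

The conditional in S3 is ordinary Python control flow, which is exactly the kind of logic the results below report VLM planners struggling with.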

Performance Metrics:

Success Rate: STPR achieved 100% success in finding valid paths or correctly proving no path exists (Soundness and Completeness). In contrast, vanilla A*/RRT* had 0% success (ignoring constraints), and naive VLM planners (GPT-4o) had 0–10% success due to hallucinations.
Path Quality: STPR paths were optimal (A*) or asymptotically optimal (RRT*). VLM paths were often invalid, significantly longer, or non-existent.
Runtime: STPR end-to-end latency ranged from 12 to 18 seconds.
- Prompting took the longest (especially for complex physics in S4).
- Planning was fast (seconds).
- Smaller models (Granite-34B) were significantly faster than large reasoning models (OpenAI o1-pro) with comparable success rates.
Model Robustness: The system worked reliably with various code LLMs. It failed only with very small models (Llama-3.1-1B) that lacked reasoning capabilities.

5. Significance

Safety and Reliability: STPR provides a theoretical guarantee of constraint compliance, a critical requirement for deploying robots in human environments.
Cost-Effectiveness: It demonstrates that expensive, large reasoning models are not necessary for robotic constraint handling; smaller, specialized code models suffice.
Interpretability: Unlike "black box" end-to-end LLM planners, STPR's constraints are explicit Python functions that human experts can audit, modify, or debug without retraining the model.
Extensibility: The authors note that the framework can be adapted to dynamic, stochastic, or continuous-time settings and integrated with pathfinding algorithms beyond A* and RRT*.
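Because each constraint is plain Python, an expert can spot-check it against hand-labeled probe points before any planning happens. The sketch below is hypothetical: the probe points, the 2 m heat radius, the `audit_constraint` helper, and the "forgot the square root" bug are invented for illustration.

```python
import math

def audit_constraint(fn, probes):
    """Return every probe where fn disagrees with the expert's expected label."""
    return [(p, want, fn(*p)) for p, want in probes if fn(*p) != want]

# Expert-labeled probe points: (point, should_be_forbidden). Hypothetical values.
PROBES = [
    ((4.5, 1.5, 0.0), True),   # 0.5 m from the fireplace
    ((5.6, 1.5, 0.0), True),   # 1.6 m from the fireplace
    ((0.0, 0.0, 0.0), False),  # far across the room
]

def buggy_constraint(x, y, z):
    # Hypothetical faulty model output: compares a squared distance to a radius.
    return (x - 4.0) ** 2 + (y - 1.5) ** 2 < 2.0

def fixed_constraint(x, y, z):
    # Hand-corrected version: compares the actual distance to the 2 m radius.
    return math.hypot(x - 4.0, y - 1.5) < 2.0

print(audit_constraint(buggy_constraint, PROBES))  # flags the 1.6 m probe
print(audit_constraint(fixed_constraint, PROBES))  # []
```

The fix is a one-line edit to readable code, with no model retraining involved.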

In conclusion, STPR represents a practical shift from "LLMs as planners" to "LLMs as constraint translators," enabling embodied agents to safely navigate complex, instruction-driven environments.
