SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

Imagine you are hiring a brilliant, incredibly fast, but somewhat reckless robotic chef to cook a complex meal in a busy kitchen.

This chef (the AI) has read every cookbook in the world. It knows how to chop, fry, and bake. But because it's just "reading" recipes, it doesn't actually know the rules of the kitchen. It might try to put a raw chicken leg on a hot pan before washing it, or stack a tower of plates so high it topples over and breaks everything.

In the world of robotics, this is the problem with current AI planners. They are smart, but they aren't safe.

This paper introduces SafeGen-LLM, a new training method that turns that reckless chef into a Master Safety Inspector who can cook safely in any kitchen, even ones it has never visited before.

Here is how it works, broken down into three parts:

1. The Problem: The "Smart but Dangerous" Chef

Old Planners (The Rigid Robots): These are like chefs who follow a single, strict recipe. If you change the ingredients slightly, they freeze. They are slow and can't handle complex, new situations.
Standard AI (The Reckless Genius): These are like the chefs who can improvise anything but don't understand safety. They might invent a delicious dish that explodes the stove. They are fast but dangerous.
The Goal: We need an AI that is fast and flexible and never breaks the safety rules, no matter what task it's given.

2. The Solution: The Two-Stage Training Camp

The authors created a special training camp for the AI called SafeGen-LLM. It happens in two phases:

Phase 1: The "Syntax School" (Supervised Fine-Tuning)

Imagine teaching the chef the grammar of cooking.

Before, the AI might say, "Put the egg on the fire."
Now, we show it thousands of examples of perfect plans where safety rules are followed.
The Result: The AI learns the strict language of planning (like PDDL) and understands that "You must wash the chicken before you cook it." It stops making silly formatting mistakes and learns the basic rules of the game.

Phase 2: The "Safety Obstacle Course" (GRPO with Reward Machines)

This is the magic part. Imagine the chef is now in a training arena with a strict referee (the Reward Machine).

The chef tries to solve a puzzle (like stacking blocks or driving a ferry).
The Referee doesn't just say "Good job" or "Bad job." It gives a detailed scorecard:
- Did you crash? (Safety Violation) -> Huge Penalty.
- Did you drop a block? (Precondition Violation) -> Medium Penalty.
- Did you finish the goal? -> Bonus Points.
The Twist: The referee uses a "curriculum." It starts with easy puzzles (stacking 2 blocks) and slowly makes them harder (stacking 20 blocks with complex rules).
The AI tries, fails, gets a specific score, learns from the mistake, and tries again. Over time, it learns that staying safe matters more than simply reaching the goal. It learns to avoid the "crash" penalty at all costs.
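The curriculum idea can be sketched in a few lines of Python. This is only an illustration of "easy puzzles first," not the authors' code: the `Puzzle` class, the `num_blocks` difficulty measure, and the stage thresholds in `STAGES` are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Puzzle:
    name: str
    num_blocks: int  # assumed difficulty metric: more blocks = harder


# Hypothetical stage boundaries (upper bound on blocks); None = no limit.
STAGES = [("easy", 4), ("medium", 10), ("hard", None)]


def stage_of(puzzle: Puzzle) -> str:
    """Return the curriculum stage a puzzle falls into."""
    for label, max_blocks in STAGES:
        if max_blocks is None or puzzle.num_blocks <= max_blocks:
            return label
    raise ValueError("unreachable: the last stage has no upper bound")


def curriculum_order(puzzles: List[Puzzle]) -> List[Puzzle]:
    """Order puzzles from easiest to hardest so training starts simple."""
    return sorted(puzzles, key=lambda p: p.num_blocks)


puzzles = [Puzzle("tower-20", 20), Puzzle("pair", 2), Puzzle("stack-8", 8)]
schedule = [(p.name, stage_of(p)) for p in curriculum_order(puzzles)]
print(schedule)
```

In this sketch the two-block puzzle is scheduled first and the twenty-block tower last, mirroring the "start easy, get harder" twist described above.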

3. The Superpower: Generalization

The coolest part of this paper is Generalization.

Usually, if you train a chef to cook Italian food safely, they might fail at Japanese food. But SafeGen-LLM is different.

Because it learned the principles of safety (like "don't drop things," "don't overload the vehicle") rather than just memorizing specific recipes, it can walk into a completely new kitchen (a new domain) and immediately know how to be safe.
It can handle instructions written in a formal planning language (PDDL), JSON, or even plain English, and still produce a safe plan.

Real-World Proof

The researchers didn't just run this on a computer. They put it on a real robot arm.

The Test: Stack blocks without hitting them.
The Old Way: The robot would try to stack them, miss, and smash the blocks together.
The SafeGen Way: The safety-aware plan reordered the actions, and the robot stacked the blocks without a single collision.

The Big Takeaway

Think of SafeGen-LLM as a Safety Seatbelt for AI.
Before, we had powerful engines (AI models) that could go very fast but had no safety equipment. This paper teaches the driver to buckle up, check the mirrors, and drive safely on roads it has never seen before.

It shows that by training AI with strict safety rules and smart feedback, we can create robots that are not just smart but trustworthy enough to work alongside humans in factories, hospitals, and on the roads.

1. Problem Statement

Robotic task planning in safety-critical domains (e.g., autonomous driving, industrial automation) faces a triad of challenges:

Classical Planners: While formally verifiable, they scale poorly (runtime grows exponentially with problem complexity) and have rigid input/output requirements, making them difficult to adapt to dynamic safety constraints.
Reinforcement Learning (RL): RL-based planners often generalize poorly (they are typically trained on a single task) and incur prohibitively high environment-interaction costs.
Base Large Language Models (LLMs): While capable of flexible reasoning and handling diverse inputs (natural language, PDDL), LLMs without safety-specific post-training cannot guarantee safety. They frequently generate syntactically invalid plans, violate preconditions, or ignore safety constraints, leading to hazardous behaviors.

Core Research Question: How can we systematically align LLMs to generate task plans that are not only goal-reaching but also safety-compliant and capable of generalizing to novel safety properties across different domains and problem instances?

2. Methodology: SafeGen-LLM Framework

The authors propose SafeGen-LLM, a two-stage post-training framework designed to inject verifiable safety knowledge into LLMs. The framework has three main components: dataset construction, followed by the two training stages.

A. Dataset Construction

Multi-Domain Benchmark: The authors constructed a unified benchmark using four robotics-inspired domains from PDDL2 generators: Blocksworld, Ferry, Grippers, and Spanner.
Safety Constraints: They encoded explicit, hard safety constraints (e.g., collision avoidance, load limits, operation ordering) using PDDL3 :constraints (temporal logic).
Validation Pipeline: Problems were generated, solved by a classical temporal planner (OPTIC), and rigorously verified using the VAL tool. Only plans satisfying both domain preconditions and safety constraints were retained.
Format: Data was converted into instruction-response pairs where the model is prompted to output strictly formatted action sequences without natural language explanations.
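The "keep only verified plans" step can be illustrated with a toy validator. This is a minimal sketch under stated assumptions, not the OPTIC/VAL pipeline itself: the `Action` tuple, the `validate` function, the string-based facts, and the Ferry-style rule `one_car_at_a_time` (standing in for a PDDL3 `always` constraint) are all hypothetical names introduced for this example.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Tuple

State = FrozenSet[str]


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def validate(plan: List[Action], init: State, goal: State,
             is_safe: Callable[[State], bool]) -> Tuple[str, int]:
    """Simulate a plan; return (status, number of steps executed cleanly)."""
    state = init
    for step, action in enumerate(plan):
        if not action.pre <= state:
            return "precondition_violation", step
        state = (state - action.delete) | action.add
        if not is_safe(state):  # safety rule must hold in every state
            return "safety_violation", step
    status = "success" if goal <= state else "goal_not_satisfied"
    return status, len(plan)


def board(car: str) -> Action:
    """Toy Ferry-style action: load a car waiting at bank A."""
    return Action(f"board {car}",
                  pre=frozenset({f"at-a {car}", "ferry-at-a"}),
                  add=frozenset({f"on-ferry {car}"}),
                  delete=frozenset({f"at-a {car}"}))


def one_car_at_a_time(state: State) -> bool:
    """Assumed safety rule: the ferry never carries more than one car."""
    return sum(fact.startswith("on-ferry ") for fact in state) <= 1


INIT = frozenset({"at-a c1", "at-a c2", "ferry-at-a"})
candidates = {
    "safe": ([board("c1")], frozenset({"on-ferry c1"})),
    "unsafe": ([board("c1"), board("c2")],
               frozenset({"on-ferry c1", "on-ferry c2"})),
}
# Retain only plans that pass both precondition and safety checks.
retained = [name for name, (plan, goal) in candidates.items()
            if validate(plan, INIT, goal, one_car_at_a_time)[0] == "success"]
print(retained)
```

Returning the failure type together with the step index, rather than a bare pass/fail, is a design choice made here so the same check could also feed a graded reward signal.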

B. Stage I: Supervised Fine-Tuning (SFT)

Objective: To teach the model the syntax, semantics, and structure of planning domains.
Process: The pre-trained LLM is fine-tuned on the constructed dataset of valid, safety-compliant plans.
Outcome: This stage eliminates format errors and teaches the model to generate executable action sequences that adhere to domain-specific rules, providing a strong initialization for reinforcement learning.

C. Stage II: Group Relative Policy Optimization (GRPO)

Algorithm: The authors utilize GRPO, an online RL algorithm that optimizes policies by comparing groups of candidate responses rather than relying on a separate critic network (making it more lightweight than PPO).
Fine-Grained Reward Machine: Instead of a binary "success/fail" reward, a hierarchical reward system is derived from formal verification. Outcomes are ranked from lowest to highest reward:
1. Format Error: Lowest reward.
2. Safety Violation: High penalty (avoiding it is prioritized over goal achievement).
3. Precondition Violation: Moderate penalty.
4. Goal Not Satisfied: Small penalty.
5. Success: Highest reward.
Progress-based interpolation: For intermediate failures, rewards are interpolated based on how far the plan progressed before failing (e.g., number of steps executed safely), providing a denser learning signal than a binary outcome.
Curriculum Learning: Training difficulty is progressively increased (Easy -> Medium -> Hard) based on domain-specific complexity metrics (e.g., number of blocks/objects). This stabilizes training and helps the model handle complex constraint interactions.
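A rough sketch of how a tiered reward with progress interpolation could feed GRPO-style group-relative advantages is given below. The numeric reward bands in `BANDS` are invented for illustration (the summary above gives only the ordering, not the values), and `reward` and `group_advantages` are hypothetical helper names. The advantage computation follows the commonly described GRPO recipe of normalizing each reward by the mean and standard deviation of its group; the authors' exact formulation may differ.

```python
from statistics import mean, pstdev
from typing import List

# Hypothetical (low, high) reward band per outcome; only the ordering
# (format < safety < precondition < goal-not-satisfied < success) is
# taken from the text, the numbers are assumptions.
BANDS = {
    "format_error": (-1.0, -1.0),
    "safety_violation": (-0.8, -0.5),
    "precondition_violation": (-0.4, -0.1),
    "goal_not_satisfied": (0.0, 0.3),
    "success": (1.0, 1.0),
}


def reward(status: str, progress: float) -> float:
    """Interpolate inside the outcome's band by progress in [0, 1].

    `progress` could be, for example, steps executed safely / plan length.
    """
    low, high = BANDS[status]
    progress = min(max(progress, 0.0), 1.0)
    return low + (high - low) * progress


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each candidate relative to its group (no critic network)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled candidate plans with different outcomes.
group = [("format_error", 0.0), ("safety_violation", 0.5),
         ("precondition_violation", 0.75), ("success", 1.0)]
rewards = [reward(status, progress) for status, progress in group]
advantages = group_advantages(rewards)
for (status, _), r, a in zip(group, rewards, advantages):
    print(f"{status:24s} reward={r:+.3f} advantage={a:+.3f}")
```

Because the bands do not overlap, a plan that violates safety late still scores below one that violates a precondition early, which is one way to encode "safety first" while keeping a graded signal within each tier.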

3. Key Contributions

Unified Safety-Aware Benchmark: A multi-domain PDDL3 dataset with explicitly defined safety constraints, enabling systematic evaluation of safety compliance and generalization.
Systematic Post-Training Framework: A novel combination of SFT (for syntax/semantics) and GRPO with formal verification-derived rewards (for safety alignment). This approach ensures models learn to prioritize safety constraints over mere goal achievement.
Cross-Domain Safety Generalization: Demonstrated that the trained models can generalize safety reasoning to unseen problems within a domain and across different domains, outperforming frontier proprietary models despite having fewer parameters.
Integration with Assurance Frameworks: Showed that SafeGen-LLM can be seamlessly integrated with external verification frameworks (like SafePilot) to achieve near-perfect reliability with minimal retry overhead.
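The integration with an external verifier can be pictured as a generate-verify-retry loop. The sketch below is an assumption about the general pattern only. It does not reproduce SafePilot's actual interface, and `plan_with_retries`, `stub_generate`, and `stub_verify` are hypothetical stand-ins for the planner and the verifier.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def plan_with_retries(
    generate: Callable[[Optional[str]], Plan],
    verify: Callable[[Plan], Tuple[bool, Optional[str]]],
    max_attempts: int = 3,
) -> Tuple[Optional[Plan], int]:
    """Ask the planner for a plan, check it, and retry with feedback.

    Returns (verified plan or None, number of attempts used).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        plan = generate(feedback)
        ok, feedback = verify(plan)
        if ok:
            return plan, attempt
    return None, max_attempts


def stub_generate(feedback: Optional[str]) -> Plan:
    """Stand-in planner: unsafe on the first try, safe after feedback."""
    if feedback is None:
        return ["board c1", "board c2"]
    return ["board c1", "sail", "debark c1"]


def stub_verify(plan: Plan) -> Tuple[bool, Optional[str]]:
    """Stand-in verifier: reject plans that load two cars at once."""
    if "board c1" in plan and "board c2" in plan:
        return False, "safety violation: two cars on the ferry"
    return True, None


result = plan_with_retries(stub_generate, stub_verify)
print(result)
```

The better the planner's first attempt, the fewer times this loop iterates, which is the sense in which a safety-aligned model keeps retry overhead low.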

4. Experimental Results

The framework was evaluated on open-source models (Mistral-7B, Llama-8B, Qwen3-14B) and compared against classical planners (OPTIC, Fast Downward) and proprietary models (GPT-5 Nano).

Scalability: In complex Blocksworld and Grippers tasks, the fine-tuned LLM achieved 100% success with stable runtime (~102s), whereas classical planners degraded sharply as complexity increased (e.g., OPTIC at ~45-64%, Fast Downward at ~14-27%).
Cross-Problem Generalization:
- Pre-trained: 0% success, high format/precondition errors.
- SFT: Success rate jumped to ~66-70%; format errors eliminated.
- GRPO: Success rate reached 82-100%; safety violations dropped to 0-2%.
Cross-Domain Generalization: Models trained on all four domains simultaneously maintained high performance across all of them. Qwen3-14B achieved 88-100% success, and Llama-8B achieved 78-94%, significantly outperforming GPT-5 Nano (which struggled with safety constraints, achieving only 18-20% on some domains).
Input Format Robustness: Despite being trained on PDDL3, the models generalized well to Natural Language (84% avg success) and JSON (92.5% avg success) inputs with negligible format errors.
Real-World Validation:
- Simulation: The safety-aware planner successfully restructured action sequences to avoid collisions that classical solvers missed.
- Physical Robot: Deployed on an Elephant myCobot 280 arm. The safety-aware plan executed without collision, while a baseline unsafe plan caused a physical collision.

5. Significance

SafeGen-LLM addresses a critical gap in robotic autonomy: the trade-off between the flexibility of LLMs and the rigorous safety requirements of physical systems.

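Safety by Construction: By using formal verification to guide RL rewards, the model learns to treat safety constraints as top-priority requirements rather than soft preferences. The reported 0-2% residual violation rate indicates strongly learned behavior, not a formal guarantee.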
Efficiency: It demonstrates that smaller, open-source models (7B-14B parameters), when properly aligned, can outperform massive proprietary models in safety-critical planning tasks.
Practical Deployment: The framework is compatible with existing agentic workflows and real-world hardware, offering a scalable path toward deploying safe, generalizable AI planners in dynamic environments.

In conclusion, SafeGen-LLM establishes a new paradigm for robotic planning where verifiable safety knowledge is systematically injected into LLMs, enabling them to generalize safely across diverse and complex tasks.