PEPA: a Persistently Autonomous Embodied Agent with Personalities

Imagine you have a robot dog. Most robots today are like highly obedient but mindless interns. You give them a specific task—"Go to the kitchen and fetch the remote"—and they do it. But the moment you stop giving orders, they freeze. They don't know what to do next, they don't care if they run out of battery, and they certainly don't have a "personality." They are just waiting for the next script.

This paper introduces PEPA, a framework that makes a robot behave more like a living pet with a distinct personality. Instead of waiting for instructions, a PEPA robot has an internal "soul" (its personality) that drives it to make its own decisions, learn from its mistakes, and keep going for a long time without needing a human to hold its hand.

Here is how it works, broken down into simple concepts and analogies:

1. The Core Idea: Giving the Robot a "Personality"

Think of personality not just as "being funny" or "being grumpy," but as a set of internal compasses.

A Lazy robot dog might think, "I'm tired, I'll just nap here."
A Curious robot dog might think, "I wonder what's behind that door? I'll go check it out."
A Cautious robot dog might think, "That floor looks slippery; I'll walk slowly."

In the past, engineers had to hard-code every single rule for these behaviors. PEPA changes this by treating personality as the operating system. The robot doesn't just follow a list of rules; it asks itself, "What would I do in this situation based on who I am?" This allows the robot to generate its own goals (like "I should explore the hallway") rather than waiting for a human to say, "Go explore."

2. The Three-Layer Brain (The "Cognitive Architecture")

The researchers built PEPA with three distinct layers of thinking, working together like a human brain:

Sys3 (The Dreamer & Planner): This is the "CEO" of the robot. It looks at the robot's memories and its personality.
- Analogy: Imagine you wake up and think, "I'm feeling energetic today (Personality), and I remember I didn't finish my walk yesterday (Memory). So, today's goal is to explore the garden." Sys3 does this automatically every day, setting goals based on who the robot is.
Sys2 (The Strategist): This is the "General" who figures out how to achieve the goal.
- Analogy: If the goal is "Explore the garden," Sys2 plans the route: "Okay, I need to go through the living room, avoid the cat, and press the elevator button." It uses advanced AI to make sure the plan is safe and logical.
Sys1 (The Body & Senses): This is the "Muscle" and "Eyes."
- Analogy: This is the robot actually walking, seeing the stairs, feeling the battery level, and pressing the buttons. Crucially, it also acts as a diary. Every time the robot does something, it writes it down in a memory log.

3. The Magic Loop: Learning from Experience

The real breakthrough is how these three layers talk to each other.

The robot acts (Sys1).
It writes down what happened in its diary (Memory).
At the end of the day, the "CEO" (Sys3) reads the diary.
- Scenario: If the "Curious" robot tried to jump off a high ledge and almost fell, Sys3 reads the diary and says, "Okay, being curious is great, but almost falling was bad. Tomorrow, I'll adjust my goals to be curious but stay on the ground."
The robot updates its internal rules for the next day.

This is Self-Evolution. The robot gets smarter and safer over time, not because a programmer rewrote its code, but because it reflected on its own life.
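
To make the loop concrete, here is a toy sketch in Python. It is not PEPA's code: the class `ToyAgent`, the `recharge_below` rule, and every number in it are made up for illustration. The real system reflects with a language model over much richer memories. The sketch only shows the shape of the cycle: act, write to the diary, read the diary at the end of the day, adjust one rule.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for the act / record / reflect loop (not PEPA's code)."""
    recharge_below: int = 5          # battery % below which the agent goes to recharge
    diary: list = field(default_factory=list)

    def run_day(self, hours: int = 24, drain: int = 10) -> bool:
        """Act for one day and log every step. Returns True if the battery never hit 0."""
        battery = 100
        self.diary = []
        for hour in range(hours):
            if battery < self.recharge_below:
                action, battery = "recharge", 100
            else:
                action, battery = "explore", battery - drain
            self.diary.append((hour, action, battery))   # the "diary" entry
            if battery <= 0:
                return False
        return True

    def reflect(self) -> None:
        """End-of-day review: if the diary ends on an empty battery, recharge earlier."""
        if self.diary and self.diary[-1][2] <= 0:
            self.recharge_below += 5


agent = ToyAgent()
results = []
for day in range(3):
    results.append(agent.run_day())   # act and record
    agent.reflect()                   # read the diary, update the rule
print(results)  # [False, False, True]
```

With these made-up numbers the toy agent runs out of battery on the first two days and survives the third, purely because reflection moved its recharge rule. No code was rewritten between days.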

4. The Real-World Test: The Robot Dog in the Office

The team didn't just test this in a computer simulation; they put a real quadruped robot (a robot dog) in a multi-story office building.

The Challenge: The robot had to navigate stairs, call elevators, and move between floors without humans telling it exactly where to go.
The Result: They tested five different "personalities" (Lazy, Playful, Cautious, Working, Curious).
- The Lazy one stayed near the charging station and rested often.
- The Playful one ran around exploring but learned to stop before its battery died.
- The Cautious one moved slowly and checked everything twice.

The most impressive part? On Day 1, every personality ran out of battery and "died" (stopped working), and on Day 2 only the Cautious one made it through. By Day 3, after reflecting on their mistakes, all five survived the full 24 hours, having learned to balance their personality-driven desires with the need to stay alive.

Why This Matters

Currently, robots are like actors reading a script. If the script ends, the show stops.
PEPA turns robots into improvisational actors. They have a character, they remember their past scenes, and they can write their own next lines. This is a step toward robots that can live with us, work with us, and adapt to our messy, unpredictable world without needing a human to constantly press a "start" button.

In short: PEPA gives robots a personality so they can decide what to do, learn from their mistakes, and keep going on their own—just like a living creature.

Here is a detailed technical summary of the paper "PEPA: a Persistently Autonomous Embodied Agent with Personalities."

1. Problem Statement

Current embodied agents (robots) largely rely on externally scripted objectives and fixed reward templates. While they excel at specific, predefined tasks, they lack persistent autonomy—the ability to operate self-sustainingly over extended periods in dynamic, unstructured environments without continuous human intervention.

The core challenges identified are:

Goal Generation: Without external task specifications, what determines an agent's goals?
Behavioral Coherence: How can an agent maintain consistent behavior over long horizons while adapting to new experiences?
Physical Constraints: Agents must manage energy, avoid damage, and ensure safety in real-world conditions, which purely software-based or goal-driven agents often ignore.
Organizational Gap: Existing approaches (e.g., lifelong learning, social robotics) treat personality as a static design parameter or fail to address the intrinsic organizational principles required for self-directed evolution.

2. Methodology: The PEPA Framework

The authors propose PEPA (Persistently Autonomous Embodied Agent), a framework that uses personality traits as an intrinsic organizational principle to drive autonomous goal generation and behavioral evolution.

A. Theoretical Foundation

POMDP with Composite Rewards: The agent's decision-making is modeled as a Partially Observable Markov Decision Process (POMDP) with a composite reward function:
$R_{total} = R_{intrinsic} + R_{extrinsic}$
- $R_{extrinsic}$ : Standard environmental feedback (e.g., reaching a destination).
- $R_{intrinsic}$ : Dynamically generated by the system based on personality ( $P$ ), accumulated memories ( $M$ ), and capability ( $C$ ). This ensures that identical states yield different rewards depending on the agent's character (e.g., an "Energetic" agent rewards exploration; a "Lazy" agent penalizes unnecessary movement).
Open-Ended Evolution (OEE): The system is designed to satisfy OEE criteria, where the agent generates an unbounded sequence of distinct goals and non-repeating behavioral trajectories, preventing stagnation.
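
As an illustration of the composite reward, the sketch below shows how the same state and action can score differently for two personalities. It is a minimal sketch under stated assumptions, not the paper's reward function: the `Personality` class, the `exploration_weight` field, the 30% battery cutoff, and the 0.5 penalty are all hypothetical. Personality stands in for $P$, the count of remembered battery depletions for $M$, and the current battery level for $C$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    name: str
    exploration_weight: float  # hypothetical: > 0 rewards movement, < 0 penalizes it


def intrinsic_reward(p: Personality, action: str, battery: float,
                     past_depletions: int) -> float:
    """Toy R_intrinsic conditioned on personality (P), memory (M), capability (C)."""
    reward = p.exploration_weight if action == "explore" else 0.0
    # Memory term: each remembered depletion makes low-battery exploring less attractive.
    if action == "explore" and battery < 30.0:
        reward -= 0.5 * past_depletions
    return reward


def total_reward(p: Personality, action: str, battery: float,
                 past_depletions: int, extrinsic: float) -> float:
    """R_total = R_intrinsic + R_extrinsic."""
    return intrinsic_reward(p, action, battery, past_depletions) + extrinsic


energetic = Personality("Energetic", 1.0)
lazy = Personality("Lazy", -1.0)

# Identical state, action, and extrinsic feedback; only the personality differs.
r_energetic = total_reward(energetic, "explore", 80.0, 0, 0.2)
r_lazy = total_reward(lazy, "explore", 80.0, 0, 0.2)

# Same energetic agent, but low on battery and with three remembered depletions.
r_energetic_tired = total_reward(energetic, "explore", 20.0, 3, 0.2)
```

Here `r_energetic` is positive and `r_lazy` is negative for the same input, while `r_energetic_tired` turns negative once memory and battery level are taken into account.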

B. Three-Layer Cognitive Architecture

PEPA operates through three interacting systems forming a closed loop:

Sys3 (Personality and Goal Generation):
- Role: The "brain" that synthesizes personality traits, self-modeling (battery, health), and episodic memories.
- Mechanism: Uses a Large Language Model (LLM) to generate hierarchical goals (Ultimate goals for long-term purpose; Daily goals for immediate sub-objectives).
- Reflection: At the end of each day, Sys3 retrieves memories, reflects on outcomes, and updates daily goals and intrinsic reward functions. This allows the agent to learn from experience without retraining.
- Personality Model: Based on the Big Five framework (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism), parameterized via natural language descriptions.
Sys2 (Decision and Reasoning):
- Role: The planning core that selects actions to maximize total expected utility.
- Mechanism:
  - Training: Uses LLM-based Monte Carlo Tree Search (MCTS) to generate high-quality state-action pairs.
  - Deployment: A lightweight, distilled dual-head BERT model (Intent Classification + Slot Filling) replaces the LLM to meet real-time latency requirements. It balances exploitation and exploration under partial observability.
Sys1 (Perception, Execution, and Memory Recording):
- Role: The embodiment interface grounding decisions in physical reality.
- Function: Aggregates multimodal sensors (LiDAR, RGB-D cameras, proprioception) to construct world models. Executes locomotion, manipulation, and expression commands.
- Memory: Records structured episodic memories (action, pre/post states, outcome, resource consumption) which feed back to Sys3 for reflection.
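
The structured episodic memory that Sys1 records could look like the sketch below. The field list follows the summary above (action, pre/post states, outcome, resource consumption), but the class name `EpisodicMemory`, the field types, and the helper `to_reflection_prompt` are assumptions for illustration. The paper's actual schema and prompt format are not given here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EpisodicMemory:
    """Hypothetical record layout; field names mirror the summary, types are assumed."""
    action: str
    pre_state: dict
    post_state: dict
    outcome: str
    battery_used: float  # resource consumption, in battery percentage points


def total_battery_used(memories: list) -> float:
    """Aggregate resource consumption over a day's episodes."""
    return sum(m.battery_used for m in memories)


def to_reflection_prompt(memories: list, personality: str) -> str:
    """Serialize the day's episodes into text an LLM-based reflector could read."""
    lines = [json.dumps(asdict(m), sort_keys=True) for m in memories]
    header = f"Personality: {personality}\nToday's episodes:"
    return "\n".join([header] + lines)


memories = [
    EpisodicMemory("climb_stairs", {"floor": 1, "battery": 90.0},
                   {"floor": 2, "battery": 85.0}, "success", 5.0),
    EpisodicMemory("explore_hallway", {"floor": 2, "battery": 85.0},
                   {"floor": 2, "battery": 83.0}, "success", 2.0),
]
prompt = to_reflection_prompt(memories, "Cautious")
print(prompt)
```

Keeping the record flat and JSON-serializable is a design choice made for this sketch: it lets the same entries be stored, aggregated numerically, and pasted into a text prompt for end-of-day reflection.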

3. Key Contributions

First Persistently Autonomous Embodied Agent with Personalities: The authors present a robot capable of persistent self-evolution under real-world physical constraints, governed by intrinsic personality-conditioned objectives rather than external scripts.
Novel Closed-Loop Self-Evolution Mechanism: A concrete implementation where embodied experiences are accumulated as episodic memory, reflected upon under personality conditioning to update goals/rewards, and optimized through planning.
Real-World Validation: Successful deployment on a quadruped robot platform in a multi-floor office environment, including complex tasks like elevator interaction and staircase navigation.
Open-Source Release: Public release of the codebase, model details, and specific mobility modules (elevator and staircase navigation).

4. Experimental Results

The framework was validated on a Unitree Go2-W wheeled quadruped robot equipped with a 6-DOF arm, LiDAR, and RGB-D cameras.

Hardware & Navigation:
- Elevator Interaction: Implemented a Finite State Machine (FSM) for calling, entering, and selecting floors using visual servoing.
- Staircase Navigation: Introduced a Height-Aligned Costmap to address the failure of fixed-height slicing in multi-floor environments. It achieved 100% success (10/10 trials) in ascending and descending stairs, compared to 0% for the baseline.
Personality-Driven Behavior:
- Prototypes: Five distinct personalities were tested (Lazy, Playful, Cautious, Working, Curious).
- Behavioral Divergence: Under identical state inputs, different personalities produced statistically distinct action distributions. For example, "Lazy" agents prioritized resting (49.8% on Day 3), while "Playful" agents maintained high exploration but adapted to safety constraints.
- Self-Evolution (Survival):
  - Day 1: All agents failed due to battery depletion.
  - Day 2: Only the "Cautious" agent survived.
  - Day 3: After memory-driven reflection and reward updates, all five personalities survived 24-hour simulations with 72%–100% battery remaining. This suggests the system can co-optimize personality alignment with self-preservation.
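
The elevator finite state machine mentioned under Hardware & Navigation can be sketched as a transition table. The summary only says the FSM covers calling, entering, and selecting floors. The state names, event names, and the timeout retry below are assumptions chosen for illustration, and the perception that would produce these events (visual servoing in the paper) is left out.

```python
from enum import Enum, auto


class ElevatorState(Enum):
    APPROACH = auto()
    PRESS_CALL = auto()
    WAIT_DOOR = auto()
    ENTER = auto()
    SELECT_FLOOR = auto()
    RIDE = auto()
    EXIT = auto()
    DONE = auto()


S = ElevatorState

# (current state, event) -> next state. All event names are hypothetical.
TRANSITIONS = {
    (S.APPROACH, "at_panel"): S.PRESS_CALL,
    (S.PRESS_CALL, "button_pressed"): S.WAIT_DOOR,
    (S.WAIT_DOOR, "door_open"): S.ENTER,
    (S.WAIT_DOOR, "timeout"): S.PRESS_CALL,      # assumed retry if the door never opens
    (S.ENTER, "inside"): S.SELECT_FLOOR,
    (S.SELECT_FLOOR, "floor_selected"): S.RIDE,
    (S.RIDE, "arrived"): S.EXIT,
    (S.EXIT, "outside"): S.DONE,
}


def step(state: ElevatorState, event: str) -> ElevatorState:
    """Advance the FSM; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = S.APPROACH
for event in ["at_panel", "button_pressed", "door_open", "inside",
              "floor_selected", "arrived", "outside"]:
    state = step(state, event)
print(state.name)  # DONE
```

Ignoring events that do not apply to the current state keeps the machine from jumping ahead on a spurious detection, which is one common reason to prefer an explicit table over ad hoc conditionals.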

5. Significance

Paradigm Shift: Moves robotics from "scripted execution" to "intrinsic organization," where personality acts as a stable bias (similar to genotypic biases in biology) that structures long-term behavior.
Scalability: The architecture separates high-level reasoning (LLM) from low-level execution (distilled models), making it feasible for real-time deployment on resource-constrained hardware.
Robustness: The closed-loop reflection mechanism allows agents to adapt to physical failures (e.g., battery drain) and environmental changes autonomously, a critical step toward truly long-lived robotic companions and explorers.
Future Impact: This work provides a blueprint for agents that can evolve, learn, and maintain coherent identities in open-ended, unstructured environments without constant human oversight.
