FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid

Imagine a human-sized robot trying to stand perfectly still while holding a heavy box, pushing a door, or pulling a rope. Now, imagine that every time it moves its arms or the weight shifts, the robot's legs get confused and it starts to wobble and fall.

This is the problem the paper FAME (Force-Adaptive RL for Expanding the Manipulation Envelope) tries to solve.

Here is the simple explanation, using some everyday analogies:

The Problem: The "Confused Dancer"

Think of a humanoid robot like a dancer trying to balance on one foot.

The Arms: The robot's arms are like the dancer's arms. If the dancer reaches out to grab a heavy object, their center of gravity shifts.
The Legs: The robot's legs are like the dancer's legs and feet. They need to adjust instantly to keep the dancer from falling.
The Issue: In the past, robot "brains" were like dancers who only practiced standing still. If you suddenly handed them a heavy box or pushed them, they didn't know how to react. They would just fall over because they didn't understand how the weight in their hands was changing their balance.

The Solution: FAME (The "Super-Sense" Brain)

The researchers created a new AI system called FAME. Think of FAME as giving the robot a "sixth sense" or a super-sense that connects its hands directly to its feet.

Here is how it works, broken down into three simple steps:

1. The "Context Encoder" (The Translator)

Usually, a robot's brain looks at its hands and its legs as two separate things.

Old Way: "My hands are holding a box. My legs are standing. Okay, I'm fine." (Then the box gets heavy, and crash).
FAME Way: FAME uses a special "translator" (called a Latent Context Encoder). It looks at two things at once:
1. Where the arms are positioned (e.g., stretched out high).
2. How hard the hands are pushing or pulling (the force).
It combines these into a single "secret code" (a latent context) and whispers it to the legs. It's like a coach shouting to the legs: "Hey! The arms are stretched out and pulling hard to the left! You need to lean slightly to the right to compensate!"

2. The "Training Gym" (The Simulation)

You can't teach a robot to balance by just letting it fall over a thousand times in the real world (it would break). So, they trained it in a video game simulation.

The Curriculum: They didn't just throw random weights at the robot. They used a "curriculum," which is like a video game leveling up.
- Level 1: The robot stands with arms at its side.
- Level 2: The robot holds a light weight.
- Level 3: The robot stretches its arms out weirdly while someone pushes it from different angles.
The Result: By the end of training, the robot has seen almost every possible way its arms could be positioned and every way it could be pushed. It has built a massive library of "what to do" for every situation.

3. The "Magic Trick" (No Sensors Needed)

Here is the coolest part. Usually, to know how hard you are pushing, you need special, expensive sensors in your wrists (like a digital scale in your hand).

FAME's Trick: The robot doesn't have these expensive sensors. Instead, it calculates the force itself by looking at its muscles (the electric motors in its joints).
The Analogy: Imagine you are lifting a heavy box. You don't need a scale to know it's heavy; you can feel your muscles straining. FAME does the same thing. It looks at how much "effort" (torque) its motors are using and mathematically figures out, "Ah, my arm is straining this much, which means I must be pulling a 30 Newton load." It then uses that guess to adjust its balance instantly.

The Results: Standing Strong

The team tested this in simulation and on a real robot called the Unitree H12 (a full-sized, human-like robot).

Without FAME: In simulated tests with loads on the hands, the basic controller stayed standing only about 29% of the time, so it fell in roughly 70% of trials. It was like a toddler trying to carry a big backpack.
With FAME: In the same simulated tests, the robot stayed standing about 74% of the time, even when the arms were in awkward positions. On the real robot, FAME kept it upright under load, while a comparison controller without the "translator" lost balance and fell.

The Big Picture

FAME is like teaching a robot to be a tightrope walker.

Before, the robot could only walk the tightrope if the wind was calm and it held nothing.
With FAME, the robot can walk the tightrope while juggling, holding a heavy umbrella, and being pushed by the wind. It knows exactly how to shift its weight to stay upright because it understands the connection between what its hands are doing and what its feet need to do.

This is a meaningful step forward because it brings robots closer to doing useful jobs in our world, like moving furniture, helping with construction, or carrying groceries, without falling over every time they pick something up.

Here is a detailed technical summary of the paper "FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid."

1. Problem Statement

Humanoid robots operating in human-centric environments must perform coordinated bimanual manipulation (using both arms) while maintaining stable standing. A critical challenge arises because external forces applied at the hands propagate through the kinematic chain, directly disturbing the robot's lower-body balance.

The Core Issue: Traditional balance controllers struggle when interaction forces vary in magnitude and direction, especially when combined with diverse upper-body arm configurations.
The Manipulation Envelope: Defined as the region of admissible external hand forces and arm configurations under which the robot can maintain stable standing. Current methods often fail to expand this envelope, limiting the robot's ability to handle heavy or asymmetric loads without falling.
Limitations of Existing Approaches:
- Model-based methods (e.g., MPC, LIPM) struggle with highly dynamic, uncertain disturbances.
- Standard Deep RL (DRL) policies often lack the ability to explicitly adapt to the coupling between upper-body pose and interaction forces, leading to conservative or unstable behaviors when forces are unpredictable.
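The coupling between arm pose and hand force mentioned above can be illustrated with a static moment calculation. The following is a minimal sketch, not taken from the paper: the coordinate frame (x forward, y left, z up), the pivot at the centre of the feet, and the hand positions are all hypothetical values chosen for illustration.

```python
import numpy as np

def moment_about_point(hand_pos, force, pivot):
    """Moment (N*m) that a force applied at hand_pos exerts about pivot."""
    r = np.asarray(hand_pos, dtype=float) - np.asarray(pivot, dtype=float)
    return np.cross(r, np.asarray(force, dtype=float))

# Hypothetical geometry: x forward, y left, z up; pivot at the centre of the feet.
pivot = np.array([0.0, 0.0, 0.0])
pull = np.array([30.0, 0.0, 0.0])        # 30 N force along +x applied at the hand

hand_low = np.array([0.2, 0.2, 1.0])     # hand at 1.0 m height (assumed)
hand_high = np.array([0.2, 0.2, 1.5])    # same hand raised to 1.5 m (assumed)

m_low = moment_about_point(hand_low, pull, pivot)
m_high = moment_about_point(hand_high, pull, pivot)
print(m_low, m_high)
```

The same 30 N force produces a larger tipping moment when the hand is higher, which is why a balance controller benefits from knowing both the arm configuration and the force rather than either one alone.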

2. Methodology: The FAME Framework

The authors propose FAME (Force-Adaptive RL for Expanding the Manipulation Envelope), a reinforcement learning framework that conditions a standing policy on a learned latent context representing upper-body states and forces.

A. System Architecture

The framework consists of two primary components (illustrated in Fig. 2):

Upper-Body Context Encoder ( $\mu$ ): A Multi-Layer Perceptron (MLP) that processes:
- Upper-body joint configurations ( $q_{ub} \in \mathbb{R}^{15}$ , including torso and arms).
- Bimanual interaction forces ( $F_L, F_R \in \mathbb{R}^3$ for left and right wrists).
- Output: A latent context vector $z_t \in \mathbb{R}^8$ that encodes the disturbance induced by the upper body.
Base Standing Policy: A PPO-trained policy that takes the latent context $z_t$ as an additional input to condition its lower-body control strategy.
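The data flow through the two components above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the input and latent sizes (15, 3 + 3, and 8) come from the summary, while the single hidden layer, its width, the tanh activation, the random weights, and the 40-dimensional lower-body observation are hypothetical.

```python
import numpy as np

Q_UB_DIM, FORCE_DIM, LATENT_DIM = 15, 3, 8   # sizes stated in the summary
HIDDEN_DIM = 64                              # assumption: not given in the summary

def init_encoder(rng, hidden=HIDDEN_DIM):
    """Randomly initialised two-layer MLP (untrained, for shape illustration only)."""
    in_dim = Q_UB_DIM + 2 * FORCE_DIM        # 15 joints + left and right 3D forces = 21
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, LATENT_DIM)), "b2": np.zeros(LATENT_DIM),
    }

def encode_context(params, q_ub, f_left, f_right):
    """Map upper-body joints and bimanual forces to the latent context z_t."""
    x = np.concatenate([q_ub, f_left, f_right])
    h = np.tanh(x @ params["W1"] + params["b1"])   # assumed activation
    return h @ params["W2"] + params["b2"]

def policy_input(lower_body_obs, z_t):
    """The standing policy sees its usual observation plus the latent context."""
    return np.concatenate([lower_body_obs, z_t])

rng = np.random.default_rng(0)
params = init_encoder(rng)
q_ub = np.zeros(Q_UB_DIM)
f_left = np.array([0.0, 0.0, -30.0])         # e.g. a 30 N downward load on the left hand
f_right = np.zeros(FORCE_DIM)
z_t = encode_context(params, q_ub, f_left, f_right)
obs = policy_input(np.zeros(40), z_t)        # 40 is a hypothetical observation size
print(z_t.shape, obs.shape)
```

In training, the encoder weights would be learned jointly with the PPO policy; here they are random and only show how the 21 input values are compressed to 8 and appended to the policy input.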

B. Training Strategy

Curriculum Learning (Pose): To handle diverse arm configurations, the authors employ an upper-body pose curriculum. A scalar ratio $\rho_a$ gradually expands the range of randomized target arm poses from a nominal posture to the full feasible joint range as training progresses.
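One way to read the pose curriculum is sketched below. It is an illustration under assumptions rather than the paper's code: the summary states only that $\rho_a$ expands the sampling range from the nominal posture to the full joint range, so the linear interpolation of the bounds, the linear schedule, and the four-joint example limits are hypothetical. The sketch also assumes the nominal posture lies inside the joint limits.

```python
import numpy as np

def sample_target_pose(rng, q_nominal, q_min, q_max, rho_a):
    """Sample a target arm pose from a range that grows with rho_a in [0, 1].

    rho_a = 0 returns the nominal posture; rho_a = 1 samples the full joint range.
    """
    rho_a = float(np.clip(rho_a, 0.0, 1.0))
    lo = q_nominal + rho_a * (q_min - q_nominal)
    hi = q_nominal + rho_a * (q_max - q_nominal)
    return rng.uniform(lo, hi)

def curriculum_ratio(step, total_steps):
    """Assumed linear schedule for rho_a; the actual schedule is not given."""
    return min(1.0, step / total_steps)

rng = np.random.default_rng(0)
q_nominal = np.zeros(4)                      # hypothetical 4-joint example
q_min, q_max = -np.ones(4), 2.0 * np.ones(4)

start_pose = sample_target_pose(rng, q_nominal, q_min, q_max, curriculum_ratio(0, 1000))
mid_pose = sample_target_pose(rng, q_nominal, q_min, q_max, curriculum_ratio(500, 1000))
print(start_pose, mid_pose)
```

Early in training every sampled pose equals the nominal posture, and the policy only meets extreme arm configurations once it can already stand in easier ones.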
Force Injection: During training, diverse 3D forces are sampled uniformly over a sphere and applied to the hands. This creates an isotropic distribution of disturbances, forcing the policy to learn robustness against forces from any direction.
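A common way to draw directions uniformly over a sphere is to normalise a 3D Gaussian sample, as in the sketch below. This is an assumption about how such sampling could be done, not the authors' code; the uniform magnitude distribution and the 50 N upper bound are hypothetical, since the summary does not state how force magnitudes are chosen.

```python
import numpy as np

def sample_hand_force(rng, max_magnitude):
    """Return a 3D force with an isotropic direction and magnitude in [0, max_magnitude]."""
    direction = rng.normal(size=3)            # Gaussian samples are rotationally symmetric
    direction /= np.linalg.norm(direction)    # project onto the unit sphere
    magnitude = rng.uniform(0.0, max_magnitude)   # assumed magnitude distribution
    return magnitude * direction

rng = np.random.default_rng(0)
MAX_FORCE = 50.0                              # hypothetical bound in newtons
forces = np.array([sample_hand_force(rng, MAX_FORCE) for _ in range(1000)])
norms = np.linalg.norm(forces, axis=1)
mean_direction = (forces / norms[:, None]).mean(axis=0)
print(norms.max(), mean_direction)
```

Because no direction is favoured, the average unit direction over many samples is close to zero, which is the isotropy property the paragraph above describes.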
Baselines for Comparison:
- Base: Fixed posture, no encoder, no curriculum.
- Base+Curr: Pose curriculum only, no encoder (relies on implicit inference).
- FAME: Full framework (Curriculum + Encoder).

C. Sensor-Free Deployment (Key Innovation)

A major contribution is the ability to deploy the policy without wrist force/torque sensors.

Force Estimation: Instead of measuring forces directly, the system estimates interaction forces ( $F_{ext}$ ) online using rigid-body dynamics and Jacobian mappings.
Formula: $F_{ext} = -(J^\top)^\dagger (\tau - \tau_g)$
- $\tau$ : Measured joint torques.
- $\tau_g$ : Gravity compensation torques.
- $J$ : Wrist Jacobian.
- $(J^\top)^\dagger$ : Pseudo-inverse of the transposed Jacobian.
This estimated force is fed into the encoder, allowing the robot to adapt to load variations in real time using only standard onboard sensing (joint states, joint torques, and IMU data), with no dedicated wrist force/torque sensor.
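The estimator above reduces to a single pseudo-inverse solve. The sketch below is a minimal NumPy version under stated assumptions: the Jacobian is taken to be the 3 x n linear-velocity Jacobian of the wrist, the 7-joint arm is hypothetical, and the synthetic check uses the static relation $\tau = \tau_g - J^\top F_{ext}$, which is the sign convention implied by the formula rather than something the summary states explicitly.

```python
import numpy as np

def estimate_wrist_force(jacobian, tau_measured, tau_gravity):
    """Estimate the 3D wrist force as F_ext = -(J^T)^+ (tau - tau_g)."""
    residual = tau_measured - tau_gravity          # torque not explained by gravity
    return -np.linalg.pinv(jacobian.T) @ residual  # least-squares solution

# Synthetic check with a hypothetical 7-joint arm.
rng = np.random.default_rng(0)
J = rng.normal(size=(3, 7))                # stand-in for the wrist Jacobian
tau_g = rng.normal(size=7)                 # stand-in for gravity compensation torques
f_true = np.array([30.0, 0.0, 0.0])        # 30 N along x
tau = tau_g - J.T @ f_true                 # torques consistent with the assumed convention

f_est = estimate_wrist_force(J, tau, tau_g)
print(f_est)
```

On a real robot the residual would also contain friction, inertial effects, and model error, so the estimate is noisier than in this idealised check; the sketch ignores those terms.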

3. Key Contributions

Force-Adaptive Framework: Introduction of FAME, which uses latent context encoding to explicitly model and adapt to the coupling between upper-body joint states and applied wrist forces.
Sensor-Free Estimation: A novel deployment strategy that estimates wrist interaction forces from joint torques and dynamics, eliminating the need for expensive or fragile wrist force/torque sensors.
Expanded Manipulation Envelope: Demonstrated significant improvements in standing success rates across diverse and asymmetric arm configurations compared to baselines.
Real-World Validation: Successful deployment on a full-scale Unitree H12 humanoid, validating robustness in asymmetric (single-arm load) and symmetric (bimanual load) scenarios.

4. Experimental Results

Simulation Results

Experiments were conducted on the Unitree H12 simulation with five fixed arm configurations (ranging from forward-reaching to asymmetric poses) and randomized 3D hand forces.

Performance Metrics (Mean Success Rate over 10 s):
- Base Policy: 29.44% (Fails significantly in asymmetric/extended poses).
- Base + Curriculum: 51.40% (Improves with pose diversity but struggles to disambiguate force direction).
- FAME: 73.84% (Significant improvement, particularly in challenging asymmetric cases like C5 where it reached 85.4%).
Analysis: The latent context allows the policy to explicitly understand the "disturbance vector," enabling the lower-body controller to counteract specific torque imbalances rather than relying on generic robustness.
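The gaps between the three policies can be read directly off the reported means. The short calculation below uses only the success rates listed above; the failure rates and differences are derived arithmetic, not additional results from the paper.

```python
# Mean success rates (%) as reported in the summary.
success = {"Base": 29.44, "Base+Curr": 51.40, "FAME": 73.84}

# Failure rate is the complement of the success rate.
failure = {name: round(100.0 - rate, 2) for name, rate in success.items()}

# Absolute gains of FAME in percentage points.
gain_over_curr = round(success["FAME"] - success["Base+Curr"], 2)
gain_over_base = round(success["FAME"] - success["Base"], 2)

print(failure)
print(gain_over_curr, gain_over_base)
```

FAME gains 22.44 percentage points over the curriculum-only baseline and 44.40 over the base policy, while its failure rate falls to 26.16% from 48.60% and 70.56% respectively.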

Real-World Results (Unitree H12)

Scenarios:
- RE1: Asymmetric single-arm load (pulling a 30 N load with one arm).
- RE2: Symmetric bimanual load (carrying a load with both arms).
Outcome:
- FAME: The robot maintained stable standing, with joint trajectories (hip/ankle) staying close to nominal configurations.
- Baseline (Base+Curr): The robot lost balance and fell. Joint positions drifted significantly, indicating a failure to compensate for the external moments.
Visual Evidence: Fig. 4 shows that without FAME, the robot's joints drift away from the stable configuration, whereas FAME keeps the robot upright despite the load.

5. Significance and Impact

Enabling Complex Manipulation: FAME bridges the gap between stable standing and complex bimanual manipulation. By expanding the "manipulation envelope," it brings tasks involving heavy or off-center loads (e.g., construction, logistics) within reach of humanoid robots that previously lost balance under such disturbances.
Hardware Efficiency: The sensor-free force estimation approach is crucial for practical deployment. It reduces hardware complexity, cost, and failure points by removing the need for specialized wrist sensors, making the technology more scalable for real-world manufacturing and service applications.
Generalizability: The approach of using latent context adaptation for structured disturbances (force-configuration coupling) offers a new paradigm for robot learning, moving beyond simple environmental randomization to task-specific disturbance modeling.

In conclusion, FAME demonstrates that explicitly encoding the relationship between arm configuration and interaction forces allows full-scale humanoids to maintain balance under significant, uncertain loads, effectively expanding their operational capabilities in dynamic environments.