EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation

Imagine you are teaching a pair of identical twins how to bake a cake together. You show them a video of the process: the left twin holds the bowl while the right twin stirs.

Now, imagine you flip the video horizontally. Suddenly, the right twin is holding the bowl and the left twin is stirring. Because they are identical twins in a symmetrical kitchen, this flipped video shows a perfectly valid way to bake the cake, too.

The Problem:
Most current robot learning systems are like students who memorize the video frame-by-frame without understanding the logic. If you show them the "flipped" video (where the right twin holds the bowl), the robot might get confused. It might try to have the left arm hold the bowl and the right arm hold the bowl at the same time, or it might freeze because it doesn't recognize the new setup. It lacks the "common sense" that the two arms are interchangeable and the task is symmetrical.

The Solution: EquiBim
The paper introduces EquiBim, a new training method that teaches robots this "common sense" explicitly. Think of EquiBim as a strict but helpful coach who says to the robot: "Hey, if you see a task where the left arm does X and the right arm does Y, and then you see a mirrored version where the arms swap places, your brain must tell the arms to swap their actions too. If the scene flips, your plan must flip with it."

How It Works (The Analogy)

The Mirror Test:
Imagine the robot is looking at a scene through a mirror. EquiBim takes the robot's view, creates a perfect mirror image of it, and asks the robot: "Okay, if the world looks like this mirror image, what should the arms do?"
- The Old Way: The robot guesses based on what it saw before, often getting it wrong because it didn't expect the mirror.
- The EquiBim Way: The robot is forced to learn that if the world flips left-to-right, the action plan must also flip left-to-right. It's like a dance instructor telling a partner, "If I step left, you must step right. If I turn this way, you turn that way."
No New Hardware, Just New Rules:
Usually, to make a robot smarter, you have to build a new, more complex brain (a new neural network architecture). EquiBim is different. It's like adding a rulebook to an existing brain. You don't need to rebuild the robot's head; you just add a constraint during training that says, "Your answers must be consistent with the mirror image." This makes it easy to plug into any existing robot learning system.
Why It Matters:
- Robustness: In the real world, things aren't always set up the way they were in the demonstrations. An object might sit on the opposite side of the table, or face the other way. Because EquiBim teaches the robot the symmetry of the task, it doesn't panic when the scene is mirrored. It knows, "Oh, the object moved to the left, so I just need to swap my arm roles."
- Learning Faster: By understanding that the left and right sides are interchangeable, the robot gets a mirrored lesson out of every demonstration. Every time it learns a move with the left arm, it also learns what the mirrored move for the right arm should look like.

The Results

The researchers tested this on a dual-armed robot (two arms working together) in both computer simulations and the real world.

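In the simulation: Robots trained with EquiBim succeeded more often on average across the eight test tasks, such as stacking blocks or pressing staplers, with the biggest gains on clearly mirror-symmetric tasks. Tasks where the two arms play different roles, such as handing a block from one arm to the other, sometimes got slightly worse.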
In the real world: When they tried tasks like handing a banana from one arm to the other, or hanging a toy chicken on a hook, the EquiBim-trained robots were much more reliable, especially when the objects were placed in new or unexpected positions.

The Bottom Line

EquiBim is a simple but powerful trick. It tells robots: "You have two identical arms, and the world is often symmetrical. Don't just memorize specific moves; learn the rule that if the world flips, your plan should flip too." This makes robots smarter, more adaptable, and much better at working together with two hands.

1. Problem Statement

Robotic imitation learning has achieved significant success in complex manipulation tasks. However, existing methods often fail to explicitly account for the physical symmetries inherent in robotic systems, particularly in bimanual (dual-arm) manipulation.

The Issue: Many dual-arm tasks possess bilateral symmetry (e.g., left-right exchange of arms and objects). Standard policies trained via imitation learning often produce asymmetric or inconsistent behaviors when presented with symmetric observations, even though the underlying physics and task structure are symmetric.
The Consequence: This lack of symmetry awareness degrades coordination quality, reduces robustness under distribution shifts (e.g., mirrored object placements), and limits generalization, especially when demonstration data is limited or imbalanced.
Limitations of Prior Work: Previous attempts to incorporate symmetry (e.g., symmetry-aware architectures or data augmentation) are often tightly coupled to specific network designs, observation modalities (e.g., images only), or action representations, making them difficult to generalize across different imitation learning pipelines.

2. Methodology: EquiBim Framework

The authors propose EquiBim, a model-agnostic training framework that enforces bilateral equivariance between observations and actions.

Core Concept

EquiBim treats physical symmetry as a group action $S$ that operates jointly on the observation space $\mathcal{O}$ and the action space $\mathcal{A}$. The goal is to ensure that if an observation is transformed by $S$, the policy's predicted action is transformed by the corresponding action of $S$ on $\mathcal{A}$.

Equivariance Constraint: $\pi(S(O)) \approx S(\pi(O))$
- $O$ : Original observation (e.g., image, point cloud, joint states).
- $S(O)$ : Symmetrically transformed observation (e.g., left-right flipped image).
- $\pi(O)$ : Action predicted by the policy.
- $S(\pi(O))$ : The action transformed by the symmetry operation.
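As a worked special case (an illustrative assumption; the summary does not restrict $\pi$ or $S$ this way), suppose $S$ acts linearly, through a matrix $P$ on observations and a matrix $Q$ on actions, and the policy is linear, $\pi(O) = WO$. A mirror applied twice is the identity, so $P^2 = I$, $Q^2 = I$, and the symmetry group is $\{e, S\}$. Requiring the constraint for every observation (assuming the observations span the observation space) turns it into a condition on $W$ alone:

$$
\begin{aligned}
\pi(S(O)) = S(\pi(O)) \quad \forall O
  &\iff WPO = QWO \quad \forall O \\
  &\iff WP = QW \\
  &\iff W = QWP \qquad (\text{right-multiply by } P \text{ and use } P^2 = I).
\end{aligned}
$$

In this setting the regularizer reads $L_{sym} = \|(WP - QW)\,O\|^2$, and averaging a map with its mirrored copy, $\tfrac{1}{2}(W + QWP)$, always satisfies the constraint because $P^2 = Q^2 = I$. As summarized here, EquiBim does not project weights this way; it penalizes the residual softly during training.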

Technical Implementation

Symmetry-Consistent Regularization:
Instead of modifying the neural network architecture, EquiBim introduces a regularization loss term ($L_{sym}$) during training:
$L_{sym} = \|\pi(S(O)) - S(\pi(O))\|^2$
This loss penalizes the policy if the action predicted from a transformed input does not match the transformed action of the original input.
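The regularizer is simple enough to sketch directly. This is a minimal NumPy sketch, not the authors' code: it treats $\pi$ as a deterministic map from an observation batch to an action batch, and the toy observation/action layout, `mirror_obs`, `mirror_action`, and both example policies are hypothetical. How the term is applied to a diffusion model's denoising output is not described in this summary.

```python
import numpy as np

def symmetry_loss(policy, obs, mirror_obs, mirror_action):
    """Batch mean of ||pi(S(O)) - S(pi(O))||^2 for obs of shape (B, d_obs)."""
    pred_on_mirrored = policy(mirror_obs(obs))   # pi(S(O))
    mirrored_pred = mirror_action(policy(obs))   # S(pi(O))
    return float(np.mean(np.sum((pred_on_mirrored - mirrored_pred) ** 2, axis=1)))

# Hypothetical toy spaces: obs = [x, y] (object position, y lateral);
# action = [left_x, left_y, right_x, right_y] (one reach target per arm).
FLIP_Y = np.array([1.0, -1.0])

def mirror_obs(obs):
    return obs * FLIP_Y                          # reflect across y = 0

def mirror_action(act):
    left, right = act[:, :2], act[:, 2:]
    # Swap the arms and reflect each target across y = 0.
    return np.concatenate([right * FLIP_Y, left * FLIP_Y], axis=1)

def straddle_policy(obs):
    """Both arms grasp 0.1 to either side of the object: mirror-consistent."""
    x, y = obs[:, :1], obs[:, 1:]
    return np.concatenate([x, y + 0.1, x, y - 0.1], axis=1)

def left_only_policy(obs):
    """Left arm always reaches; right arm parks at a fixed home position."""
    x, y = obs[:, :1], obs[:, 1:]
    home = np.tile([0.3, -0.2], (len(obs), 1))
    return np.concatenate([x, y, home], axis=1)

obs = np.array([[0.3, 0.0], [0.4, 0.15]])
print(symmetry_loss(straddle_policy, obs, mirror_obs, mirror_action))   # 0.0
print(symmetry_loss(left_only_policy, obs, mirror_obs, mirror_action))  # approximately 0.1125
```

The left-only policy is penalized because, on the mirrored scene, it still reaches with the left arm, while the mirrored version of its original plan reaches with the right.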
Modality Agnosticism:
The framework is designed to work with diverse inputs and outputs without architectural changes:
- Visual Observations:
  - Images: $S$ is a horizontal flip.
  - Point Clouds: Points are transformed into the image coordinate frame, reflected across the sagittal plane, and transformed back.
- Action Representations:
  - End-Effector Space: Position and orientation are reflected across the sagittal plane.
  - Joint Space: A diagonal matrix $D$ is applied to flip the signs of specific joints based on the robot's kinematic structure (URDF) to reflect the left-right swap.
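The per-modality transforms above can be sketched in NumPy under explicit assumptions. Everything here is illustrative: the frame conventions (y lateral in the robot frame, x along the image's horizontal axis in the camera frame), the camera extrinsic `T_cam_from_world`, the conjugation rule for orientations, and the sign pattern in `D` are hypothetical choices, not values from the paper or the SO101 URDF. The action transforms swap the arms explicitly, since in a mirrored scene the left arm's job goes to the right arm.

```python
import numpy as np

# Assumed conventions (illustrative, not from the paper):
#   robot/world frame: x forward, y lateral, z up -> sagittal plane is y = 0
#   camera frame: x along the image's horizontal axis -> a horizontal image
#   flip corresponds to negating camera-frame x
MIRROR_WORLD = np.diag([1.0, -1.0, 1.0])
MIRROR_CAM = np.diag([-1.0, 1.0, 1.0])

def flip_image(img):
    """Horizontal flip of an (H, W, C) image."""
    return img[:, ::-1, :]

def reflect_point_cloud(points, T_cam_from_world):
    """(N, 3) world points -> camera frame -> reflect -> back to world."""
    R, t = T_cam_from_world[:3, :3], T_cam_from_world[:3, 3]
    p_cam = points @ R.T + t          # world -> camera
    p_cam = p_cam @ MIRROR_CAM        # negate camera-frame x (matrix is symmetric)
    return (p_cam - t) @ R            # camera -> world, using R^-1 = R^T

def reflect_pose(pos, rot):
    """Reflect one end-effector pose; conjugation keeps rot a proper rotation."""
    return MIRROR_WORLD @ pos, MIRROR_WORLD @ rot @ MIRROR_WORLD

def mirror_ee_action(left_pose, right_pose):
    """Swap the arms and reflect each (position, rotation matrix) pose."""
    return reflect_pose(*right_pose), reflect_pose(*left_pose)

# Hypothetical per-joint signs for a 6-dimensional per-arm joint vector; the
# real pattern would be read off the URDF joint axes.
D = np.diag([-1.0, 1.0, 1.0, 1.0, -1.0, 1.0])

def mirror_joint_action(action):
    """action = [q_left, q_right] of shape (12,) -> [D q_right, D q_left]."""
    q_left, q_right = action[:6], action[6:]
    return np.concatenate([D @ q_right, D @ q_left])
```

Each transform is its own inverse (applying it twice recovers the input), as a mirror should be. With the conjugation convention, a yaw of θ about z becomes a yaw of -θ.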
Plug-and-Play Integration:
EquiBim acts as a training module that can be added to existing imitation learning pipelines (e.g., Diffusion Policy, ACT) without altering the policy backbone, optimization procedure, or inference process.
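Because the regularizer is a plug-in term, it only changes the objective: total loss = imitation loss + λ·L_sym. The sketch below assumes a linear policy with a bias feature, a hypothetical weight λ, analytic gradients in place of autograd, and toy demonstrations whose right-arm targets are noisy (loosely echoing the imbalanced Drumstick setting). None of these details come from the paper; they only illustrate that the policy and its inference path stay unchanged while the loss gains one term.

```python
import numpy as np

# Hypothetical toy setup: o = [x, y, 1] (object position with y lateral and
# +y to the left, plus a bias feature); a = [left_x, left_y, right_x, right_y];
# linear policy a = W @ o.
P = np.diag([1.0, -1.0, 1.0])                  # S on observations: negate y
Q = np.array([[0.0, 0.0, 1.0, 0.0],            # S on actions: swap the arms
              [0.0, 0.0, 0.0, -1.0],           # and negate each arm's y
              [1.0, 0.0, 0.0, 0.0],
              [0.0, -1.0, 0.0, 0.0]])

def objective_and_grad(W, O, A, lam):
    """Return (imitation loss, L_sym, gradient of loss + lam * L_sym wrt W)."""
    B = len(O)
    R = O @ W.T - A                            # imitation residuals, (B, 4)
    task = np.mean(np.sum(R ** 2, axis=1))
    E = W @ P - Q @ W                          # pi(S(o)) - S(pi(o)) = E @ o
    sym = np.mean(np.sum((O @ E.T) ** 2, axis=1))
    G = 2.0 * E @ (O.T @ O) / B                # dL_sym/dE
    grad = 2.0 * R.T @ O / B + lam * (G @ P.T - Q.T @ G)
    return task, sym, grad

def train(O, A, lam, lr=0.05, steps=20000):
    """Plain gradient descent from W = 0; only the objective depends on lam."""
    W = np.zeros((4, 3))
    for _ in range(steps):
        W -= lr * objective_and_grad(W, O, A, lam)[2]
    return W

# Toy demos: objects only on the left (y > 0); clean left-arm targets,
# noisy right-arm targets.
rng = np.random.default_rng(0)
n = 200
xy = np.column_stack([rng.uniform(0.2, 0.5, n), rng.uniform(0.05, 0.3, n)])
O = np.column_stack([xy, np.ones(n)])
A = np.column_stack([xy[:, 0], xy[:, 1] + 0.1, xy[:, 0], xy[:, 1] - 0.1])
A[:, 2:] += rng.normal(0.0, 0.05, (n, 2))

for lam in (0.0, 1.0):
    task, sym, _ = objective_and_grad(train(O, A, lam), O, A, lam)
    print(f"lambda={lam}: imitation={task:.4f}  L_sym={sym:.4f}")
```

At convergence, the λ = 1 run trades a slightly higher imitation loss for a lower L_sym: the symmetry term couples the two arms' mappings, so the clean left-arm fit pulls the noisy right-arm fit toward its mirror image (and vice versa). How strongly to weight the term in a real pipeline is not specified in this summary.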

3. Key Contributions

Explicit Symmetry Enforcement: A novel framework that explicitly enforces bilateral equivariance as an inductive bias, rather than relying on the data distribution to implicitly learn symmetry.
Model and Modality Agnostic: Unlike prior work, EquiBim is not tied to specific network architectures. It supports image and point cloud observations, as well as both end-effector and joint-space action parameterizations.
Unified Symmetry Definition: The paper provides a rigorous mathematical formulation for defining symmetry transformations ( $S$ ) consistently across heterogeneous observation and action spaces.
Comprehensive Evaluation: Validation across simulation (RoboTwin) and real-world hardware (LeRobot SO101) with diverse task configurations.

4. Experimental Results

Simulation Experiments (RoboTwin)

Evaluated on 8 bimanual tasks using Diffusion Policy (Image/Point Cloud) and DP3.

Performance Gains: EquiBim consistently improved average success rates across all observation/action configurations.
- Image + Joint: Largest gain (+9.5 percentage points, from 34.1% to 43.6%). This suggests the regularization is most beneficial when geometric priors are weak: images lack explicit 3D structure, and joint-space actions lack a direct spatial mapping.
- Point Cloud + Joint/EE: Moderate gains (+3.3 to +4.4 percentage points).
Task Sensitivity:
- High Improvement: Tasks with clear symmetric structures (e.g., Beat Block Hammer, Move Can Pot) saw significant gains.
- Mixed Results: Tasks with inherent functional asymmetry (e.g., Handover Block, where one arm stabilizes while the other moves) showed slight performance drops in simulation, as the regularization suppressed necessary asymmetric adaptations. In the real world, however, the comparable Banana Handover task still improved with EquiBim.

Real-World Experiments (LeRobot SO101)

Evaluated on a dual-arm system with a head-mounted camera on three tasks: Banana Handover, Drumstick Hook Hanging, and Toy Chicken Hook Hanging.

Training Distribution: With objects placed as in the training demonstrations, EquiBim improved success rates (e.g., Banana Handover improved from 30% to 60%).
Distribution Shift (Generalization):
- Mirrored Inputs: When object orientations were mirrored relative to training, the baseline (ACT) failed completely (0/10), while EquiBim achieved 50% success.
- Imbalanced Data: In the Drumstick task, demonstrations were high-quality on the left but poor on the right. EquiBim leveraged the strong left-side data to regularize the right-side behavior, achieving 40% success vs. 10% for the baseline.
Conclusion: The method significantly enhances robustness and generalization, particularly when facing symmetric distribution shifts or uneven data coverage.

5. Significance

Inductive Bias for Robotics: The paper demonstrates that explicitly encoding physical symmetry is a simple yet powerful inductive bias for bimanual learning. It compensates both for missing geometric information in visual inputs and for data scarcity.
Practical Applicability: By being architecture-agnostic, EquiBim lowers the barrier for integrating symmetry constraints into state-of-the-art imitation learning systems.
Robustness: The results highlight that symmetry-aware policies are more robust to environmental changes (distribution shifts) and can effectively transfer knowledge between symmetric task configurations, a critical requirement for real-world deployment.

In summary, EquiBim provides a general, effective solution to the problem of asymmetric policy behavior in dual-arm robots, showing that enforcing physical symmetry constraints leads to more robust and generalizable manipulation skills.
