ABPolicy: Asynchronous B-Spline Flow Policy for Real-Time and Smooth Robotic Manipulation

Imagine you are teaching a robot arm to perform delicate tasks, like folding a towel, stacking blocks, or hanging a cup on a moving rack. The goal is for the robot to move smoothly, like a human dancer, rather than jerking around like a glitchy video game character.

The paper introduces a new system called ABPolicy to solve three major problems that usually plague robot controllers: jitter (shaky movements), stuttering (pausing to think), and clunky transitions (jerking when switching from one thought to the next).

Here is how ABPolicy works, explained through simple analogies:

1. The Problem: The "Stop-and-Go" Robot

Most robots today work like a student taking a test who has to raise their hand and wait for the teacher to grade the answer before they can write the next sentence.

The Issue: The robot sees a picture, stops moving to calculate the next move, waits for the computer to finish, and then moves. If the target is moving (like a rotating rack the robot must hang a cup on), the robot is often "too late" because it paused to think.
The Result: The robot moves in a "stop-and-go" fashion, which is slow and causes the arm to shake or jerk.

2. The Solution: Drawing with "Magic Curves" (B-Splines)

Instead of telling the robot, "Move your hand to point A, then point B, then point C," ABPolicy changes the language: it gives the robot a smooth curve to follow rather than a list of specific points.

The Analogy: Imagine you are drawing a line.
- Old Way: You place a dot, then another dot, then another. If the dots aren't perfectly aligned, your line looks jagged and shaky.
- ABPolicy Way: You use a flexible ruler (a B-Spline). You only need to place a few "control points" (like pins holding the ruler down), and the ruler naturally creates a perfectly smooth, curved line between them.
The Benefit: By predicting these "pins" (control points) instead of individual dots, every curve the robot follows is smooth by construction, so the shaking inside each planned motion is greatly reduced.

3. The Secret Sauce: The "Two-Way Street" (Bidirectional Prediction)

To make the curve even better, the robot doesn't just look forward; it looks backward and forward at the same time.

The Analogy: Imagine driving a car. A bad driver only looks at the bumper in front of them. A good driver looks at where they just came from (to stay in the lane) and where they are going (to turn smoothly).
How it works: The robot predicts a chunk of the future and considers the recent past. This helps it understand the "flow" of the movement, ensuring the curve doesn't suddenly twist or break.

4. The Magic Trick: "Async" Thinking (Thinking While Moving)

This is the biggest game-changer. In the old way, the robot stops moving to think. In ABPolicy, the robot thinks while it moves.

The Analogy: Think of a professional chef cooking a complex meal.
- Old Way (Synchronous): The chef chops an onion, stops chopping, walks to the stove to check the sauce, walks back, chops another onion. It's inefficient and slow.
- ABPolicy (Asynchronous): The chef keeps chopping onions while the sauce simmers in the background. The "moving" (chopping) and the "thinking" (simmering) happen at the same time.
The Result: The robot never stops. While it is executing the current movement, the computer is already calculating the next movement in the background. This keeps the robot responsive to changes (like a cup suddenly moving) instead of reacting only after a pause.

5. The "Seamless Stitch" (Continuity-Constrained Refitting)

Since the robot is thinking in the background, there is a tiny delay between when it sees something and when it acts. If it just switched to the new plan immediately, the arm might jump or jerk because the new plan didn't account for the split-second it spent thinking.

The Analogy: Imagine a train changing tracks. If the switch is thrown too abruptly, the train might derail or shake.
The Fix: ABPolicy uses a "refitting" trick. Before the robot starts the new plan, it gently adjusts the very first few "pins" of the new curve so they perfectly match where the robot actually is right now. It's like a tailor taking a new suit and quickly hemming the bottom so it fits the person perfectly without them having to stand still.

Summary: Why is this a big deal?

Smoother: The robot moves like a fluid stream of water, not a bouncing ball.
Faster: It never stops to "think," so it reacts to moving objects sooner.
Smarter: It handles complex, moving targets (like a rotating rack) much better than previous methods.

In short, ABPolicy teaches robots to draw with flexible rulers while running a marathon, ensuring they never trip, never stop, and always move with grace.

1. Problem Statement

Robotic manipulation in real-world environments requires control policies that are both temporally smooth (to ensure physically realistic motion) and highly responsive to dynamic changes. Existing imitation learning methods, particularly those using action chunking and diffusion/flow models, face three critical limitations when operating with synchronous inference in raw action spaces:

Intra-chunk Jitter: Predicting raw actions directly often results in high-frequency oscillations within a single action chunk.
Inter-chunk Discontinuity: Boundaries between consecutive action chunks often exhibit "jerks" or discontinuities, causing distribution shifts in subsequent observations.
Stop-and-Go Execution: Synchronous inference forces the robot to wait for the model to finish computing the next action chunk before executing, introducing latency that degrades responsiveness in dynamic environments.

2. Methodology: ABPolicy

The authors propose ABPolicy, an asynchronous flow-matching policy that operates in a B-spline control-point action space. The framework consists of four core components:

A. B-Spline Trajectory Parameterization

Instead of predicting raw action vectors, the policy predicts B-spline control points.

Mechanism: The system uses cubic B-splines ($p=3$) to parameterize action trajectories.
Benefit: This representation inherently guarantees $C^2$ continuity (continuity in position, velocity, and acceleration) within each chunk, suppressing intra-chunk jitter and ensuring physically smooth motions.
Fitting: A linear least-squares problem is solved to find the optimal control points that approximate the ground-truth action sequence.
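The fitting step can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation: the clamped uniform knot vector, the 1-D action signal, the number of control points, and all function names (`clamped_knots`, `basis`, `solve`, `fit_control_points`, `evaluate`) are assumptions chosen for illustration. The basis functions come from the standard Cox-de Boor recursion, and the control points are obtained by solving the normal equations $B^\top B\,c = B^\top a$.

```python
import math

def clamped_knots(n_ctrl, degree=3):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points (assumed layout)."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)

def basis(i, p, u, knots):
    """Cox-de Boor recursion: value of the i-th degree-p basis function at u."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # The last non-empty span also owns the right end point u == 1.
        if u == knots[-1] and knots[i] < u == knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, u, knots))
    return left + right

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_control_points(actions, n_ctrl, degree=3):
    """Least-squares control points for a 1-D action sequence sampled uniformly on [0, 1]."""
    knots = clamped_knots(n_ctrl, degree)
    T = len(actions)
    B = [[basis(j, degree, t / (T - 1), knots) for j in range(n_ctrl)] for t in range(T)]
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(n_ctrl)] for i in range(n_ctrl)]
    Bta = [sum(row[i] * a for row, a in zip(B, actions)) for i in range(n_ctrl)]
    return solve(BtB, Bta), knots

def evaluate(ctrl, knots, u, degree=3):
    """Evaluate the spline defined by the control points at parameter u."""
    return sum(c * basis(j, degree, u, knots) for j, c in enumerate(ctrl))

# Demo: compress a 32-step action sequence into 8 control points.
T = 32
actions = [math.sin(2 * math.pi * t / (T - 1)) for t in range(T)]
ctrl, knots = fit_control_points(actions, n_ctrl=8)
recon = [evaluate(ctrl, knots, t / (T - 1)) for t in range(T)]
max_err = max(abs(a - r) for a, r in zip(actions, recon))
print(f"{len(ctrl)} control points, max reconstruction error {max_err:.4f}")
```

In a multi-dimensional action space the same basis matrix would be reused for every action dimension, so only the right-hand side changes per dimension.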

B. Bidirectional Action Prediction (BiAP)

To address inter-chunk continuity, the policy employs a Bidirectional Action Prediction scheme.

Mechanism: The policy does not predict a single future action but a full action chunk spanning $P$ past steps and $H$ future steps ($A_t = [a_{t-P}, \dots, a_{t+H-1}]$).
Training: The model is trained using Flow Matching to generate the control points for this bidirectional window. By jointly modeling past and future actions, the policy learns the temporal structure necessary to minimize discontinuities at chunk boundaries.
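A small sketch may make the bidirectional window and the training target more concrete. Everything here is illustrative: `bidirectional_chunk` and `flow_matching_pair` are hypothetical helper names, clamping indices at the episode boundaries is an assumed padding rule, and the linear interpolation path is one common flow-matching formulation that the summary above does not confirm as the authors' exact choice.

```python
def bidirectional_chunk(actions, t, past, horizon):
    """Return [a_{t-P}, ..., a_{t+H-1}], clamping indices at the episode ends (assumed padding)."""
    last = len(actions) - 1
    return [actions[min(max(i, 0), last)] for i in range(t - past, t + horizon)]

def flow_matching_pair(target, noise, s):
    """Linear-path flow matching (assumed variant).

    Returns the point on the straight line from `noise` (s = 0) to `target` (s = 1)
    and the constant velocity the network would be trained to regress at that point.
    """
    x_s = [(1 - s) * n + s * x for n, x in zip(noise, target)]
    velocity = [x - n for n, x in zip(noise, target)]
    return x_s, velocity

# Demo on a toy episode whose action at step k is simply k.
episode = list(range(10))
chunk = bidirectional_chunk(episode, t=4, past=2, horizon=4)   # [2, 3, 4, 5, 6, 7]
start = bidirectional_chunk(episode, t=0, past=2, horizon=4)   # [0, 0, 0, 1, 2, 3]

# In the method described above the target would be the fitted control points of
# the window; two toy numbers are used here to keep the example readable.
x_s, v = flow_matching_pair(target=[1.0, 2.0], noise=[0.0, 0.0], s=0.5)
```

The window always has $P + H$ entries, so the spline fitted to it has a fixed number of control points regardless of where in the episode it is taken.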

C. Continuity-Constrained Refitting (CCR)

This is the core mechanism for handling asynchronous inference.

The Challenge: In asynchronous inference, the robot continues executing the previous trajectory while the new one is being computed. Directly applying the new prediction would cause a discontinuity with the currently executed actions.
The Solution: Upon receiving new control points from the policy, the system performs a local optimization. It adjusts only the initial subset of the new trajectory's control points ($N_{\text{free}}$) to minimize the error between the new trajectory's start and the sequence of actions already executed. The remaining control points remain fixed from the policy's prediction.
Result: This "anchors" the new trajectory to the immediate past, ensuring seamless continuity without retraining the model.
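The refitting idea can be sketched as a small constrained least-squares problem. This is an illustration under stated assumptions, not the authors' code: the knot vector, the toy control points, the sampling of the overlap region, and the names `refit_start`, `curve`, and `overlap_error` are all hypothetical. Only the first `n_free` control points are solved for; the contribution of the fixed control points is moved to the right-hand side.

```python
def basis(i, p, u, knots):
    """Cox-de Boor recursion: value of the i-th degree-p basis function at u."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        if u == knots[-1] and knots[i] < u == knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, u, knots))
    return left + right

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def curve(ctrl, knots, u, degree=3):
    """Evaluate the spline defined by the control points at parameter u."""
    return sum(c * basis(j, degree, u, knots) for j, c in enumerate(ctrl))

def refit_start(ctrl_pred, knots, exec_u, exec_actions, n_free, degree=3):
    """Re-solve the first n_free control points so the curve matches executed actions."""
    n = len(ctrl_pred)
    B = [[basis(j, degree, u, knots) for j in range(n)] for u in exec_u]
    # What the free control points must explain once the fixed part is subtracted.
    r = [a - sum(row[j] * ctrl_pred[j] for j in range(n_free, n))
         for row, a in zip(B, exec_actions)]
    A = [[sum(row[i] * row[j] for row in B) for j in range(n_free)] for i in range(n_free)]
    b = [sum(row[i] * rk for row, rk in zip(B, r)) for i in range(n_free)]
    return solve(A, b) + list(ctrl_pred[n_free:])

# Demo: the executed actions sit 0.1 above what the new prediction assumed.
knots = [0.0] * 4 + [0.2, 0.4, 0.6, 0.8] + [1.0] * 4      # 8 control points, cubic
ctrl_pred = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 1.0, 1.0]
exec_u = [k * 0.03 for k in range(8)]                     # overlap with the recent past
executed = [curve(ctrl_pred, knots, u) + 0.1 for u in exec_u]
ctrl_new = refit_start(ctrl_pred, knots, exec_u, executed, n_free=3)

def overlap_error(ctrl):
    """Sum of squared differences between the curve and the executed actions."""
    return sum((curve(ctrl, knots, u) - a) ** 2 for u, a in zip(exec_u, executed))

err_before, err_after = overlap_error(ctrl_pred), overlap_error(ctrl_new)
```

Because the later control points are untouched, the refitted curve agrees with the policy's prediction wherever the free basis functions vanish, which is what keeps the correction local. The sketch assumes at least `n_free` overlap samples; with fewer, the small linear system would be underdetermined.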

D. Asynchronous Inference Framework

Architecture: The system decouples model inference and robot control into two parallel threads.
Operation: While the robot executes the current action chunk, the policy computes the next chunk in the background. Once the new chunk is ready, the CCR module refits it, and the action queue is updated immediately.
Benefit: This eliminates "idle time," allowing the robot to react to environmental changes in real time without waiting for inference to complete.
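The two-thread arrangement can be sketched with Python's standard `threading` module. This is a toy model under stated assumptions: `AsyncRunner`, `slow_policy`, the chunk length of 16, and the timing constants are all hypothetical, the "policy" just sleeps to imitate inference latency, and the refitting step is only marked by a comment.

```python
import threading
import time

class AsyncRunner:
    """Toy model of decoupled inference and control (names are illustrative)."""

    def __init__(self, policy):
        self.policy = policy
        self.lock = threading.Lock()
        self.plan = policy(0)          # first chunk is computed before motion starts
        self.plan_start = 0            # control tick at which self.plan begins
        self.tick = 0
        self.swaps = 0
        self.stop = threading.Event()

    def inference_loop(self):
        """Background thread: keep computing the next chunk while the robot moves."""
        while not self.stop.is_set():
            with self.lock:
                start = self.tick
            new_plan = self.policy(start)      # slow call, made without holding the lock
            # A real system would refit new_plan against the executed actions here.
            with self.lock:
                self.plan, self.plan_start = new_plan, start
                self.swaps += 1

    def control_loop(self, n_ticks, dt):
        """Foreground loop: emit one action per tick and never wait for inference."""
        executed = []
        for _ in range(n_ticks):
            with self.lock:
                idx = min(self.tick - self.plan_start, len(self.plan) - 1)
                executed.append(self.plan[idx])
                self.tick += 1
            time.sleep(dt)
        return executed

def slow_policy(start_tick):
    """Stand-in for model inference: 10 ms of latency, then a 16-step chunk."""
    time.sleep(0.01)
    return [start_tick + k for k in range(16)]

runner = AsyncRunner(slow_policy)
worker = threading.Thread(target=runner.inference_loop, daemon=True)
worker.start()
executed = runner.control_loop(n_ticks=50, dt=0.002)
runner.stop.set()
worker.join()
print(f"executed {len(executed)} actions, plan replaced {runner.swaps} times")
```

The lock is held only while reading or swapping the plan, so the control loop is blocked for microseconds rather than for the duration of an inference call. The index into the new plan is offset by the ticks that elapsed while it was being computed, which is the latency the refitting step is meant to absorb.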

3. Key Contributions

Novel Action Space: Introduction of a continuous B-spline control-point space for flow-matching policies, which inherently guarantees trajectory smoothness and reduces fitting errors compared to discretized or raw action spaces.
Continuity Mechanism: A simple yet effective Continuity-Constrained Refitting (CCR) optimization that seamlessly stitches asynchronously generated trajectories, solving the inter-chunk discontinuity problem.
Bidirectional Prediction: A BiAP strategy that models temporal dependencies across past and future actions to enhance trajectory coherence.
Real-Time Performance: An asynchronous inference architecture that maintains real-time responsiveness in dynamic environments by hiding inference latency behind execution.

4. Experimental Results

The authors evaluated ABPolicy on seven manipulation tasks (3 dynamic, 4 static) using a 6-DoF AgileX Piper manipulator.

Dynamic Task Performance:
- Asynchronous inference improved success rates by an average of 18.3% compared to synchronous inference in dynamic tasks (e.g., stacking/pushing on rotating platforms).
- The asynchronous approach significantly reduced action latency, enabling timely interaction with moving objects.
Static Task Efficiency:
- For static tasks, asynchronous inference reduced task completion time by 14.2% on average due to the elimination of execution stalls.
Action Smoothness & Accuracy:
- Reconstruction Error: The continuous B-spline representation achieved a Mean Error of 0.00031 and an SNR of 50.7 dB, outperforming Discrete Bins, DCT coefficients, and Discretized B-splines.
- Jitter Reduction: Compared to raw actions, ABPolicy reduced the Zero-Crossing Rate (ZCR) of velocity by 29.2% and the 95th percentile of acceleration (Acc p95) by 57.1%.
Ablation Study:
- Removing Bidirectional Action Prediction (BiAP) dropped the success rate on a static stacking task from 85% to 60% and increased final jitter by 46%, confirming the necessity of bidirectional modeling for smooth transitions.
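For readers unfamiliar with the smoothness metrics reported above, the following sketch shows one plausible way to compute them. The exact definitions used in the paper are not given in this summary, so the finite-difference derivatives, the sign-flip definition of ZCR, the linear-interpolation percentile, and the function names are all assumptions.

```python
import math

def diff(xs, dt=1.0):
    """Finite-difference derivative of a uniformly sampled 1-D signal."""
    return [(b - a) / dt for a, b in zip(xs, xs[1:])]

def zero_crossing_rate(vel):
    """Fraction of consecutive velocity samples whose signs differ (assumed definition)."""
    flips = sum(1 for a, b in zip(vel, vel[1:]) if a * b < 0)
    return flips / (len(vel) - 1)

def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def jitter_metrics(positions, dt=1.0):
    """Return (ZCR of velocity, 95th percentile of absolute acceleration)."""
    vel = diff(positions, dt)
    acc = [abs(a) for a in diff(vel, dt)]
    return zero_crossing_rate(vel), percentile(acc, 95)

# Demo: a smooth trajectory versus the same trajectory with alternating noise.
smooth = [math.sin(0.05 * t) for t in range(100)]
jittery = [x + 0.05 * (-1) ** t for t, x in enumerate(smooth)]
zcr_smooth, acc_smooth = jitter_metrics(smooth)
zcr_jittery, acc_jittery = jitter_metrics(jittery)
```

Both numbers are lower for smoother trajectories: a low ZCR means the velocity rarely reverses direction, and a low Acc p95 means that even the most abrupt moments involve modest accelerations.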

5. Significance

ABPolicy addresses the fundamental trade-off between smoothness and responsiveness in robotic control. By shifting the action space to continuous B-splines and decoupling inference from execution, it provides a robust framework for real-world deployment. The method enables robots to perform complex, high-precision manipulation tasks in dynamic environments without the jitter or latency penalties associated with traditional synchronous, raw-action policies. This work offers a scalable path toward more agile and capable autonomous robots.