Shape-Interpretable Visual Self-Modeling Enables Geometry-Aware Continuum Robot Control

This paper introduces a shape-interpretable visual self-modeling framework that encodes continuum robot shapes into a compact 3D Bezier-curve representation to enable geometry-aware, data-driven control for accurate shape regulation and obstacle avoidance without relying on analytical models or dense markers.

Peng Yu, Xin Wang, Ning Tan

Published 2026-03-03

Imagine you have a very flexible robot arm, like an elephant's trunk or an octopus tentacle. This isn't a robot made of stiff metal joints; it's made of soft, bendy material that can twist and turn in infinite ways. This makes it amazing for squeezing into tight spaces or working safely around humans.

But here's the problem: It's incredibly hard to control.

Because it bends so much, figuring out exactly where every part of it is in 3D space is like trying to describe the shape of a piece of cooked spaghetti just by looking at a flat shadow on the wall. If you only look at the shadow (a 2D image), you might think the spaghetti is straight when it's actually curled up. Most robots today either rely on complex math formulas that break easily, or they use "black box" AI that just guesses what to do without really "understanding" its own body.

This paper introduces a clever new way to teach these soft robots how to understand themselves. Here is the breakdown using simple analogies:

1. The "Shadow Puppet" Problem

Imagine you are trying to direct a shadow puppet show. If you only have one light source, you can't tell if the puppet's hand is close to the screen or far away; the shadow looks the same either way.

  • The Old Way: Most robots use one camera (one light source). They guess the shape, but they often get it wrong because they can't see the depth.
  • The New Way: The researchers use two cameras (two lights from different angles). By looking at the robot from two sides at once, they can figure out exactly what the 3D shape is, even without expensive 3D scanners. A small code sketch of the two-view idea follows this list.
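
To make the two-camera trick concrete: given the same point seen in both images, linear triangulation (the textbook method) recovers its 3D position. The assumption that calibrated 3x4 projection matrices are available is mine for illustration; the paper's actual reconstruction pipeline may differ.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Recover a 3D point from its pixel locations in two calibrated views.

    P1, P2   : 3x4 camera projection matrices (one per camera).
    uv1, uv2 : (u, v) pixel coordinates of the same point in each image.
    """
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view contributes two linear constraints on the homogeneous
    # 3D point X; stack them and take the least-squares null vector.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

One camera gives only two constraints on three unknowns, which is exactly the shadow-puppet ambiguity; the second camera adds the two constraints that pin down depth.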

2. Drawing with "Magic Dots" (Bezier Curves)

Once the robot knows what it looks like, it needs a way to describe that shape simply.

  • The Analogy: Instead of trying to describe every single pixel of the robot's body (which is like trying to describe a beach by listing the color of every single grain of sand), the robot uses Bezier Curves.
  • Think of this like drawing a curve on a computer using just a few "control dots." If you move the dots, the whole curve changes smoothly. The robot learns to describe its entire body using just a handful of these "magic dots." This makes the shape easy to understand and easy to control, as the sketch after this list shows.
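
Here is a minimal sketch of how a handful of "magic dots" define a full 3D curve, using the standard Bernstein form of a Bezier curve. The curve order and sample count below are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from math import comb

def bezier_curve(control_points, num_samples=50):
    """Evaluate a 3D Bezier curve from its control points.

    control_points : (n+1, 3) array -- the handful of "magic dots".
    Returns (num_samples, 3) points along the robot's backbone, using
    the Bernstein form B(t) = sum_i C(n,i) * (1-t)^(n-i) * t^i * P_i.
    """
    P = np.asarray(control_points, dtype=float)
    n = len(P) - 1
    t = np.linspace(0.0, 1.0, num_samples)[:, None]  # curve parameter
    curve = np.zeros((num_samples, 3))
    for i in range(n + 1):
        bernstein = comb(n, i) * (1 - t) ** (n - i) * t**i
        curve += bernstein * P[i]
    return curve

# Example: four dots are enough to describe a smoothly bent backbone.
dots = [[0, 0, 0], [0, 0, 0.1], [0.05, 0, 0.2], [0.1, 0.05, 0.25]]
backbone = bezier_curve(dots)  # 50 points along the body
```

Moving any one dot reshapes the whole curve smoothly, which is why a few numbers suffice to describe (and control) the entire body.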

3. The "Mental Gymnast" (Self-Modeling)

This is the coolest part. The robot doesn't need a manual written by engineers. Instead, it learns by doing, just like a baby learning to walk.

  • How it works: The robot wiggles its muscles, looks at itself in the two cameras, and sees how its "magic dots" moved. It repeats this thousands of times.
  • The Result: It builds a mental model of its own body. It learns: "When I pull this cable, my body bends like this." It doesn't need to know the physics of friction or material stiffness; it just learns the relationship between its commands and its shape. The sketch after this list shows what that learning loop can look like.
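
A minimal sketch of this "wiggle and watch" loop, assuming PyTorch and a small fully connected network; the actuator count, control-point count, and architecture here are illustrative guesses, not the paper's exact design.

```python
import torch
import torch.nn as nn

NUM_ACTUATORS = 4       # assumed: four driving cables
NUM_CONTROL_POINTS = 4  # assumed: four 3D "magic dots" per shape

# The self-model: actuation commands in, predicted control points out.
self_model = nn.Sequential(
    nn.Linear(NUM_ACTUATORS, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, NUM_CONTROL_POINTS * 3),  # x, y, z per dot
)
optimizer = torch.optim.Adam(self_model.parameters(), lr=1e-3)

def training_step(commands, observed_dots):
    """One wiggle: send commands, watch the cameras, learn.

    commands      : (batch, NUM_ACTUATORS) actuation sent to the robot.
    observed_dots : (batch, NUM_CONTROL_POINTS * 3) control points
                    fitted from the two camera views.
    """
    predicted = self_model(commands)
    loss = nn.functional.mse_loss(predicted, observed_dots)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeated over thousands of wiggles, the network becomes the robot's "mental model": commands in, predicted shape out, with no physics equations anywhere.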

4. The "Dance Partner" (Hybrid Control)

Now that the robot knows its shape, it can do two things at once:

  1. Move its tip: Like a surgeon's tool, it needs to get its "hand" to a specific spot.
  2. Control its body: It needs to make sure its "arm" doesn't hit a wall or get tangled.
  • The Magic: The robot uses its mental model to solve a puzzle. It says, "I need my hand to stay here, but I need my elbow to move away from that obstacle." It calculates the perfect movement to do both simultaneously; the sketch after this list shows one classical way to compute it.
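
One classical way to realize "do both at once" is task-priority control: solve the tip task with the Jacobian pseudo-inverse, then project the body-shape task into its null space so it cannot disturb the tip. The sketch below assumes the Jacobian is obtained by differentiating the learned self-model; the paper's actual controller may be formulated differently.

```python
import numpy as np

def hybrid_step(J_tip, tip_error, shape_correction):
    """One control step: track the tip, adjust the body on the side.

    J_tip            : (3, n) Jacobian of tip position w.r.t. the n
                       actuation commands (e.g., from the self-model).
    tip_error        : (3,) desired minus current tip position.
    shape_correction : (n,) actuation change that nudges the body away
                       from an obstacle (the secondary task).
    """
    J_pinv = np.linalg.pinv(J_tip)
    # Null-space projector: motions in this subspace leave the tip fixed.
    null_proj = np.eye(J_tip.shape[1]) - J_pinv @ J_tip
    return J_pinv @ tip_error + null_proj @ shape_correction
```

The projector is the mathematical version of "move your elbow without moving your hand."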

5. The "Dodgeball" Test (Obstacle Avoidance)

The researchers tested this by putting obstacles in the robot's path.

  • The Old AI: If a standard AI saw an obstacle, it might just crash into it because it only "saw" the shadow and didn't understand the 3D distance.
  • The New Robot: It sees the obstacle getting close in one of its camera views. It instantly says, "Oh no, I'm too close!" and uses its shape control to wiggle its body away from the obstacle, all while keeping its "hand" steady on its target. It's like a dancer dodging a chair while keeping their hand on a partner's shoulder. A toy version of the "too close" check follows this list.
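
A toy version of the "Oh no, I'm too close!" check, assuming the backbone has already been reconstructed as sampled 3D points and the obstacle is a single known point; the paper's distance metric and avoidance law are likely richer.

```python
import numpy as np

def repulsion(backbone_points, obstacle, safe_distance=0.05):
    """Find where the body is closest to an obstacle and how to dodge.

    backbone_points : (m, 3) points sampled along the Bezier backbone.
    obstacle        : (3,) obstacle position.
    Returns (index, unit push direction) for the closest backbone point,
    or None if the whole body is outside the safety margin.
    """
    diffs = np.asarray(backbone_points) - np.asarray(obstacle)
    dists = np.linalg.norm(diffs, axis=1)
    i = int(np.argmin(dists))
    if dists[i] >= safe_distance:
        return None  # nothing to dodge
    return i, diffs[i] / dists[i]  # push straight away from the obstacle
```

Mapped through the self-model, that push direction becomes the shape_correction fed to the hybrid controller in step 4, so the body dodges while the hand stays put.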

Why This Matters

  • No More "Black Boxes": The robot's brain isn't a mystery. We can see exactly how it describes its shape, making it safer and more trustworthy.
  • No Sensors on the Body: You don't need to glue hundreds of tiny sensors onto the robot. Just two cheap cameras are enough.
  • Safe in Crowded Spaces: Because it understands its own 3D shape, it can work in messy, crowded environments (like inside a human body for surgery or in a collapsed building) without getting stuck or hurting anything.

In a nutshell: This paper teaches soft robots to look in a mirror (two mirrors, actually), draw a simple sketch of themselves, and learn how to move that sketch to avoid hitting things while still getting their job done. It's a giant leap toward robots that are as smart and adaptable as an octopus.
