DiG-Net: Enhancing Human-Robot Interaction through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics

This paper introduces DiG-Net, a novel deep learning framework that significantly enhances assistive human-robot interaction by enabling robust dynamic hand gesture recognition at hyper-range distances of up to 30 meters through the integration of Depth-Conditioned Deformable Alignment blocks, Spatio-Temporal Graph modules, and a specialized Radiometric Spatio-Temporal Depth Attenuation Loss.

Eran Bamani Beeri, Eden Nissinman, Avishai Sintov

Published 2026-03-17

Imagine you are trying to talk to a helpful robot assistant, but you are standing 30 meters (about 100 feet) away. You can't shout, and you don't want to walk over to it. You just want to wave your hand to say, "Go back," or "Come here."

In the past, robots were like people with very poor eyesight. If you stood too far away, they couldn't tell the difference between a "stop" sign (a static hand) and a "go back" wave (a moving hand). The image was too blurry, too small, and the details were lost in the distance.

This paper introduces DiG-Net, a new "brain" for robots that solves this problem. Think of DiG-Net as giving the robot super-vision and super-memory.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Foggy Window" Effect

When you look at something far away, it gets blurry and small. In robotics and imaging, this loss of apparent size, detail, and contrast with distance is called attenuation.

  • The Old Way: Previous robots tried to guess what you were doing by looking at a single, blurry snapshot. It was like trying to guess a movie plot by looking at just one pixelated frame. They often confused a "stop" gesture with a "go back" gesture because they couldn't see the movement.
  • The New Way: DiG-Net knows that distance makes things blurry. Instead of fighting the blur, it uses a special trick to "un-blur" the image in its mind before making a decision.
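To make the "foggy window" effect concrete, here is a toy numerical model of how a hand's apparent size and contrast shrink with distance. The numbers and formulas below are our own illustration (a pinhole-camera scaling and an exponential contrast falloff), not the paper's formulation:

```python
import numpy as np

# Toy model of distance attenuation (illustrative assumptions, not the
# paper's math): as a person moves away, the hand's apparent size shrinks
# roughly with 1/distance, and its contrast against the background fades.

def apparent_hand_pixels(true_size_m, distance_m, focal_px=1000.0):
    """Apparent width of the hand in pixels under a pinhole-camera model."""
    return focal_px * true_size_m / distance_m

def apparent_contrast(base_contrast, distance_m, atten_coeff=0.02):
    """Exponential contrast falloff with distance (Beer-Lambert style)."""
    return base_contrast * np.exp(-atten_coeff * distance_m)

for d in (2, 10, 30):
    px = apparent_hand_pixels(0.18, d)   # an ~18 cm hand
    c = apparent_contrast(1.0, d)
    print(f"{d:>2} m: hand ~ {px:5.1f} px wide, contrast ~ {c:.2f}")
```

Under these illustrative numbers, a hand that spans about 90 pixels at 2 m shrinks to only about 6 pixels at 30 m, which is why single-frame recognition breaks down at long range.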

2. The Secret Sauce: Three Superpowers

DiG-Net combines three different technologies to act like a detective solving a mystery:

  • Superpower A: The "Depth Detective" (DADA Blocks)
    Imagine looking at a person through a foggy window. You know they are far away, so you know their hand looks smaller than it really is. DiG-Net has a module that estimates exactly how far away you are. It then "warps" or stretches the image in its computer brain to compensate for that distance. It's like putting on special glasses that automatically adjust the focus so the robot sees your hand clearly, even if it's 30 meters away.

  • Superpower B: The "Time Traveler" (Spatio-Temporal Graphs)
    A single photo can be misleading. A hand held still could mean "stop" or it could be the middle of a "wave." DiG-Net doesn't just look at one frame; it looks at the story of the movement. It connects the dots between your hand's position in frame 1, frame 2, and frame 3. It understands that a "wave" is a story of motion, not just a static shape.

  • Superpower C: The "Smart Teacher" (RSTDAL Loss)
    When training a student, you usually treat every question the same. But DiG-Net has a special teacher (a mathematical tool called a "loss function") that knows: "Gestures recorded from far away are harder to see, so the robot needs to study them extra hard."
    This teacher forces the robot to pay extra attention to the blurry, distant gestures during training. It learns that if a gesture is far away, it needs to be extra careful to get it right.
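The "story of motion" idea from Superpower B can be sketched as a tiny spatio-temporal graph. This is an illustrative toy (the joint count and edge layout are our assumptions, not the paper's actual module): joints within a frame are linked spatially, and each joint is linked to itself in the next frame:

```python
import numpy as np

# Toy spatio-temporal graph: hand keypoints in each frame are nodes,
# edges connect neighboring joints within a frame (spatial) and the same
# joint across consecutive frames (temporal). A classifier reasoning over
# this graph sees *motion*, not a single snapshot.

n_joints, n_frames = 5, 3               # tiny toy skeleton over 3 frames
n_nodes = n_joints * n_frames
adj = np.zeros((n_nodes, n_nodes), dtype=int)

def node(joint, frame):
    return frame * n_joints + joint

spatial_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a chain of joints

for f in range(n_frames):
    for a, b in spatial_edges:                     # spatial edges
        adj[node(a, f), node(b, f)] = adj[node(b, f), node(a, f)] = 1
    if f + 1 < n_frames:
        for j in range(n_joints):                  # temporal edges
            adj[node(j, f), node(j, f + 1)] = adj[node(j, f + 1), node(j, f)] = 1

# 4 spatial edges x 3 frames + 5 temporal edges x 2 transitions = 22
print("edges:", adj.sum() // 2)
```

A graph network operating on this adjacency treats a "wave" as connected movement across frames rather than a set of isolated poses.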
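The "smart teacher" from Superpower C can likewise be sketched as a distance-weighted loss. Everything here (the exponential weighting and the coefficient `alpha`) is an illustrative assumption in the spirit of the RSTDAL idea, not the paper's exact formula:

```python
import numpy as np

# Sketch of a distance-weighted training loss: clips recorded farther away
# contribute more to the loss, so the network "studies the hard, blurry
# examples extra hard". Weighting scheme and alpha are illustrative only.

def cross_entropy(probs, label):
    """Standard cross-entropy for one sample."""
    return -np.log(probs[label] + 1e-12)

def distance_weighted_loss(probs, label, distance_m, alpha=0.05):
    """Scale the per-sample loss by an exponential factor of distance."""
    weight = np.exp(alpha * distance_m)   # farther -> larger weight
    return weight * cross_entropy(probs, label)

probs = np.array([0.7, 0.2, 0.1])   # softmax output for one gesture clip
near = distance_weighted_loss(probs, label=0, distance_m=2.0)
far = distance_weighted_loss(probs, label=0, distance_m=30.0)
print(f"loss at 2 m:  {near:.3f}")
print(f"loss at 30 m: {far:.3f}")   # same prediction, heavier penalty
```

The same prediction error costs more at 30 m than at 2 m, nudging the network to work hardest on exactly the examples that are hardest to see.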

3. The Result: A Robot That "Gets" You

The researchers tested this system with real people waving at robots from distances up to 30 meters (about the length of three school buses).

  • Old Robots: Got confused easily, especially in the sun or with wind blowing leaves around.
  • DiG-Net: Achieved a 97.3% success rate. It could tell the difference between a "thumbs up" and a "go back" wave, even when the person was tiny in the camera frame.

Why Does This Matter?

Think about a person in a wheelchair who can't easily walk over to a robot to give it a command. Or a factory worker who needs to signal a robot to stop from across a noisy, dangerous floor. Or an elderly person at home who just wants to wave for help without shouting.

DiG-Net turns the robot into a trustworthy partner that understands you from a distance. It bridges the gap between "I am far away" and "I am understood."

In a nutshell: DiG-Net is like giving a robot a pair of high-tech binoculars and a memory for your movements, allowing it to understand your hand signals clearly, even when you are standing 30 meters away, roughly the length of a basketball court.
