Geometry-Aware Metric Learning for Cross-Lingual Few-Shot Sign Language Recognition on Static Hand Keypoints

This paper proposes a geometry-aware metric learning framework using rotation- and scale-invariant inter-joint angle descriptors derived from static hand keypoints to achieve robust cross-lingual few-shot sign language recognition, significantly outperforming conventional coordinate-based methods across diverse sign languages.

Chayanin Chamachot, Kanokphan Lertniponphan

Published Wed, 11 Ma

Imagine you are trying to teach a robot to understand sign language. The problem is that there are over 300 different sign languages in the world, but for most of them, we don't have enough video examples to train a smart AI. It's like trying to learn a new language by only reading a few pages of a dictionary.

This paper proposes a clever solution: Teach the robot the "shape" of the hand, not the "location" of the hand.

Here is the breakdown of their idea using simple analogies:

1. The Problem: The "Camera Angle" Confusion

Imagine you are taking a photo of a friend making a "peace sign" (V-shape) with their fingers.

  • Scenario A: You take the photo from far away. Their hand looks tiny.
  • Scenario B: You take the photo from the side. Their hand looks squashed.
  • Scenario C: You take the photo from above. Their hand looks different again.

If you show these three photos to a standard computer program, it gets confused. It thinks, "Wait, is this a different sign? The hand is in a different spot, it's a different size, and it's facing a different way!" This is called Domain Shift. The computer is too focused on where the hand is in the room, rather than what the hand is actually doing.

This is a huge problem when you only have a few examples (called "Few-Shot Learning"). If you only show the computer 5 examples of a sign, and they all look different because of the camera angle, the computer will never learn the true shape of the sign.
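To make the domain-shift problem concrete, here is a tiny hypothetical illustration (not from the paper): the same hand pose seen by two cameras at different distances produces very different raw coordinates, even though the pose itself is identical.

```python
import numpy as np

# Hypothetical 2-D keypoints for one "peace sign" pose: wrist, index tip,
# middle tip. Camera B is twice as far away, so every coordinate is halved.
pose_camera_a = np.array([[0.0, 0.0], [1.0, 4.0], [2.0, 4.0]])
pose_camera_b = pose_camera_a * 0.5  # same pose, farther camera

# To a coordinate-based model, these "identical" poses look far apart.
raw_distance = np.linalg.norm(pose_camera_a - pose_camera_b)
print(f"distance between raw coordinate vectors: {raw_distance:.2f}")
```

A model that compares raw coordinate vectors sees these two recordings of the same sign as very different inputs, which is exactly the confusion described above.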

2. The Solution: The "Stick Figure" Geometry

The authors realized that instead of giving the computer the raw coordinates (X, Y, Z) of the hand, they should give it the angles between the finger joints.

Think of a hand like a puppet made of sticks and hinges.

  • If you move the puppet closer to the camera, the sticks get bigger, but the angle at the hinge doesn't change.
  • If you rotate the puppet, the sticks point in different directions, but the angle at the hinge stays exactly the same.

The researchers created a special "language" for the computer that only speaks in angles. They measured the angles formed at the joints of the hand (like the bend in your knuckle).

  • Raw Coordinates: "My finger is at position (10, 20, 5)." (Changes if you move the camera).
  • Angle Descriptor: "My finger is bent at 45 degrees." (Stays the same no matter where you are).

They call this a Geometry-Aware approach. It's like describing a song by its melody (the relationship between notes) rather than the volume or the speed at which it's played.
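The core geometric idea can be sketched in a few lines. This is a minimal illustration with made-up keypoints, not the paper's exact descriptor: it computes the angle at one joint and checks that scaling the whole hand and rotating it leaves the angle unchanged.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c."""
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Three hypothetical 3-D keypoints along one finger: base, knuckle, tip.
base    = np.array([0.0, 0.0, 0.0])
knuckle = np.array([0.0, 2.0, 0.0])
tip     = np.array([1.0, 3.0, 0.0])

angle = joint_angle(base, knuckle, tip)

# Move the camera (scale by 3) and rotate the hand 90 degrees about the
# z-axis: the raw coordinates all change, but the joint angle does not.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
transformed = [3.0 * (Rz @ p) for p in (base, knuckle, tip)]
angle_after = joint_angle(*transformed)
```

Collecting such angles over all of the hand's joints gives a descriptor that, by construction, ignores where the hand is, how big it appears, and which way it faces.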

3. The Magic Trick: Cross-Lingual Transfer

Now, here is the really cool part. The researchers trained their AI on American Sign Language (ASL), which has thousands of examples. Then, they tried to use that same AI to recognize signs in Thai, Brazilian, and Arabic sign languages, which have very few examples.

Usually, this fails because the cameras and recording conditions are different. But because their AI was trained on angles (the pure shape of the hand), it didn't care about the camera or the distance.

  • The Analogy: Imagine you learn to recognize a "Triangle" by looking at it in a book. Later, someone shows you a triangle drawn in the sand, or a triangle made of sticks. Even though the materials and sizes are different, you recognize it instantly because you learned the geometry, not the material.
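One common way to do few-shot metric learning, which fits the setup described above, is nearest-prototype classification: average the few labelled examples of each sign into a "prototype" and assign a new sign to the closest one. The sketch below uses hypothetical angle descriptors; the paper's exact architecture may differ.

```python
import numpy as np

def classify_by_prototype(support, labels, query):
    """Nearest-prototype few-shot classification.

    support: (n, d) angle descriptors of the few labelled examples
    labels:  (n,)   class name for each support example
    query:   (d,)   descriptor of the sign to recognise
    """
    prototypes = {c: support[labels == c].mean(axis=0) for c in np.unique(labels)}
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))

# Hypothetical 3-angle descriptors: two examples each of two signs.
support = np.array([[140.0,  35.0, 90.0],
                    [138.0,  33.0, 88.0],
                    [ 60.0, 170.0, 45.0],
                    [ 62.0, 168.0, 47.0]])
labels = np.array(["peace", "peace", "fist", "fist"])

query = np.array([139.0, 34.0, 89.0])
predicted = classify_by_prototype(support, labels, query)
```

Because the descriptors are camera-invariant angles, prototypes learned from one language's recordings can be compared directly against queries recorded under entirely different conditions.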

4. The Results: A Giant Leap

The results were impressive:

  • Within the same language: Using angles made the AI much smarter, especially when there were very few examples to learn from.
  • Across different languages: The AI trained on ASL could recognize Thai signs almost as well as if it had been trained specifically on Thai data. In some cases, it was even better than training from scratch!

Summary

The paper is essentially saying: "Stop teaching robots to memorize where hands are in a room. Teach them to understand how hands bend."

By focusing on the invariant geometry (the angles that never change), they built a lightweight, efficient system that can learn new sign languages with very little data, acting as a universal translator for the "shape" of human hands. This is a huge step toward making sign language technology accessible to the hundreds of languages that currently lack digital support.