A Baseline Study and Benchmark for Few-Shot Open-Set Action Recognition with Feature Residual Discrimination

This paper addresses the underexplored challenge of Few-Shot Open-Set Action Recognition in video by proposing a Feature-Residual Discriminator (FR-Disc) that significantly improves the rejection of unknown actions without sacrificing closed-set accuracy, and establishes a benchmark across five datasets with state-of-the-art results.

Stefano Berti, Giulia Pasquale, Lorenzo Natale

Published 2026-03-05

Imagine you are teaching a robot to recognize different human actions, like "jumping," "clapping," or "drinking coffee."

The Problem: The "Closed Room" Trap

Usually, when we train these robots, we give them a strict list of things to learn. If we teach them 10 actions, they only know those 10. If you show them a video of someone "dancing," and dancing wasn't on the list, the robot gets confused. It might guess, "Oh, that must be jumping!" and confidently get it wrong.

This is called a Closed-Set problem. It's like a security guard at a club who only has a list of 10 approved faces. If someone new walks up, the guard doesn't say, "I don't know you." Instead, they force a match: "You look a bit like Bob, so you're Bob!" This leads to false alarms.

In the real world, we need Open-Set recognition. We need the robot to say, "I know what jumping and clapping look like, but I have no idea what that is. I should reject it."

The Challenge: Learning with Very Few Examples

Now, imagine you can't show the robot thousands of videos. You only have one or five examples of each action (this is Few-Shot learning). It's like trying to teach a child to recognize a "Golden Retriever" by showing them just one picture, and then asking them to spot a Golden Retriever in a crowd of dogs they've never seen before.

The Solution: A New "Detective" for the Robot

The authors of this paper built a system that does two things:

  1. Recognizes the few known actions it was taught.
  2. Detects when it sees something totally unknown and says, "Nope, not on my list."

They tested this on five different video datasets (like a mix of sports, daily life, and diving videos).

The "Secret Sauce": The Feature-Residual Discriminator (FR-Disc)

The paper compares a few different ways to make the robot smarter at spotting unknowns. Here is the analogy for their best method, FR-Disc:

  • The Old Way (Softmax/Logits): Imagine the robot tries to guess the answer by looking at a scoreboard. If the score for "Jumping" is 90% and "Clapping" is 10%, it picks Jumping. The problem? Even if the robot is totally clueless, it still has to pick a winner. It might give "Jumping" a score of 51% just because it's slightly higher than the others. It's overconfident.
  • The New Way (FR-Disc): Instead of just looking at the scoreboard, the robot has a specialized detective (the Discriminator).
    • The robot first tries to match the new video to the closest thing it knows (e.g., "This looks a bit like Jumping").
    • Then, the Detective looks at the difference (the "residual") between the new video and the "Jumping" example.
    • If the video is actually "Jumping," the difference is tiny.
    • If the video is "Dancing" (which wasn't taught), the difference is huge and messy.
    • The Detective is trained specifically to look at that "messiness." If the mess is too big, the Detective shouts, "Reject! This isn't Jumping, and it's not anything I know!"
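
The detective analogy above can be sketched in a few lines of code. This is a minimal illustration, not the paper's actual architecture: features here are plain fixed-length vectors, each known class is represented by the mean feature of its few examples (a "prototype"), and `discriminator` stands in for the learned FR-Disc network, which in practice is trained rather than hand-written.

```python
import numpy as np

def residual_score(query_feat, prototypes, discriminator):
    """Classify a query video and score how 'unknown' it looks.

    query_feat    : (D,) feature vector of the new video
    prototypes    : dict mapping class name -> (D,) mean feature of its few examples
    discriminator : callable mapping a (D,) residual to an unknown-ness score
                    (stands in for the trained FR-Disc)
    """
    # Step 1: find the closest known class -- the robot's best guess.
    names = list(prototypes)
    dists = [np.linalg.norm(query_feat - prototypes[n]) for n in names]
    best = names[int(np.argmin(dists))]

    # Step 2: the "residual" is the difference between the query
    # and the best-matching class prototype.
    residual = query_feat - prototypes[best]

    # Step 3: the detective judges the residual. A small, clean residual
    # means "known"; a large, messy one means "reject as unknown".
    return best, discriminator(residual)
```

As a toy stand-in for the trained discriminator, one could simply use the residual's length: a query close to the "jumping" prototype yields a tiny residual and a low unknown-ness score, while a "dancing" query far from every prototype yields a large one.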

The Results: Why It Matters

The authors found that:

  1. Simple isn't always best: Just telling the robot to be less confident (a technique based on the "Entropy" of its prediction scores) helped a little, but not enough.
  2. The "Garbage" Class failed: They tried teaching the robot a "Garbage" category for unknowns, but the robot got confused. It memorized the specific "garbage" videos it saw during training instead of learning the concept of "unknown."
  3. The Detective (FR-Disc) won: This method was the champion. It didn't just get better at spotting unknowns; it actually got better at recognizing the known actions too. It was like giving the robot a magnifying glass that helped it see details it missed before.
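
The entropy baseline in point 1 is easy to sketch: a flat scoreboard (every class gets a similar score) has high entropy, which signals confusion, so high-entropy videos get rejected. This is a generic illustration of entropy-based rejection, not the paper's exact setup; the threshold value is an arbitrary choice for the example.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

def entropy_reject(logits, threshold):
    """Reject as 'unknown' when the prediction entropy is too high."""
    p = softmax(np.asarray(logits, dtype=float))
    # Entropy is near 0 for a confident, peaked scoreboard and
    # grows toward log(num_classes) as the scores flatten out.
    entropy = -np.sum(p * np.log(p + 1e-12))
    return "unknown" if entropy > threshold else f"class_{int(np.argmax(p))}"
```

A confident prediction like `[8, 0, 0]` passes the check, while a clueless, perfectly flat `[1, 1, 1]` gets rejected. The weakness the authors found is visible here too: a model can produce a peaked, low-entropy scoreboard on an unknown action and slip past the threshold.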

The Big Takeaway

This paper is a "Baseline Study," which means they built a standard testing ground (a benchmark) so other scientists can compare their ideas fairly.

In short: They showed that by adding a "difference-checking detective" to a robot that learns from very few examples, we can make it safe for the real world. It won't confidently guess wrong when it sees something new; instead, it will politely admit, "I don't know this," which is exactly what we want from AI in real life.