Specificity-aware reinforcement learning for fine-grained open-world classification

This paper proposes SpeciaRL, a specificity-aware reinforcement learning framework that fine-tunes reasoning Large Multimodal Models to achieve an optimal balance between correctness and specificity in open-world fine-grained image classification by employing a dynamic, verifier-based reward signal.

Samuele Angheben, Davide Berasi, Alessandro Conti, Elisa Ricci, Yiming Wang

Published 2026-03-05

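Imagine you are at a bustling art gallery, and you're asked to describe a painting to a friend over the phone. Say only "it's a painting" and your friend learns nothing; invent details you can't actually see and you mislead them.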

The Problem: The "Vague Artist"
Most modern AI image classifiers are like a very polite but slightly lazy artist. If you show them a picture of a Golden Retriever, they might say, "It's a dog." If you show them a Red Delicious Apple, they might say, "It's a fruit."

Technically, they are correct. A Golden Retriever is a dog. But they are too generic. They lack the "specificity" to tell you it's a Golden Retriever or a Red Delicious.

In the real world, this matters. If a doctor needs to know if a skin spot is a "mole" or a specific type of "melanoma," saying "it's a skin spot" isn't helpful. If a botanist needs to identify a rare flower, saying "it's a flower" is useless.

The challenge is: How do we get the AI to be more specific without making it guess and get things wrong?

  • If you just tell the AI, "Be specific!", it might panic and say, "It's a Golden Retriever named Sparky," when it's actually just a Labrador. It becomes specific but wrong.
  • If you let it be safe, it stays correct but vague.

The Solution: SpeciaRL (The "Smart Coach")
The authors of this paper created a new training method called SpeciaRL. Think of it as a smart coach training an athlete who already knows the sport but is afraid to take risks.

Here is how it works, using a simple analogy:

1. The "Group Try" (Rollouts)

Instead of asking the AI to guess the answer once, the coach asks it to generate 10 different guesses for the same picture.

  • Guess 1: "It's a bird." (Too vague)
  • Guess 2: "It's a sparrow." (Maybe right, maybe wrong)
  • Guess 3: "It's a White-throated Sparrow." (Very specific, but is it right?)
  • Guess 4: "It's a bird." (Safe)
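To make the "Group Try" concrete, here is a minimal sketch of drawing several guesses for one image. The answer pool, the sampling weights, and the `sample_rollouts` helper are all hypothetical stand-ins for a real multimodal model, which the summary above does not specify.

```python
import random

# Hypothetical stand-in for a multimodal model: a fixed pool of answers
# it might give for one bird photo, with made-up sampling weights.
CANDIDATE_ANSWERS = ["bird", "sparrow", "White-throated Sparrow", "cat"]
WEIGHTS = [0.5, 0.25, 0.15, 0.10]

def sample_rollouts(num_rollouts=10, seed=0):
    """Draw several independent guesses for the same image."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return rng.choices(CANDIDATE_ANSWERS, weights=WEIGHTS, k=num_rollouts)

rollouts = sample_rollouts()
print(rollouts)
```

Sampling many guesses instead of one is what lets the coach see the range of answers the AI is capable of for this particular picture.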

2. The "Expert Judge" (The Verifier)

The AI doesn't know which guess is best. So, a super-smart "Judge" (another AI) looks at all 10 guesses and the actual correct answer (which the coach knows).
The Judge sorts the guesses into buckets:

  • Wrong: "It's a cat." (Discard!)
  • Generic: "It's a bird." (Okay, but boring.)
  • Specific: "It's a White-throated Sparrow." (Great!)
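The sorting step can be sketched roughly as follows. In the paper the Judge is another AI; this toy version swaps it for a hand-written lookup table, so `ANCESTORS`, `Verdict`, and `judge` are illustrative assumptions, not the authors' verifier. The three buckets mirror the ones listed above.

```python
from enum import IntEnum

class Verdict(IntEnum):
    """Buckets ordered from worst to best."""
    WRONG = 0
    GENERIC = 1
    SPECIFIC = 2

# Hypothetical lookup: ground-truth label -> names that are correct but coarser.
ANCESTORS = {
    "white-throated sparrow": {"sparrow", "bird", "animal"},
}

def judge(guess, truth):
    """Toy judge: compare one guess against the known correct answer."""
    g = guess.strip().lower()
    t = truth.strip().lower()
    if g == t:
        return Verdict.SPECIFIC
    if g in ANCESTORS.get(t, set()):
        return Verdict.GENERIC
    return Verdict.WRONG

guesses = ["bird", "sparrow", "White-throated Sparrow", "cat"]
verdicts = [judge(g, "White-throated Sparrow") for g in guesses]
print([v.name for v in verdicts])
```

Ordering the buckets as integers makes it easy to ask which guess in a group was the most specific correct one.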

3. The "Dynamic Reward" (The Secret Sauce)

This is the most clever part. In the past, AI training was like a strict teacher who only gave a gold star if you got the exact right answer. If you were close but not perfect, you got nothing. This made the AI afraid to try hard.

SpeciaRL changes the rules:
The coach looks at the best guess the AI made in that group of 10.

  • Scenario A: The AI's best guess was just "Bird."
    • The Reward: The coach says, "Good job! You were as specific as you could be. You get a gold star for answering 'Bird'." (The AI learns: Don't force it if I don't know.)
  • Scenario B: The AI's best guess was "White-throated Sparrow."
    • The Reward: The coach says, "Great! You can be specific. Next time, don't settle for just 'Bird.' Aim for the sparrow!" (The AI learns: Push for the specific answer.)
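The two scenarios can be sketched as a small reward function. This is a simplified reading of the description above, not the paper's exact formula: the binary 0/1 reward, the three-level scale, and the `dynamic_rewards` name are assumptions made for illustration.

```python
# Assumed three-level scale, matching the Judge's buckets.
WRONG, GENERIC, SPECIFIC = 0, 1, 2

def dynamic_rewards(verdicts):
    """Reward only the guesses that match the best level reached in the group."""
    best = max(verdicts, default=WRONG)
    if best == WRONG:
        # No correct guess in the group, so nobody gets a gold star.
        return [0.0] * len(verdicts)
    return [1.0 if v == best else 0.0 for v in verdicts]

# Scenario A: the best the AI managed was the generic answer.
scenario_a = [GENERIC, WRONG, GENERIC, GENERIC]
# Scenario B: one guess reached the specific answer.
scenario_b = [GENERIC, WRONG, SPECIFIC, GENERIC]

print(dynamic_rewards(scenario_a))  # [1.0, 0.0, 1.0, 1.0]
print(dynamic_rewards(scenario_b))  # [0.0, 0.0, 1.0, 0.0]
```

Note how the same answer, "Bird", earns a reward in Scenario A but not in Scenario B: the bar moves with what the AI has shown it can do for that picture.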

Why This is a Big Deal

Most other methods try to force the AI to be specific by punishing it for being vague. This often makes the AI hallucinate (make things up) just to get the reward.

SpeciaRL is different because it respects the AI's limits.

  • It asks: "What is the maximum level of detail this AI can actually handle for this specific picture?"
  • It rewards the AI for reaching that limit, but never for guessing wildly beyond it.

The Result

The paper tested this on thousands of images (birds, cars, flowers, food).

  • Old AI: "It's a car." (Correct, but boring).
  • Forced AI: "It's a 1998 Ferrari F355 Challenge." (Specific, but often wrong).
  • SpeciaRL AI: "It's a 1998 Ferrari F355 Challenge." (Specific and correct).

In a Nutshell

SpeciaRL is a training technique that teaches AI to be confidently specific. It doesn't force the AI to guess; instead, it encourages the AI to dig deep and find the most detailed answer it knows is true, while gently stopping it from making things up when it's unsure. It's the difference between a student who memorizes a textbook and one who truly understands the material and can explain the fine details.