The impact of abstract and object tags on image privacy classification

This paper investigates the effectiveness of abstract versus object tags for image privacy classification, revealing that abstract tags outperform object tags when the tag budget is limited, while the two become about equally useful when a larger number of tags is available.

Darya Baranouskaya, Andrea Cavallaro

Published 2026-02-17

Imagine you are trying to explain a photograph to a friend over the phone so they can guess if the photo is private (like a diary entry) or public (like a postcard).

To do this, you have a limited "word budget." You can only use a few words to describe the picture. The big question this paper asks is: What kind of words work best?

Do you use Concrete Words (names of specific things, like "a red car," "a passport," or "a dog")?
Or do you use Abstract Words (feelings, vibes, and concepts, like "romance," "danger," "freedom," or "intimacy")?

Here is the breakdown of the research, explained simply.

The Two Types of Descriptions

The researchers looked at how computers "see" photos.

  • Concrete Tags (The "What"): These are like a grocery list. They tell you exactly what objects are in the room. Example: "Man," "Sofa," "Passport."
  • Abstract Tags (The "Vibe"): These are like a mood board. They tell you the feeling or the story. Example: "Family," "Celebration," "Secret," "Justice."

The Experiment: The "Word Budget" Game

The researchers tested these descriptions on three different types of photo datasets to see which words helped a computer decide if a photo was private. They played a game with a strict rule: You can only use a certain number of words.
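
The "word budget" rule can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes a tagger that returns a confidence score per tag, and the helper names (`apply_tag_budget`, `multi_hot`) and the scores are made up for the example.

```python
from typing import Dict, List


def apply_tag_budget(scored_tags: Dict[str, float], budget: int) -> List[str]:
    """Keep the `budget` highest-scoring tags (ties broken alphabetically)."""
    ranked = sorted(scored_tags.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:budget]]


def multi_hot(tags: List[str], vocabulary: List[str]) -> List[int]:
    """Encode the kept tags as a 0/1 vector over a fixed vocabulary."""
    present = set(tags)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical tagger output for one photo (scores invented for illustration).
object_scores = {"man": 0.95, "sofa": 0.80, "passport": 0.90, "lamp": 0.40}
vocabulary = sorted(object_scores)  # fixed order: lamp, man, passport, sofa

kept = apply_tag_budget(object_scores, budget=2)
features = multi_hot(kept, vocabulary)
print(kept, features)
```

A privacy classifier would then be trained on vectors like `features`, once per budget, to see how accuracy changes as more words are allowed.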

Scenario A: You have a very small budget (1–5 words)

Imagine you can only say one or two words to describe a photo.

  • The Result: If the photo is about feelings or personal secrets (like a couple kissing or a medical document), Abstract Words win.
  • The Analogy: If you see a photo of a wedding ring and a blurry background, saying "Love" or "Commitment" tells you more about the privacy risk than saying "Metal" or "Finger." Abstract words capture the context and the story immediately.
  • The Catch: If the task is purely about spotting a specific object (like "Is there a gun in the room?"), Concrete words keep a slight edge, but Abstract words still do surprisingly well.

Scenario B: You have a huge budget (13+ words)

Imagine you can write a whole paragraph describing the photo.

  • The Result: It doesn't matter much which words you use. Concrete and Abstract words perform equally well.
  • The Analogy: If you have enough words to say "A man holding a passport in a dark room looking nervous," you don't need the word "Fear" to know it's a private situation. The list of objects (Man, Passport, Room) is so long and detailed that it paints the full picture on its own. The "vibe" words become redundant because the "stuff" words have already told the whole story.

The Surprising Twist: Co-Occurrence

The researchers also looked at whether Abstract and Concrete words usually appear together.

  • The Finding: They rarely appear together directly. For example, the word "Romance" (Abstract) does not reliably show up next to "Flowers" (Concrete) in the computer's data.
  • The Lesson: This means Abstract words aren't just "copycats" of Concrete words. They provide unique information that you can't get from a short list of objects. That advantage fades once you have enough words to list everything.
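
One simple way to check how often two tags appear together is a Jaccard overlap: the number of photos that carry both tags divided by the number that carry either one. The sketch below uses that measure as an assumption; the summary above does not say which measure the paper uses, and the photos and tags here are toy data.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def jaccard_cooccurrence(
    images: List[Set[str]], abstract_tags: Set[str], object_tags: Set[str]
) -> Dict[Tuple[str, str], float]:
    """For each (abstract, object) pair: photos with both / photos with either."""
    rates = {}
    for a, o in product(sorted(abstract_tags), sorted(object_tags)):
        both = sum(1 for tags in images if a in tags and o in tags)
        either = sum(1 for tags in images if a in tags or o in tags)
        rates[(a, o)] = both / either if either else 0.0
    return rates


# Toy tag sets for four photos (invented for illustration).
images = [
    {"romance", "flowers", "couple"},
    {"romance", "beach"},
    {"flowers", "vase"},
    {"celebration", "cake", "flowers"},
]
rates = jaccard_cooccurrence(
    images,
    abstract_tags={"romance", "celebration"},
    object_tags={"flowers", "beach", "cake", "vase", "couple"},
)
print(rates[("romance", "flowers")])  # 1 shared photo out of 4 with either tag
```

A low value for a pair like ("romance", "flowers") means the abstract tag cannot simply be predicted from that object tag, which is the sense in which abstract tags are not "copycats."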

The Big Takeaway for the Future

This research gives us a rule of thumb for building AI that protects our privacy:

  1. If you are limited on space (like a quick alert or a mobile app): Don't just look for objects. You must teach the AI to understand the "vibe" (Abstract tags). It's the most efficient way to spot privacy risks when you can't say much.
  2. If you have plenty of space (like a detailed report): You can stick to listing objects. It's easier for computers to find "Passports" and "Cars" than "Justice" or "Intimacy," and if you list enough of them, you get the same result.
  3. The Sweet Spot: For the most accurate privacy protection, especially for subjective things (like "is this photo embarrassing?"), a mix of both is best. But if you have to choose, start with the Abstract words.
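
As a rough illustration of the "mix of both" idea, the budget can be split between the two tag types, filling the Abstract share first. This is a hypothetical sketch: the function names, the 50/50 default split, and the scores are assumptions for the example, not something the paper prescribes.

```python
import math
from typing import Dict, List


def top_k(scored_tags: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring tags (ties broken alphabetically)."""
    ranked = sorted(scored_tags.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def mixed_budget(
    abstract_scores: Dict[str, float],
    object_scores: Dict[str, float],
    budget: int,
    abstract_share: float = 0.5,  # assumed split, not taken from the paper
) -> List[str]:
    """Spend the budget on abstract tags first, then fill the rest with objects."""
    n_abstract = min(len(abstract_scores), math.ceil(budget * abstract_share))
    n_object = budget - n_abstract
    return top_k(abstract_scores, n_abstract) + top_k(object_scores, n_object)


# Hypothetical tagger scores for one photo (invented for illustration).
abstract_scores = {"intimacy": 0.9, "family": 0.7, "celebration": 0.3}
object_scores = {"passport": 0.95, "sofa": 0.6}

mixed = mixed_budget(abstract_scores, object_scores, budget=3)
print(mixed)
```

Rounding the abstract share up means that with a budget of one word, the single slot goes to an Abstract tag, matching the "start with the Abstract words" advice.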

In short: If you have a short story to tell, use the feeling words. If you have a long story to tell, the object words will do just fine.
