Iconographic Classification and Content-Based Recommendation for Digitized Artworks

Imagine you walk into a massive, dusty library containing millions of paintings. You want to find a picture of a "sad dog," but the librarians (the museum experts) are too busy to look through every single painting to tell you what's inside. They only wrote down basic notes like "17th century," "oil on canvas," and "artist's name." You're stuck.

This paper introduces a new digital assistant called CARIS (Classification and Recommendation for the Iconclass System) designed to solve this problem. It acts like a super-smart, tireless intern who can look at a painting, figure out exactly what's happening in the story, and then find other paintings with similar stories for you.

Here is how the system works, broken down into simple steps with some creative analogies:

1. The "Eagle-Eyed" Detective (Object Detection)

First, the system needs to know what is actually in the picture. It uses a tool called YOLO (You Only Look Once), which is like a super-fast security camera that scans a painting and shouts out everything it sees: "I see a horse! I see a human! I see a dog!"

The Analogy: Think of YOLO as a very fast but slightly literal-minded child. If you show it a picture of a dog, it says "Dog." It doesn't yet know if the dog is a hero, a villain, or just a pet. It just sees the physical object.

2. The "Dictionary Translator" (Mapping to Iconclass)

Now, the system has a list of objects ("dog," "horse"), but museums don't use simple words; they use a massive, complex code system called Iconclass. This is like a giant, organized filing system where every possible theme in art has a specific code (e.g., 34B11 for "dog").

The system tries to translate the child's list ("dog") into the museum's code.

The Challenge: The word "dog" is tricky. In art, a dog might mean "loyalty," "hunting," or a specific story about Hercules.
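The Solution: The system uses a "three-pass" strategy. First, it looks for a code whose keywords exactly match the full list of detected objects. If that fails, it looks for codes whose keywords include all of the detected objects plus some others, in case the child missed something. Finally, it searches for any code that mentions at least one of the detected objects. It's like a translator who first tries a direct dictionary definition, then checks a thesaurus, and finally casts a wide net so nothing relevant is overlooked.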

3. The "Story Detective" (Inferring Abstract Meanings)

Sometimes, the painting isn't just about objects; it's about a concept like "Justice" or "Hunting." You can't take a photo of "Justice," but you can take a photo of a woman holding scales and a sword.

The Analogy: The system acts like a logic puzzle solver. If it sees a blindfolded woman + scales + sword, it doesn't just say "woman, scales, sword." It connects the dots and says, "Aha! This is the code for Justice!"
It uses a set of simple rules (like a recipe book) to combine detected objects into deeper meanings.

4. The "Book Club Matchmaker" (Recommendation)

Once the system has the correct codes for your painting, it needs to find other paintings you might like. It uses three different "matchmaking" strategies:

The Family Tree (Hierarchy): If you like a painting about "Hercules," this method suggests other paintings about "Greek Heroes" or "Mythology," even if they aren't exactly Hercules. It understands that they are cousins in the art family tree.
The Rare Gem Finder (IDF): Some codes are very common (like "sky" or "tree"). Others are rare and specific (like "Hercules biting a snail"). This method gives extra weight to the rare codes. If you have a painting with a rare code, it will prioritize finding other paintings with that same rare code, ignoring the boring common ones.
The Perfect Overlap (Jaccard): This method looks at the whole list of codes. It asks, "How much of this list is shared with that painting?" It's great for finding paintings that are thematically very similar, avoiding suggestions that are just vaguely related.

Why is this important?

Currently, finding specific types of art in huge digital archives is like finding a needle in a haystack. You have to guess the right keywords.

This system changes the game by:

Automating the boring work: It does the initial "guessing" of what's in the picture so human experts can focus on the hard stuff.
Understanding the story: It doesn't just see pixels; it understands the meaning behind the pixels using the Iconclass system.
Connecting the dots: It helps you discover art you didn't know you wanted to see, based on the stories the art tells, not just how it looks.

The Catch (Limitations)

The authors are honest: the system is a "proof of concept," meaning it's a working prototype, not a finished product yet.

It's only as good as its eyes: If the "detective" (YOLO) mistakes a cow for a horse, the whole story comes out wrong.
It needs training: The system needs to be taught more specific art terms to avoid getting confused by similar-looking animals or objects.

The Bottom Line

This paper presents a bridge between Artificial Intelligence (which sees shapes) and Human Culture (which understands stories). By combining a computer's ability to spot objects with a structured system of art history codes, CARIS helps us navigate the vast ocean of human creativity without getting lost. It's not replacing the museum curator; it's giving them a super-powered assistant to help everyone else find the treasure hidden in the collection.

1. Problem Statement

The digitization of cultural heritage (CH) has made vast amounts of artwork accessible, but it often lacks the interpretive context provided by human experts. While basic metadata (date, author) supports retrieval, iconographic access—understanding what is depicted and its symbolic meaning—remains a bottleneck.

The Challenge: Traditional machine learning approaches often rely on free-form labels or purely visual features, which fail to capture the structured, hierarchical, and symbolic nature of art history.
The Gap: There is a lack of systems that combine Computer Vision (CV) with standardized, controlled vocabularies like Iconclass to automate the classification of visual elements and recommend thematically related artworks based on symbolic meaning rather than just visual similarity.
Data Limitations: Existing datasets (like the Iconclass AI Test Set) often contain low-resolution images unsuitable for effective object detection, and training data for specific iconographic concepts is sparse and heterogeneous.

2. Methodology: The CARIS System

The authors propose CARIS (Classification and Recommendation for the Iconclass System), a proof-of-concept, four-stage pipeline designed to automate iconographic analysis and recommendation.

Stage 1: Object Detection (Computer Vision)

Model: Uses YOLOv8 (You Only Look Once) to detect visible objects in digitized artworks.
Preprocessing: Duplicate detections (e.g., multiple instances of the same object) are removed, as Iconclass assigns a single code per object type regardless of quantity.
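
The deduplication step can be sketched as follows. This is a minimal illustration, not the CARIS implementation: the function name `unique_labels` and the (label, confidence) tuple format are assumptions standing in for whatever the detector actually returns, and keeping the highest confidence per label is an assumed tie-break.

```python
from typing import Iterable


def unique_labels(detections: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Collapse repeated detections to one entry per object type.

    Keeps the highest confidence seen for each label (an assumption;
    the summary only states that duplicates are removed).
    """
    best: dict[str, float] = {}
    for label, confidence in detections:
        if confidence > best.get(label, -1.0):
            best[label] = confidence
    return best


# Hypothetical detector output: three dogs and one horse.
detections = [("dog", 0.91), ("dog", 0.64), ("horse", 0.80), ("dog", 0.55)]
labels = unique_labels(detections)
print(sorted(labels))
```

The result is a set of object types, which is the form the keyword matching in Stage 2 expects.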

Stage 2: Keyword-to-Code Mapping

The system maps YOLO-detected labels to Iconclass codes (a hierarchical alphanumeric vocabulary) using two strategies:

Keyword-Based Set Matching (Primary):
- Exact Set Match: Searches for an Iconclass code where the set of YOLO labels exactly matches the code's keywords.
- Subset Match: If no exact match is found, it looks for codes where the YOLO labels are a subset of the keywords (allowing for missed detections).
- Singleton Search: Searches for codes containing any single detected label to ensure broad coverage.
Description-Based Matching (Secondary): Attempts to match labels against full text descriptions of codes (found to be less accurate in initial tests).
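
A minimal sketch of the keyword-based set matching, under stated assumptions: the index layout (code mapped to a keyword set), the function name `match_codes`, and the placeholder codes `CODE_HUNT` and `CODE_RIDER` are all hypothetical. Only "34B11" for "dog" comes from the text. The sketch also assumes the three passes run as a cascade, each one tried only if the previous pass found nothing.

```python
def match_codes(labels, index):
    """Return (pass_name, matching_codes) for a collection of detected labels.

    index maps an Iconclass code to its keyword set (hypothetical layout).
    """
    detected = set(labels)
    if not detected:
        return "none", []

    # Pass 1: the code's keywords are exactly the detected labels.
    exact = sorted(c for c, kw in index.items() if kw == detected)
    if exact:
        return "exact", exact

    # Pass 2: detected labels are a proper subset of the code's keywords,
    # which tolerates objects the detector missed.
    subset = sorted(c for c, kw in index.items() if detected < kw)
    if subset:
        return "subset", subset

    # Pass 3: any code sharing at least one keyword with the detections.
    singleton = sorted(c for c, kw in index.items() if detected & kw)
    return "singleton", singleton


# Toy index; CODE_HUNT and CODE_RIDER are placeholders, not real Iconclass codes.
INDEX = {
    "34B11": {"dog"},
    "CODE_HUNT": {"deer", "dog", "horse", "human"},
    "CODE_RIDER": {"horse", "human"},
}

print(match_codes(["dog"], INDEX))
print(match_codes(["dog", "horse"], INDEX))
print(match_codes(["dog", "cat"], INDEX))
```

The singleton pass is the one most likely to return very large candidate lists, since a generic label can appear in the keywords of many codes.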

Stage 3: Rule-Based Inference (Abstract Meaning)

Since CV cannot detect abstract concepts (e.g., "Justice"), the system uses a lightweight rule engine (JSON-based) to infer higher-level codes based on object combinations.

Example: Detecting a blindfolded woman + scales + a sword → infers the code for "Justice."
Example: Detecting a deer + dog + horse + human → infers "Hunting."
Constraint: The system avoids using Generative AI as a direct code generator to prevent hallucinations; instead, generative models are considered (potentially) only as filters for redundant codes.
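
The rule engine can be pictured with a short sketch. The JSON schema (`requires`, `infers`), the function name `infer_codes`, and the placeholder outputs `JUSTICE` and `HUNTING` are assumptions made for illustration; the summary does not give the actual rule format or the Iconclass codes the rules emit.

```python
import json

# Hypothetical rule file: each rule fires when all required labels are present.
RULES_JSON = """
[
  {"requires": ["blindfolded woman", "scales", "sword"], "infers": "JUSTICE"},
  {"requires": ["deer", "dog", "horse", "human"], "infers": "HUNTING"}
]
"""


def infer_codes(labels, rules):
    """Return the inferred concepts whose required objects were all detected."""
    present = set(labels)
    return [rule["infers"] for rule in rules if set(rule["requires"]) <= present]


rules = json.loads(RULES_JSON)
print(infer_codes(["scales", "sword", "blindfolded woman", "tree"], rules))
print(infer_codes(["dog", "horse", "human"], rules))
```

Because every required object must be present, a single missed detection (here, the deer) prevents the rule from firing. That is the same dependence on detection quality described in the evaluation.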

Stage 4: Content-Based Recommendation

Given a set of Iconclass codes for a query image, the system recommends related artworks using three complementary algorithms:

Hierarchical Proximity: Exploits the Iconclass tree structure.
- Identical codes = 1.0 score.
- Shared immediate parent = 0.5 score.
- Shared grandparent = 0.25 score.
IDF-Weighted Overlap: Uses Inverse Document Frequency (IDF) to weight codes. Rare, specific codes (e.g., a specific mythological event) carry more semantic weight than common objects (e.g., "dog").
Jaccard Similarity: Calculates the intersection over union of code sets to favor tight thematic overlaps and reduce bias toward images with many generic codes.
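
The three scoring methods can be sketched in a few lines each. Several details here are assumptions, not taken from the source: that a code's parent is obtained by dropping its last character (real Iconclass notation has further constructs this ignores), that set-level hierarchical scores average each query code's best match, and that IDF is computed as log(N / df). All function names are hypothetical.

```python
import math
from collections import Counter


def parent(code: str) -> str:
    # Simplifying assumption: dropping the last character yields the parent.
    return code[:-1]


def pair_score(a: str, b: str) -> float:
    """Hierarchical proximity of two codes, using the weights listed above."""
    if a == b:
        return 1.0
    pa, pb = parent(a), parent(b)
    if pa and pa == pb:  # shared immediate parent
        return 0.5
    ga, gb = parent(pa), parent(pb)
    if ga and ga == gb:  # shared grandparent
        return 0.25
    return 0.0


def hierarchical_score(query: set, candidate: set) -> float:
    # Assumed aggregation: mean over query codes of the best candidate match.
    if not query or not candidate:
        return 0.0
    return sum(max(pair_score(q, c) for c in candidate) for q in query) / len(query)


def idf_weights(corpus: dict) -> dict:
    """idf(code) = log(N / df(code)); df counts artworks carrying the code."""
    df = Counter(code for codes in corpus.values() for code in codes)
    n = len(corpus)
    return {code: math.log(n / count) for code, count in df.items()}


def idf_overlap(query: set, candidate: set, weights: dict) -> float:
    # Sum the weights of shared codes, so rare codes dominate the score.
    return sum(weights.get(code, 0.0) for code in query & candidate)


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy corpus; codes other than "34B11" are placeholders chosen only for
# their string structure, with no claim about their Iconclass meaning.
corpus = {
    "art1": {"34B11", "RARE1"},
    "art2": {"34B11"},
    "art3": {"34B11", "34B12"},
    "art4": {"34B11", "RARE1"},
}
weights = idf_weights(corpus)
query = corpus["art1"]
for art_id, codes in corpus.items():
    if art_id != "art1":
        print(art_id, hierarchical_score(query, codes),
              idf_overlap(query, codes, weights), jaccard(query, codes))
```

In this toy corpus "34B11" appears in every artwork, so its IDF weight is zero and sharing it contributes nothing to the IDF score, while the rarer code still does.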

3. Key Contributions

First Iconclass-Based Recommender: To the authors' knowledge, this is the first system explicitly combining Iconclass tags with content-based recommendation engines.
Hybrid CV-Symbolic Approach: It bridges the gap between low-level visual detection (YOLO) and high-level symbolic interpretation (Iconclass hierarchy), moving beyond pure visual similarity.
Three-Pronged Recommendation Strategy: The integration of hierarchical, statistical (IDF), and set-theoretic (Jaccard) methods provides a robust mechanism for handling different types of queries (e.g., rare myths vs. common scenes).
Open Source Implementation: The full CARIS system is released as a Python package with dedicated modules for I/O, classification, and recommendation.

4. Results and Evaluation

The system was evaluated using public domain images from Wikimedia Commons and the Iconclass AI Test Set (~87k images).

Classification Performance:
- Success: The system successfully mapped simple scenes (e.g., a dog portrait) to the correct specific Iconclass code ("34B11 dog") after filtering out irrelevant matches.
- Limitations (Code Explosion): When only generic objects were detected (e.g., "human"), the system returned an unmanageably large set of candidate codes (e.g., 1,231 codes for "human").
- Bottleneck: The primary failure point is object detection accuracy and recall. If YOLO mislabels or misses a critical element (e.g., mistaking a dog for a bear, or missing a falcon in a hunt scene), the inferred Iconclass codes and subsequent recommendations become inaccurate.
Recommendation Performance:
- Hierarchy: Successfully recommended artworks with related narratives even when exact code matches were missing (e.g., recommending a Hercules image based on "attributes of Hercules" codes when the specific myth code was absent).
- IDF: Effectively prioritized rare, diagnostic codes over common object overlaps.
- Jaccard: Provided robust recommendations for images with complex code sets, avoiding bias toward generic content.

5. Significance and Future Work

Significance: The paper demonstrates that Iconclass-aware computer vision can accelerate cataloging and enhance navigation in large heritage repositories. It validates the strategy of letting CV propose visible elements while using symbolic structures to derive meaning.
Future Directions:
1. Fine-Tuning YOLO: Creating Iconclass-compliant training sets annotated by experts to improve detection accuracy for specific art-historical objects.
2. Rule Engine Refinement: Semi-automatically mining rules from large corpora to better infer abstract concepts.
3. Multimodal Integration: Combining visual features with textual metadata and Vision-Language embeddings (e.g., CLIP).
4. Explainability: Developing a user interface that explains why a specific recommendation was made, making the system accessible to curators and the general public.

Conclusion: While the system currently requires significant engineering to handle detection errors and code explosion, it establishes a viable framework for automating iconographic analysis, shifting the paradigm from purely visual retrieval to semantic, meaning-based discovery in cultural heritage.