AMR-CCR: Anchored Modular Retrieval for Continual Chinese Character Recognition

Imagine you are the head librarian of a massive, ancient library dedicated to Chinese calligraphy. Your job is to identify every character in the books, rubbings, and stone tablets that keep arriving from archaeological digs.

Here is the problem: The library never stops growing.

Every week, new boxes arrive with characters written in different styles (scripts) from different eras. Some look like tiny drawings (Oracle Bone); others have the rounded, evenly drawn strokes of Seal Script. The characters are incredibly similar to each other, like twins wearing slightly different hats, and the handwriting varies wildly depending on who wrote them.

The Old Way: The "Hard-Drive" Librarian

Traditionally, librarians tried to solve this by training a single "super-brain" (a neural network) to memorize every character they've ever seen.

The Flaw: Every time a new box of characters arrives, they have to retrain the whole brain.
The Result: The brain gets confused. It starts forgetting the old characters to make room for the new ones (a problem called "catastrophic forgetting"). It's like cramming a new language: you might remember the new words, but the old ones start to slip. Messy handwriting also trips it up, because it expects every copy of a character to look roughly the same.

The New Way: AMR-CCR (The "Smart Index" System)

The authors of this paper propose a completely different approach called AMR-CCR. Instead of trying to memorize everything in a single brain, they build a dynamic, searchable dictionary.

Here is how it works, using simple analogies:

1. The "Universal Translator" (Shared Embedding Space)

Imagine a universal translator that can turn any character, no matter the script or style, into a single, standardized "ID card" (an embedding).

Whether it's a rough stone carving or a smooth bamboo slip, the translator converts it into a numeric code. Characters that look or mean alike end up with similar codes.
This allows the system to compare a new character against all the old ones quickly, much like checking a fingerprint against a database.

2. The "Script-Specific Glasses" (SIA + SAR)

This is the cleverest part. Different scripts (like Oracle Bone vs. Clerical Script) have different "flavors."

The Problem: If you look at an Oracle Bone character with "Clerical Script glasses," it looks weird and confusing.
The Solution: The system has a pair of smart glasses for each script.
- SIA (The Glasses): When a new character arrives, the system puts on the specific glasses for that script to "calibrate" the view, making the character look normal to the database.
- SAR (The Eye Doctor): When a new character shows up, nobody tells the system which script it comes from, so it has a tiny "Eye Doctor" module. This doctor quickly guesses which pair of glasses to put on before the character is checked against the dictionary.

3. The "Multi-Identity" Dictionary (Multi-Prototype)

In the old system, a character like "Dragon" had only one entry in the dictionary. If the new "Dragon" looked slightly different (maybe written by a left-handed person), the system might miss it.

The New Solution: The dictionary doesn't just have one entry for "Dragon." It has multiple entries (prototypes) for the same character, each capturing a common way it is written (e.g., "Dragon - Bold Style," "Dragon - Flowing Style," "Dragon - Ancient Style").
This makes it far more likely that messy or unusual handwriting still finds a close match.

4. The "Zero-Shot" Magic (Reading the Unknown)

Sometimes, archaeologists find a character that has never been seen before. There is no picture of it in the database.

The Old Way: The system would just say, "I don't know."
The New Way: The dictionary also stores short text descriptions of what each character means. When an unseen character turns up, the system compares its image against those descriptions and picks the best fit. For example, a carved glyph might be matched to the entry for "high," even if the system has never seen a picture of that specific character before. It's like solving a mystery using clues rather than just matching photos.

The Result: EvoCON (The Training Ground)

To prove this works, the authors built a new training ground called EvoCON. It is a benchmark library that fills up in six stages, one script at a time, starting with the newest script (Clerical Script) and working back to the oldest (Oracle Bone). They tested their system against the old methods, and the results were clear:

Old methods forgot the past and got confused by new styles.
AMR-CCR forgot very little and coped far better with messy handwriting. It could even identify totally new characters by matching them against text descriptions.

In a Nutshell

Instead of trying to force a single brain to memorize an ever-changing, messy library, this paper suggests building a smart, adaptable filing system. It uses special glasses to understand different writing styles, keeps several reference entries for every character to handle variations, and uses text descriptions to solve the mystery of characters it has never seen before. It turns a chaotic, growing problem into a manageable, searchable treasure hunt.

Here is a detailed technical summary of the paper "AMR-CCR: Anchored Modular Retrieval for Continual Chinese Character Recognition."

1. Problem Formulation: Continual Chinese Character Recognition (Continual CCR)

The paper addresses a critical gap in Ancient Chinese Character Recognition (CCR). While existing research focuses on offline, closed-set classification (training on a fixed set of characters), real-world digitization workflows are non-stationary. New archaeological materials are continuously discovered, introducing:

New Scripts: Different historical writing systems (e.g., Oracle Bone, Bronze Inscriptions, Seal Script).
New Classes: New character variants within existing scripts.
Challenges:
1. Scalability & Subtlety: The class space grows indefinitely with subtle inter-class differences and scarce incremental data.
2. Intra-class Diversity: A single character class exhibits massive variation due to different writers, eras, and carrier conditions (e.g., stone rubbings vs. bamboo slips).

The authors formalize this as Continual CCR: a script-staged, class-incremental setting where data arrives in stages by script, requiring the model to learn new classes without forgetting old ones, while handling extreme intra-class distribution shifts.

2. Methodology: AMR-CCR Framework

The authors propose AMR-CCR (Anchored Modular Retrieval for Continual CCR), shifting from closed-set classification to an embedding-based dictionary retrieval paradigm.

Core Architecture

Shared Multimodal Space: The model uses a frozen Qwen3-VL-Embedding backbone as a "Stability Anchor" to maintain a consistent vision-language embedding space across all stages.
Dictionary-Based Matching: Recognition is performed by matching query embeddings against a dictionary of prototypes, rather than using a fixed classification head. New classes are added simply by extending the dictionary.
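The dictionary-matching idea can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes embeddings have already been produced by the frozen backbone; here they are stubbed with hand-written toy vectors. `PrototypeDictionary`, `add_class`, and `query` are hypothetical names. Matching is cosine similarity against every stored prototype, and a class is counted once in the ranking even if it owns several prototypes.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale rows to unit length so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class PrototypeDictionary:
    """Hypothetical open-ended dictionary mapping class labels to prototypes."""

    def __init__(self, dim: int):
        self.protos = np.empty((0, dim))  # one unit-length row per prototype
        self.labels: list[str] = []       # class label of each row

    def add_class(self, label: str, prototypes) -> None:
        """Register a class by appending rows; nothing else is retrained."""
        rows = l2_normalize(np.atleast_2d(np.asarray(prototypes, dtype=float)))
        self.protos = np.vstack([self.protos, rows])
        self.labels.extend([label] * len(rows))

    def query(self, embeddings, k: int = 1) -> list[list[str]]:
        """Return the top-k distinct class labels for each query embedding."""
        q = l2_normalize(np.atleast_2d(np.asarray(embeddings, dtype=float)))
        sims = q @ self.protos.T
        results = []
        for row in sims:
            ranked: list[str] = []
            for idx in np.argsort(-row):  # most similar prototype first
                if self.labels[idx] not in ranked:  # count each class once
                    ranked.append(self.labels[idx])
                if len(ranked) == k:
                    break
            results.append(ranked)
        return results


# Toy 3-d "embeddings" standing in for backbone outputs.
d = PrototypeDictionary(dim=3)
d.add_class("山", [[1.0, 0.0, 0.0]])
d.add_class("水", [[0.0, 1.0, 0.0]])
print(d.query([0.9, 0.1, 0.0]))  # [['山']]

# A later stage adds a class by extending the dictionary.
d.add_class("火", [[0.0, 0.0, 1.0]])
print(d.query([0.1, 0.0, 0.95], k=2))  # [['火', '山']]
```

Adding a class only appends rows to the prototype matrix. This is why a retrieval dictionary scales more gracefully than a fixed-size classification head.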

Key Components

Script-Conditioned Injection (SIA + SAR):
- SIA (Script-Interface Adapter): A lightweight, low-rank module injected into the frozen backbone. It performs script-specific calibration to adapt the shared embedding space to the nuances of a new script without disrupting the global structure.
- SAR (Script-Aware Routing): A lightweight MLP router that predicts the script identity of an input image at inference time (since script labels are unknown during testing) and selects the appropriate SIA. This ensures the correct calibration is applied before retrieval.
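The sketch below roughs out script-conditioned injection, with several assumptions. It uses a LoRA-style residual update and a two-layer ReLU router; the summary only says SIA is low-rank and SAR is a lightweight MLP. For simplicity, the adapter acts on a single feature vector rather than inside the backbone's layers, and all weights are random rather than trained. `LowRankAdapter`, `ScriptRouter`, and `route_and_adapt` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the demo is reproducible


class LowRankAdapter:
    """Hypothetical SIA-style module: h -> h + scale * B @ (A @ h).

    Simplification: it acts on one feature vector, not inside backbone layers.
    """

    def __init__(self, dim: int, rank: int, scale: float = 1.0):
        self.A = rng.normal(0.0, 0.02, size=(rank, dim))  # down-projection
        self.B = np.zeros((dim, rank))  # up-projection, zero-initialized
        self.scale = scale

    def __call__(self, h: np.ndarray) -> np.ndarray:
        return h + self.scale * (self.B @ (self.A @ h))


class ScriptRouter:
    """Hypothetical SAR-style router: a two-layer MLP scoring each script."""

    def __init__(self, dim: int, hidden: int, num_scripts: int):
        self.W1 = rng.normal(0.0, 0.1, size=(hidden, dim))
        self.W2 = rng.normal(0.0, 0.1, size=(num_scripts, hidden))

    def probs(self, h: np.ndarray) -> np.ndarray:
        z = self.W2 @ np.maximum(self.W1 @ h, 0.0)  # ReLU hidden layer
        z = z - z.max()  # subtract max for a numerically stable softmax
        e = np.exp(z)
        return e / e.sum()


def route_and_adapt(h, router, adapters):
    """Predict the script, then calibrate h with that script's adapter."""
    script = int(np.argmax(router.probs(h)))
    return script, adapters[script](h)


# One adapter per script (six EvoCON stages) and one shared router.
dim = 16
adapters = [LowRankAdapter(dim, rank=4) for _ in range(6)]
router = ScriptRouter(dim, hidden=32, num_scripts=6)
h = rng.normal(size=dim)  # stand-in for a frozen-backbone feature
script, calibrated = route_and_adapt(h, router, adapters)
# Untrained adapters are the identity, so calibrated equals h here.
print(script, np.allclose(calibrated, h))
```

Zero-initializing `B` makes each new adapter start as the identity, a common LoRA convention. A freshly added script therefore leaves the shared space untouched until training moves it.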
Image-Derived Multi-Prototype Dictionary:
- To address high intra-class diversity, the system does not use a single mean prototype per class. Instead, it employs Auto-K clustering (Spherical K-Means) to generate multiple prototypes per class, capturing distinct writing styles and media variations.
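The multi-prototype step might look like the following sketch. The summary names spherical k-means, but it does not describe the Auto-K rule for choosing the number of prototypes. `auto_k_prototypes` therefore uses a hypothetical stand-in: grow K until the average cosine similarity between samples and their assigned prototype reaches `min_cohesion`. The farthest-point initialization is likewise an assumption, chosen to keep the example deterministic.

```python
import numpy as np


def normalize(x) -> np.ndarray:
    """Project rows onto the unit sphere."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def spherical_kmeans(x, k: int, iters: int = 20):
    """Cluster rows by cosine similarity; returns (unit centroids, assignments)."""
    x = normalize(x)
    # Assumed farthest-point init: start at x[0], then repeatedly add the sample
    # least similar to every centroid picked so far (deterministic).
    chosen = [x[0]]
    for _ in range(1, k):
        best_sim = np.max(x @ np.array(chosen).T, axis=1)
        chosen.append(x[np.argmin(best_sim)])
    c = np.array(chosen)
    for _ in range(iters):
        assign = np.argmax(x @ c.T, axis=1)  # nearest centroid by cosine
        for j in range(k):
            members = x[assign == j]
            if len(members):  # an empty cluster keeps its old centroid
                s = members.sum(axis=0)
                c[j] = s / np.linalg.norm(s)  # mean direction, re-normalized
    return c, np.argmax(x @ c.T, axis=1)


def auto_k_prototypes(x, k_max: int = 4, min_cohesion: float = 0.9):
    """Hypothetical Auto-K rule: the smallest K whose mean sample-to-prototype
    cosine similarity reaches min_cohesion, capped at k_max and sample count."""
    x = normalize(x)
    for k in range(1, min(k_max, len(x)) + 1):
        c, assign = spherical_kmeans(x, k)
        cohesion = float(np.mean(np.sum(x * c[assign], axis=1)))
        if cohesion >= min_cohesion:
            break
    return c


# Toy embeddings of one character class written in two distinct styles.
samples = [[1.0, 0.0], [0.99, 0.05], [0.98, -0.05],
           [0.0, 1.0], [0.05, 0.99], [-0.05, 0.98]]
print(auto_k_prototypes(samples).shape)  # (2, 2)
```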
Two-Phase Training Strategy:
- Phase-I (New-Script Adaptation): Trains the new SIA for the current script using image-image and image-text contrastive learning (InfoNCE) while keeping the backbone and previous adapters frozen.
- Phase-II (Buffered Replay): Uses a memory buffer of past data to re-align the SIA bank across scripts and train the SAR router, mitigating catastrophic forgetting.
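The two training ingredients can be sketched separately. `info_nce` below is a standard symmetric InfoNCE loss over a batch of paired embeddings (e.g. image-image or image-text pairs). Each row's partner is its positive, and every other row in the batch is a negative. The summary does not say how the replay buffer is filled, so `ReplayBuffer` assumes reservoir sampling. The temperature `tau=0.07` is also an assumed default, not a value from the paper.

```python
import random

import numpy as np


def info_nce(a: np.ndarray, b: np.ndarray, tau: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of pairs: row i of `a` should be closest to
    row i of `b` (its positive); all other rows act as in-batch negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau  # cosine similarities scaled by temperature

    def diag_cross_entropy(l: np.ndarray) -> float:
        # Cross-entropy with the diagonal as the target, via stable log-sum-exp.
        m = l.max(axis=1, keepdims=True)
        log_z = m[:, 0] + np.log(np.exp(l - m).sum(axis=1))
        return float(np.mean(log_z - np.diag(l)))

    # Average the a->b and b->a directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


class ReplayBuffer:
    """Fixed-capacity memory of past samples filled by reservoir sampling, so
    every sample seen so far is equally likely to be retained."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: list = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform over [0, seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, n: int) -> list:
        """Draw up to n stored samples without replacement for a replay batch."""
        return self.rng.sample(self.items, min(n, len(self.items)))


# Matched pairs give a lower loss than mismatched ones.
pairs = np.eye(4)
print(info_nce(pairs, pairs) < info_nce(pairs, pairs[::-1]))  # True
```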

Zero-Shot Capability

The framework supports strict zero-shot deciphering for unseen characters. If no image prototypes exist for a new character, the system retrieves against a dictionary of textual meaning descriptions (leveraging the vision-language alignment), allowing the model to identify characters based on semantic descriptions alone.

3. Dataset and Benchmark: EvoCON

To enable systematic evaluation, the authors introduce EvoCON, a six-stage benchmark extended from the EVOBC dataset.

Scripts: Covers six major scripts: Oracle Bone (OBC), Bronze Inscriptions (BI), Seal Script (SS), Spring-and-Autumn (SAC), Warring States (WSC), and Clerical Script (CS).
Protocol: Data is onboarded stage-by-stage in reverse-chronological order (CS → OBC).
Multimodal Augmentation: Each sample includes meaning descriptions (semantic) and shape descriptions (structural glyph details) generated via LLMs and verified by humans.
Splits: Includes a standard continual learning split and an explicit zero-shot split for characters with no training image exemplars.

4. Experimental Results

Experiments were conducted on EvoCON using Qwen3-VL-Embedding (2B and 8B parameters).

Performance: AMR-CCR significantly outperforms strong baselines (including ResNet-50, standard fine-tuning, and continual learning methods like ER, LwF, EWC, and DER++).
- Accuracy: The 8B model achieved 58.59% average Top-1 accuracy after all six stages (AA6) and 83.09% Top-10 accuracy, surpassing the best baseline (DER++) by +12.74% in Top-1.
- Stability: It maintained extremely low forgetting (FGT ~1.85%), demonstrating that the anchored modular approach effectively preserves cross-stage similarity structures.
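The results above report AA6 and FGT without formulas. A common way to compute both from a stage-by-stage accuracy matrix is sketched below. It assumes the standard continual-learning definitions: average final accuracy, and the average drop from each stage's best earlier accuracy. The paper's exact conventions may differ. The numbers in the usage lines are toy values, not results from EvoCON.

```python
import numpy as np


def average_accuracy(acc) -> float:
    """AA_T: mean accuracy over all T stages after training on the final stage.

    acc[i, j] is the accuracy on stage j's test set after training through
    stage i; only entries with j <= i are meaningful.
    """
    acc = np.asarray(acc, dtype=float)
    return float(acc[-1].mean())


def forgetting(acc) -> float:
    """FGT: for each earlier stage j, the drop from its best accuracy before the
    final stage to its final accuracy, averaged over the first T-1 stages."""
    acc = np.asarray(acc, dtype=float)
    T = acc.shape[0]
    drops = [acc[j:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))


# Toy 3-stage accuracy matrix (percent); row i = after training stage i.
acc = [[80.0, 0.0, 0.0],
       [75.0, 70.0, 0.0],
       [72.0, 68.0, 60.0]]
print(average_accuracy(acc))  # mean of 72, 68 and 60
print(forgetting(acc))        # 5.0
```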
Ablation Studies:
- SIA+SAR: Removing these modules caused accuracy to drop to ~28% and forgetting to spike to ~9.5%, confirming their necessity for script adaptation.
- Multi-Prototype: Using a single mean prototype degraded performance significantly, indicating that capturing intra-class style diversity is crucial.
- Textual Supervision: Removing meaning/shape descriptions hurt zero-shot performance drastically, highlighting the value of multimodal alignment.

5. Key Contributions

Problem Formalization: Defined Continual CCR as a script-staged, class-incremental problem, addressing the limitations of static closed-set assumptions in cultural heritage digitization.
Framework (AMR-CCR): Proposed a novel Anchored Modular Retrieval framework that combines a frozen vision-language backbone with lightweight, script-conditioned adapters and a multi-prototype dictionary. This enables scalable class addition and robust retrieval under domain shifts.
Benchmark (EvoCON): Released a comprehensive six-stage benchmark with multimodal annotations (meaning/shape) and a zero-shot split, setting a new standard for evaluating continual learning in ancient character recognition.

6. Significance

This work represents a paradigm shift in ancient character recognition. By moving from classification to retrieval, the authors address the scalability problem of ever-expanding character sets. New classes can be added without retraining the entire model, and unseen characters can be deciphered using only text descriptions. Together, these make the approach highly practical for real-world archaeological digitization projects, where data is sparse, diverse, and continuously evolving.