N-gram Injection into Transformers for Dynamic Language Model Adaptation in Handwritten Text Recognition

This paper proposes an N-gram Injection (NGI) method that dynamically adapts Transformer-based handwritten text recognition models to target language distributions at inference time. By injecting external n-gram language models, NGI significantly reduces the performance gaps caused by language shifts, without requiring any additional training on target data.

Florent Meyer, Laurent Guichard, Denis Coquenet, Guillaume Gravier, Yann Soullard, Bertrand Coüasnon

Published 2026-03-05

The Problem: The "Over-Confident" Translator

Imagine you hire a brilliant translator to read handwritten notes. You train this translator exclusively on French recipes. They become a master at reading "flour," "sugar," and "oven." They are so good at French recipes that they can read them even if the handwriting is messy.

Now, imagine you hand them a handwritten medical prescription (the "target" task). The handwriting looks similar to the recipes, but the words are totally different: "dosage," "pill," "heart."

Because the translator was so deeply trained on French recipes, their brain is biased. When they see a messy scribble that looks like a word, their brain automatically guesses, "Oh, that must be 'flour'!" even if it's actually "pill." They are so confident in their training that they fail to recognize the new context.

In the world of computers, this is called Language Shift. Modern AI (Transformers) is great at reading handwriting, but if the words it sees at test time are different from the words it learned during training, its performance crashes.

The Solution: The "Dynamic Dictionary" (NGI)

The authors of this paper propose a clever fix called N-gram Injection (NGI).

Instead of retraining the whole translator (which takes forever and requires thousands of new examples), they give the translator a dynamic dictionary right at the moment of reading.

  • The Old Way: The AI tries to guess the word based only on what it learned in the past.
  • The New Way (NGI): As the AI reads the messy handwriting, it simultaneously looks at a "cheat sheet" (an n-gram model) that contains the most likely words for this specific situation.

If the AI is reading a medical form, the cheat sheet says, "Hey, in this context, the next word is likely 'aspirin' or 'dose,' not 'cupcake'." The AI then adjusts its guess instantly.
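In decoding terms, the "cheat sheet" idea boils down to combining two scores: how much a word matches the image, and how plausible that word is according to the injected n-gram model. Here is a minimal sketch of that combination; the function, the weighting scheme, and the numbers are illustrative, not the paper's exact formulation (the paper injects the n-gram earlier in the model, as the next section explains).

```python
import math

def rescore(visual_probs, ngram_probs, weight=0.3):
    """Combine visual evidence with n-gram expectations in log space.

    visual_probs: P(word | image) from the handwriting recognizer.
    ngram_probs:  P(word | context) from the injected n-gram model.
    weight:       how much to trust the language model (illustrative value).
    """
    scores = {}
    for word, p_visual in visual_probs.items():
        p_lm = ngram_probs.get(word, 1e-6)  # tiny floor for unseen words
        scores[word] = math.log(p_visual) + weight * math.log(p_lm)
    return max(scores, key=scores.get)

# The scribble looks slightly more like "flour", but the medical n-gram
# says "pill" is far more plausible in this context.
visual = {"flour": 0.55, "pill": 0.45}
medical_ngram = {"pill": 0.6, "dosage": 0.3}
print(rescore(visual, medical_ngram))  # -> "pill"
```

Because "flour" never appears in the medical n-gram, its language score collapses, and "pill" wins despite the slightly weaker visual match.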

How It Works: The "Early Intervention"

Most people try to fix this problem after the AI makes a mistake (like a teacher correcting a student's essay at the end). This paper suggests a better approach: Early Injection.

Imagine the AI is a detective solving a mystery.

  1. Standard AI: The detective looks at the clue (the handwriting) and guesses the suspect based on their past cases.
  2. NGI AI: Before the detective even starts guessing, you hand them a file of "Current Suspects" (the n-gram data). The detective looks at the handwriting and the file simultaneously. They learn to weigh the handwriting clues against the current suspect list.

By injecting this information early into the AI's decision-making process, the AI learns to balance what it sees (the image) with what it expects (the language rules) in real-time.
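One hedged way to picture "early" injection in code: instead of correcting the output afterwards, the n-gram's predicted distribution is appended to the decoder's input features, so the model weighs it while deciding. Everything below is a toy sketch under that assumption; the names, shapes, and the single linear layer are illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["flour", "sugar", "pill", "dosage"]

def decoder_step(image_features, ngram_probs, W):
    """Toy decoder step: the n-gram distribution is concatenated with the
    visual features (early injection), so it shapes the decision itself
    rather than patching the answer after the fact."""
    x = np.concatenate([image_features, ngram_probs])
    logits = W @ x
    return VOCAB[int(np.argmax(logits))]

dim = 8
W = rng.normal(size=(len(VOCAB), dim + len(VOCAB)))  # stand-in for learned weights
image_features = rng.normal(size=dim)                # stand-in for encoder output
medical_ngram = np.array([0.05, 0.05, 0.6, 0.3])     # "pill" likely in this domain
print(decoder_step(image_features, medical_ngram, W))
```

The key design point is that `W` is trained with the n-gram slot present, so the model learns how much to trust the "suspect file" versus the handwriting; swapping in a new n-gram vector at test time changes the prediction without touching `W`.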

The "N-gram" Concept

What is an n-gram? Think of it as a "predictive text" feature on your phone, but supercharged.

  • If you type "I am going to the...", your phone knows the next word is likely "store," "park," or "gym."
  • An n-gram is just a statistical map of these word combinations.
  • The Magic: You can swap these maps instantly. If you switch from reading recipes to reading medical forms, you just swap the "Recipe Map" for the "Medical Map." The AI doesn't need to be retrained; it just needs the new map.
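The "statistical map" above is literally just counts turned into probabilities. A minimal bigram (2-gram) version, with made-up two-line corpora standing in for the recipe and medical domains:

```python
from collections import defaultdict

def build_bigram_model(corpus):
    """Count word-pair frequencies and convert them to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {w: c / total for w, c in followers.items()}
    return model

# Two tiny "maps" built from different domains.
recipe_model = build_bigram_model(["add the flour", "mix the sugar"])
medical_model = build_bigram_model(["take the pill", "check the dosage"])

# Swapping domains is just swapping dictionaries -- no retraining.
print(recipe_model["the"])   # -> {'flour': 0.5, 'sugar': 0.5}
print(medical_model["the"])  # -> {'pill': 0.5, 'dosage': 0.5}
```

This is why the swap is instant: the Transformer's weights never change, only the lookup table it consults.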

The "Word Attention Network" (WAN)

The authors also built a new, lightweight AI model called WAN (Word Attention Network) to test this.

  • Think of big AI models as heavy trucks. They are powerful but slow and expensive to fuel (train).
  • WAN is a scooter. It's small, fast, and efficient.
  • The paper shows that even with this small scooter, if you give it the right "Dynamic Dictionary" (NGI), it can outperform the heavy trucks on specific tasks without needing a massive engine.

The Results: Why It Matters

The team tested this on three different handwriting datasets (like switching from reading a student's essay to reading a doctor's note).

  1. Without NGI: When the language changed, the AI's error rate doubled or tripled. It was confused and useless.
  2. With NGI: By swapping the "cheat sheet" (n-gram) to match the new text, the AI's accuracy stayed high. It didn't get confused by the shift.

The Big Win:
Usually, to fix an AI that is confused by new data, you have to feed it thousands of new examples and retrain it for days. This paper shows you can fix it instantly just by changing the language guide (the n-gram) at the moment of reading. No extra training, no extra cost, just a smarter way to look at the data.

Summary Analogy

Imagine you are playing a video game where the rules change every level.

  • Old AI: You memorize the rules for Level 1. When you get to Level 2, you keep trying to use Level 1 rules and you lose.
  • This Paper's AI: You are given a rulebook for the current level right before you start. You don't need to relearn the game; you just read the new rulebook and play perfectly.

This method allows computers to read messy handwriting from any context (legal forms, medical notes, historical letters) without needing to be retrained for every single new job.