DISCO: Document Intelligence Suite for COmparative Evaluation

The paper introduces DISCO, a comparative evaluation suite that reveals distinct performance strengths of OCR pipelines and vision-language models across diverse document types, providing empirical guidance for selecting the optimal processing strategy based on specific document characteristics and reasoning requirements.

Kenza Benkirane, Dan Goldwater, Martin Asenov, Aneiss Ghodsi

Published 2026-03-26

Imagine you have a massive, messy library filled with all kinds of documents: handwritten letters from the 1920s, colorful infographics about space, complex medical prescriptions, and multi-page financial reports. You want to ask a computer, "What does this say?" or "What is the total cost here?"

The paper "DISCO" is like a giant, rigorous taste test for two different types of "librarians" (AI systems) that try to read these documents for you. The authors want to figure out: Which librarian should you hire for which job?

Here is the breakdown of the two librarians and what the study found:

The Two Librarians

  1. The "Scanner" (OCR Pipeline):

    • How they work: This librarian is like a high-tech photocopier. First, they scan the page and turn every single letter into plain text (like a Word document). Then, they hand that text to a second librarian (a language model) to answer your question.
    • Superpower: They are incredibly precise with handwriting and very long documents. Because they focus only on the letters, visual clutter on the page does not distract them.
    • Weakness: If the text is in an unusual font, a non-English script, or part of a complex chart, they might misread it or miss the context. They also lose the "picture" of the document once they turn it into text, so the link between words and their place on the page disappears.
  2. The "Artist" (VLM - Vision-Language Model):

    • How they work: This librarian looks at the whole picture at once. They don't just read the words; they see the colors, the charts, the handwriting style, and where things are placed on the page. They answer your question directly from the image.
    • Superpower: They are amazing at understanding charts, colorful infographics, and documents with many different languages mixed together. They "get" the vibe of the document.
    • Weakness: They can get overwhelmed by huge, multi-page documents (like a 100-page contract) and sometimes they might "hallucinate" (make up details) if the handwriting is too messy.
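
The practical difference between the two librarians is what reaches the model that answers the question: a transcript, or the page images themselves. A minimal Python sketch of the two shapes follows. It assumes hypothetical stand-in callables (`ocr_engine`, `language_model`, and `vision_language_model` are placeholders invented for illustration, not DISCO's code or any real library's API).

```python
from typing import Callable, List

# Hypothetical types for illustration: a page is just raw image bytes.
Page = bytes
OcrEngine = Callable[[Page], str]                        # page image -> plain text
LanguageModel = Callable[[str], str]                     # text prompt -> answer
VisionLanguageModel = Callable[[List[Page], str], str]   # page images + question -> answer


def answer_with_scanner(
    pages: List[Page],
    question: str,
    ocr_engine: OcrEngine,
    language_model: LanguageModel,
) -> str:
    """The "Scanner": transcribe every page first, then ask a text-only model."""
    transcript = "\n\n".join(ocr_engine(page) for page in pages)
    # From here on the model sees only text; colors and layout are gone.
    prompt = f"Document:\n{transcript}\n\nQuestion: {question}"
    return language_model(prompt)


def answer_with_artist(
    pages: List[Page],
    question: str,
    vision_language_model: VisionLanguageModel,
) -> str:
    """The "Artist": hand the page images and the question to one model."""
    return vision_language_model(pages, question)
```

In this sketch the Scanner runs in two steps, so a transcription mistake carries into the answer, while the Artist has to take in every page image in a single call.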

The Great Taste Test (The Results)

The researchers tested these librarians on a buffet of different documents. Here is what they discovered:

  • The Handwriting Challenge:

    • Analogy: Imagine trying to read a doctor's messy scribble.
    • Result: The Scanner wins here. It's trained specifically to decipher messy handwriting. The Artist gets confused and makes more mistakes, unless you give them very specific instructions (a "task-aware prompt").
  • The Multilingual & Chart Challenge:

    • Analogy: Imagine a menu with Chinese, French, and English mixed together, with pictures of food.
    • Result: The Artist wins here. They are used to seeing different scripts and visual layouts. The Scanner struggles to read non-English letters and often breaks the connection between a chart and its label.
  • The "Long Book" Challenge:

    • Analogy: Asking a question about a specific detail in a 50-page legal contract.
    • Result: The Scanner wins again. When you have a huge document, the Artist gets lost in the noise. The Scanner breaks the document down into text, making it easier to find the needle in the haystack.
  • The "Single Page" Challenge:

    • Analogy: A simple invoice or a postcard.
    • Result: The Artist wins. Since the document is small and visual, looking at the whole picture is faster and more accurate than scanning it first and then reading the text.

The "Prompt" Surprise

The researchers also tried giving the librarians different instructions (prompts).

  • The Finding: Sometimes, giving the Artist specific instructions (like "Be careful with handwriting") helped. But other times, it actually made them worse at their job! It's like telling a chef, "Don't burn the toast," and they end up burning it because they were overthinking it. There is no "one size fits all" instruction.
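
If no single instruction works everywhere, one practical response is to measure prompts separately for each kind of document and keep the winner per category. The sketch below shows one way to do that in Python. The record format and the function names are assumptions made for illustration, not the paper's evaluation code, and the records would come from your own test runs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per evaluated question (hypothetical format):
# (document category, prompt name, was the answer correct?)
Record = Tuple[str, str, bool]


def accuracy_by_category_and_prompt(records: List[Record]) -> Dict[Tuple[str, str], float]:
    """Fraction of correct answers for every (category, prompt) pair."""
    totals = defaultdict(lambda: [0, 0])  # (category, prompt) -> [correct, seen]
    for category, prompt, correct in records:
        totals[(category, prompt)][0] += int(correct)
        totals[(category, prompt)][1] += 1
    return {key: correct / seen for key, (correct, seen) in totals.items()}


def best_prompt_per_category(records: List[Record]) -> Dict[str, str]:
    """Pick the highest-accuracy prompt per category (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (category, prompt), acc in accuracy_by_category_and_prompt(records).items():
        if category not in best or acc > best[category][1]:
            best[category] = (prompt, acc)
    return {category: prompt for category, (prompt, _) in best.items()}
```

Keeping the choice per category, rather than picking one global winner, is what lets a prompt that helps on handwriting be dropped where it hurts.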

The Big Takeaway

The paper concludes that there is no single "best" AI for all documents.

  • If you have handwritten notes, medical forms, or long contracts, use the Scanner (OCR). It's the reliable, methodical worker.
  • If you have colorful charts, infographics, or mixed-language documents, use the Artist (VLM). It's the creative, visual thinker.

DISCO is essentially a guidebook that tells businesses: "Don't just buy the most expensive AI and hope for the best. Look at your document first. If it's handwritten or long, hire the Scanner. If it's visual and colorful, hire the Artist."

Matching the right tool to the right job can save companies money and reduce errors.
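
The "look at your document first" advice can be written down as a small routing rule. The Python sketch below is one hedged reading of the takeaways in this summary, not an algorithm from the paper. The `DocumentTraits` fields, the 10-page threshold, and the "compare both" fallback for mixed signals are all assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical description of a document, filled in by a human or a classifier."""
    handwritten: bool = False
    page_count: int = 1
    has_charts_or_infographics: bool = False
    multilingual: bool = False


def recommend_reader(doc: DocumentTraits, long_document_pages: int = 10) -> str:
    """Suggest a reader; the page threshold is an arbitrary placeholder."""
    # Handwriting and long documents favored the Scanner in the summary above.
    ocr_signals = doc.handwritten or doc.page_count >= long_document_pages
    # Charts, infographics, and mixed languages favored the Artist.
    vlm_signals = doc.has_charts_or_infographics or doc.multilingual

    if ocr_signals and vlm_signals:
        # The summary does not say which trait wins, so test both (assumption).
        return "compare both"
    if ocr_signals:
        return "scanner (OCR pipeline)"
    # Short documents without Scanner signals default to the Artist.
    return "artist (VLM)"
```

For example, `recommend_reader(DocumentTraits(page_count=50))` suggests the Scanner, while a 50-page multilingual report comes back as "compare both", because the summary gives no rule for documents that are both long and visual.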