Learning Page Order in Shuffled WOO Releases

This paper investigates document page reordering in heterogeneous Dutch freedom-of-information (WOO) releases. While specialized models achieve high accuracy on short documents, seq2seq transformers fail to generalize to longer texts, because short and long documents demand fundamentally different ordering strategies. The authors address this challenge through model specialization rather than curriculum learning.

Efe Kahraman, Giulio Tosato

Published 2026-03-10

Imagine you have a massive, chaotic pile of paper documents. Some are emails, some are legal contracts, some are spreadsheets, and some are scanned notes. A government agency (in the Netherlands) has dumped all these pages into a single PDF file, but they've been completely shuffled. The pages are mixed up like a deck of cards that's been thrown in the air.

Your job? Put the deck back in order.

This is exactly what the researchers in this paper tried to do. They took 5,461 of these "shuffled decks" (called WOO documents) and asked: Can a computer figure out the correct original order of the pages just by reading the text on them?

Here is the breakdown of their journey, explained with some everyday analogies.

1. The Problem: A Messy Puzzle with No Picture

Usually, when you try to order pages, you look for clues: "Page 1 says 'Dear Sir,' and Page 2 says 'Sincerely.'" But these government documents are a weird mix.

  • The Analogy: Imagine a puzzle where the pieces aren't from a picture of a cat, but a random mix of a map, a grocery list, a love letter, and a tax form. The end of one page might be a legal signature, and the very next page (in the real world) might be a completely unrelated email about lunch.
  • The Challenge: Because the content jumps around so wildly, there are no obvious "clues" to tell the computer which page comes next. It's like trying to solve a puzzle where the pieces don't actually fit together visually.

2. The Contenders: Five Different Strategies

The researchers tested five different "AI detectives" to see who could solve the puzzle best.

  • The "Guess and Go" (Heuristics): These are simple rules. "Pick a page, then find the page that looks most similar to it."
    • Result: Terrible. Since the pages are so different, looking for "similar" pages is like trying to find your next step in a maze by looking for a step that looks like the last one. It doesn't work.
  • The "List Maker" (BiLSTM): This model looks at all the pages at once and gives each one a score, like a teacher grading a test. "This page feels like it belongs in spot #3."
    • Result: Decent for short documents, but it gets confused as the pile gets bigger.
  • The "Line-Up" (Pointer Networks): This model acts like a game show host. It picks one page, puts it in the line, then picks the next one from the remaining pile, and so on. It builds the order step-by-step.
    • Result: Good, but it starts to stumble when the line gets too long.
  • The "Translator" (Seq2Seq Transformers): This is a very powerful, modern AI model. It tries to look at the whole shuffled pile and "translate" it into an ordered list, one page at a time.
    • Result: The Big Failure. It worked amazingly well for short documents (2–5 pages), but when the pile got big (20+ pages), it completely crashed. It went from being a genius to being worse than random guessing.
  • The "Matchmaker" (Pairwise Ranking): Instead of trying to build the whole line at once, this model asks a simple question for every possible pair of pages: "Does Page A come before Page B?" It does this for every single pair, then adds up the votes to build the final order.
    • Result: The Winner. It was the most consistent and accurate, especially for longer documents.
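The "matchmaker" idea is easy to sketch in code. Here's a minimal, hypothetical Python version: `comes_before` stands in for the trained pairwise model, and the vote-counting step is a simple tally of how many pairs each page "wins" (the paper's exact aggregation may differ).

```python
def order_by_votes(pages, comes_before):
    """Order pages by counting pairwise 'A before B' votes.

    comes_before(a, b) -> True if the model predicts page a precedes page b.
    Each page's score is the number of pairs it wins; sorting by score
    (highest first) yields the predicted order.
    """
    scores = {p: 0 for p in pages}
    for a in pages:
        for b in pages:
            if a != b and comes_before(a, b):
                scores[a] += 1
    return sorted(pages, key=lambda p: -scores[p])

# Toy stand-in for the trained model: pages tagged with their true index.
pages = [("p3", 3), ("p1", 1), ("p4", 4), ("p2", 2)]
predicted = order_by_votes(pages, lambda a, b: a[1] < b[1])
print([name for name, _ in predicted])  # with a perfect comparator: ['p1', 'p2', 'p3', 'p4']
```

Note the trade-off: this asks the model one simple question at a time, which is why it stays stable on long documents, but the number of pairs grows quadratically with page count.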

3. The Big Surprises

Surprise #1: The "Translator" Crashed on Long Documents

The "Translator" model (Seq2Seq) was great for short stories but failed miserably on long novels.

  • The Analogy: Imagine a student who memorizes the answers to a 5-question quiz perfectly. But when you give them a 25-question quiz, they panic and start guessing randomly.
  • Why? The researchers found that the model was relying too much on "position tags" (like labels saying "I am Page 1," "I am Page 2"). When the document got longer than what it saw in training, it got lost. Even when they tried to fix the labels, the model still failed, suggesting the whole "step-by-step" approach just doesn't work for these messy, long documents.
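To see why "position tags" break down, here is a toy Python sketch (not the paper's actual architecture) of learned absolute position embeddings: the lookup table only covers positions seen during training, so pages past that length simply have no learned representation.

```python
import random

random.seed(0)
max_train_len = 5  # longest document length seen during training
# Learned absolute position vectors: one 8-dim vector per training position.
pos_table = [[random.gauss(0, 1) for _ in range(8)] for _ in range(max_train_len)]

def position_vector(i):
    """Look up a position embedding; positions past the training length are undefined."""
    return pos_table[i] if i < max_train_len else None

# Positions 0-4 have learned vectors; position 7 does not, so a model that
# leans on "I am page N" tags has no reliable signal past page 5.
print(position_vector(3) is not None)  # → True
print(position_vector(7) is None)      # → True
```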

Surprise #2: "Baby Steps" Didn't Help (Curriculum Learning)

In education, we often teach kids simple things first (addition) before hard things (calculus). This is called "Curriculum Learning." The researchers tried teaching the AI to order short documents first, then gradually moving to longer ones.

  • The Result: It actually made things worse (39% worse on long documents).
  • The Analogy: It's like teaching a driver to park in an empty parking lot, then expecting them to drive a Formula 1 race car immediately after. The skills are different!
  • Why? The AI learned that short documents need to look at nearby pages to find order. But long documents need to look at the whole picture to find order. By forcing the AI to learn the "short way" first, it got stuck in a bad habit and couldn't switch to the "long way" later.
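A curriculum schedule like the one described above can be sketched in a few lines of Python. The stage caps below are illustrative, not taken from the paper: each stage admits every document up to a page-count cap, so later stages revisit short documents while adding longer ones.

```python
def curriculum_schedule(docs, stage_caps=(5, 10, 15, 20, 25)):
    """Group documents into training stages by page count, shortest first.

    Stage k contains every document with at most stage_caps[k] pages,
    so the training data grows from easy (short) to hard (long).
    """
    return [[d for d in docs if len(d) <= cap] for cap in stage_caps]

docs = [list(range(n)) for n in (3, 8, 14, 22)]  # fake docs of 3/8/14/22 pages
stages = curriculum_schedule(docs)
print([len(s) for s in stages])  # → [1, 2, 3, 3, 4]
```

The paper's finding is that this schedule backfires here: the early, short-document stages teach a local-clue strategy that the model then cannot unlearn for long documents.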

4. The Winning Strategy: Specialized Teams

The best solution wasn't one giant AI trying to do everything. Instead, they built five specialized teams.

  • One team only handled 2–5 page documents.
  • Another team only handled 21–25 page documents.
  • The Analogy: Instead of hiring one general contractor to fix a leaky faucet and build a skyscraper, you hire a plumber for the faucet and a structural engineer for the skyscraper.
  • The Result: This approach worked incredibly well. For documents up to 15 pages, they got the order right almost 95% of the time. Even for the massive 25-page documents, they improved the accuracy significantly compared to the "one-size-fits-all" model.
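The "specialized teams" setup amounts to routing each document to a model by its page count. Here is a minimal Python sketch; the bucket boundaries and model names are assumptions based on the ranges mentioned above, not the paper's exact configuration.

```python
def route_to_specialist(doc, specialists):
    """Pick the specialist whose page-count bucket contains this document.

    specialists maps (lo, hi) page ranges (inclusive) to ordering models.
    """
    n = len(doc)
    for (lo, hi), model in specialists.items():
        if lo <= n <= hi:
            return model
    raise ValueError(f"no specialist for a {n}-page document")

# Hypothetical buckets covering 2-25 pages in steps of five.
specialists = {
    (2, 5): "model_short",
    (6, 10): "model_6_10",
    (11, 15): "model_11_15",
    (16, 20): "model_16_20",
    (21, 25): "model_long",
}
print(route_to_specialist(["page"] * 4, specialists))   # → model_short
print(route_to_specialist(["page"] * 23, specialists))  # → model_long
```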

The Bottom Line

This paper teaches us that one size does not fit all in AI.

  1. Messy data is hard: When documents are a random mix of emails and spreadsheets, simple clues don't work.
  2. Step-by-step fails for long tasks: Trying to build a long list one item at a time (like the "Translator" model) causes the AI to lose its way.
  3. Specialization wins: Breaking the problem down and giving the AI specific tools for specific lengths of documents is the key to success.

The researchers have made their code and data public, so anyone can try to solve this "shuffled government document" puzzle themselves!