Learning Page Order in Shuffled WOO Releases
This paper investigates page reordering for documents in heterogeneous Dutch freedom of information releases. Specialized models achieve high accuracy on short documents, but seq2seq transformers fail to generalize to longer documents, because short and long documents require fundamentally different ordering strategies. This challenge is addressed effectively by model specialization rather than by curriculum learning.
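To make the task and the idea of specialization concrete, the following is a minimal sketch, not the paper's implementation. It shuffles a document's pages, routes the document to a length-specific orderer, and checks whether the predicted order exactly matches the original. All names here (`shuffle_pages`, `pick_orderer`, `order_document`, `exact_match`, `toy_orderer`), the `max_short` threshold of 5 pages, and the use of exact match as the metric are assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# An orderer takes shuffled pages and returns the positions to read them in.
Orderer = Callable[[Sequence[str]], List[int]]


def shuffle_pages(pages: Sequence[str], seed: int) -> Tuple[List[str], List[int]]:
    """Shuffle pages; also return the index list that restores the original order."""
    rng = random.Random(seed)
    idx = list(range(len(pages)))
    rng.shuffle(idx)
    shuffled = [pages[i] for i in idx]
    # shuffled[k] is original page idx[k], so sorting positions by idx undoes the shuffle.
    target = sorted(range(len(idx)), key=lambda k: idx[k])
    return shuffled, target


def pick_orderer(n_pages: int, orderers: Dict[str, Orderer], max_short: int = 5) -> Orderer:
    """Route by page count. The threshold is hypothetical, not taken from the paper."""
    return orderers["short"] if n_pages <= max_short else orderers["long"]


def order_document(pages: Sequence[str], orderers: Dict[str, Orderer],
                   max_short: int = 5) -> List[int]:
    """Predict a page order using the orderer specialized for this document length."""
    return pick_orderer(len(pages), orderers, max_short)(pages)


def exact_match(pred: Sequence[int], target: Sequence[int]) -> bool:
    """True only if every page is placed at its correct position."""
    return list(pred) == list(target)


def toy_orderer(pages: Sequence[str]) -> List[int]:
    """Stand-in for a trained model: reads the number at the end of each toy page."""
    return sorted(range(len(pages)), key=lambda k: int(pages[k].split()[-1]))


if __name__ == "__main__":
    pages = [f"page {i}" for i in range(8)]
    shuffled, target = shuffle_pages(pages, seed=0)
    orderers = {"short": toy_orderer, "long": toy_orderer}
    pred = order_document(shuffled, orderers)
    print(exact_match(pred, target))
```

In a real setting, the two dictionary entries would hold separately trained models, and the routing rule would be chosen from the data rather than fixed by hand.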