An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?
This paper introduces a large bilingual (English/German) corpus of catalog records annotated with entries from the Integrated Authority File (GND), together with a machine-actionable GND taxonomy. The resource is intended to enable ontology-aware multi-label classification and agent-assisted cataloging, with the goal of developing transparent, authority-anchored AI tools that make subject indexing in digital libraries more efficient and scalable.