ScribeTokens: Fixed-Vocabulary Tokenization of Digital Ink
The paper introduces ScribeTokens, a fixed-vocabulary tokenization method for digital ink that decomposes pen movements into unit pixel steps. It reports that this representation outperforms vector representations in both handwritten text generation and recognition, particularly when combined with a novel next-ink-token prediction pretraining strategy.
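To make the idea of decomposing pen movements into unit pixel steps concrete, here is a minimal sketch. It is not the paper's implementation: it assumes integer pixel coordinates, uses Bresenham line rasterization to connect consecutive pen samples, and maps each of the eight possible unit steps to a token id. The names `DIRECTIONS`, `unit_steps`, `tokenize_stroke`, and `detokenize_stroke` are hypothetical, and the actual ScribeTokens vocabulary (for example, pen-up/pen-down tokens or any merging of steps) may differ.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Hypothetical vocabulary: the 8 unit steps on a pixel grid, one token id each.
DIRECTIONS: List[Point] = [
    (1, 0), (1, 1), (0, 1), (-1, 1),
    (-1, 0), (-1, -1), (0, -1), (1, -1),
]
STEP_TO_TOKEN = {step: i for i, step in enumerate(DIRECTIONS)}


def unit_steps(p0: Point, p1: Point) -> List[Point]:
    """Walk from p0 to p1 with Bresenham's algorithm, returning unit steps."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 > x0 else -1
    sy = 1 if y1 > y0 else -1
    err = dx - dy
    steps: List[Point] = []
    while (x0, y0) != (x1, y1):
        e2 = 2 * err
        mx = my = 0
        if e2 > -dy:  # advance along x
            err -= dy
            mx = sx
        if e2 < dx:   # advance along y
            err += dx
            my = sy
        x0 += mx
        y0 += my
        steps.append((mx, my))
    return steps


def tokenize_stroke(points: List[Point]) -> List[int]:
    """Turn one pen-down stroke (a polyline of pixels) into step tokens."""
    tokens: List[int] = []
    for p0, p1 in zip(points, points[1:]):
        tokens.extend(STEP_TO_TOKEN[s] for s in unit_steps(p0, p1))
    return tokens


def detokenize_stroke(start: Point, tokens: List[int]) -> List[Point]:
    """Replay step tokens from a start pixel to recover the pixel path."""
    x, y = start
    path = [(x, y)]
    for t in tokens:
        mx, my = DIRECTIONS[t]
        x, y = x + mx, y + my
        path.append((x, y))
    return path
```

Because every token is one of a fixed set of unit steps, the vocabulary size in this sketch does not depend on the canvas size or on how far the pen moves between samples; a longer movement simply yields more tokens. Replaying the tokens with `detokenize_stroke` ends at the stroke's final sample point.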