One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
This paper demonstrates that Sparse Autoencoder (SAE) features in Gemma models capture abstract semantics rather than surface orthography. Identical Serbian sentences written in two scripts that tokenize completely differently (Latin and Cyrillic) activate highly overlapping sets of features, and this script invariance increases with model scale.
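The core measurement can be sketched as a feature-overlap score between the SAE activations for the two renderings of a sentence. The function names, the activation threshold, and the choice of Jaccard similarity below are illustrative assumptions, not necessarily the paper's exact metric:

```python
import numpy as np

def active_features(sae_activations, threshold=0.0):
    """Indices of SAE features whose activation exceeds the threshold."""
    return set(np.flatnonzero(sae_activations > threshold))

def script_overlap(acts_latin, acts_cyrillic, threshold=0.0):
    """Jaccard overlap between the feature sets active on the two scripts.

    1.0 means the Latin and Cyrillic renderings activate identical
    features; 0.0 means they share none.
    """
    a = active_features(acts_latin, threshold)
    b = active_features(acts_cyrillic, threshold)
    if not a and not b:
        return 1.0  # no active features in either: vacuously identical
    return len(a & b) / len(a | b)

# Toy example: hypothetical SAE activation vectors for the same sentence
# rendered in Latin vs. Cyrillic script (made-up numbers).
latin = np.array([0.9, 0.0, 0.4, 0.0, 0.7])
cyrillic = np.array([0.8, 0.0, 0.5, 0.1, 0.0])
print(script_overlap(latin, cyrillic))  # 2 shared / 4 active overall = 0.5
```

Under the paper's claim, this score would be high for script pairs and would grow with model size; a script-sensitive model would instead show low overlap because the two inputs share few tokens.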