PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
The paper introduces PatchDNA, a biologically-informed approach that replaces fixed tokenization with patching guided by evolutionary conservation. The resulting DNA language models are more efficient, achieve state-of-the-art performance with significantly smaller architectures, and can be adjusted dynamically without retraining.
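
To make the idea of conservation-guided patching concrete, here is a minimal sketch of one way variable-length patches could be derived from per-base conservation scores. It is not the paper's implementation: the boundary rule (start a new patch at any base whose score meets a threshold, or when a maximum patch length is reached), the function name `conservation_patches`, the parameters `threshold` and `max_len`, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence


def conservation_patches(
    sequence: str,
    scores: Sequence[float],
    threshold: float,
    max_len: int = 8,
) -> List[str]:
    """Split `sequence` into variable-length patches.

    ASSUMPTION (not taken from the paper): a new patch starts at every base
    whose conservation score is >= `threshold`, or when the current patch
    has already reached `max_len` bases.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    patches: List[str] = []
    start = 0
    for i in range(1, len(sequence)):
        # Close the current patch just before a conserved base, or when full.
        if scores[i] >= threshold or i - start >= max_len:
            patches.append(sequence[start:i])
            start = i
    if sequence:
        patches.append(sequence[start:])  # final, possibly shorter, patch
    return patches


# Toy input with made-up scores; higher means more conserved.
seq = "ACGTACGTAC"
cons = [0.1, 0.1, 0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.7, 0.1]

fine = conservation_patches(seq, cons, threshold=0.5)
coarse = conservation_patches(seq, cons, threshold=0.85)
print(fine)    # ['AC', 'G', 'TACGT', 'AC']
print(coarse)  # ['AC', 'GTACGTAC']
```

Under this assumed rule, conserved stretches are cut into short patches and weakly conserved stretches are grouped into long ones. Changing `threshold` re-patches the same sequence without touching any model weights, which is one plausible reading of how patching could be adjusted without retraining.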