FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling
FlashPrefill is a framework for ultra-fast long-context prefilling. It combines instantaneous block searching, which discovers dynamic sparse attention patterns, with a thresholding mechanism that eliminates long-tail attention scores. It delivers up to a 27.78x speedup on 256K-token sequences while remaining efficient on shorter contexts.
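
To make the idea of block-level pattern discovery followed by thresholding concrete, the following is a minimal sketch and not the FlashPrefill implementation. It assumes that each block is summarized by mean pooling and that a key block is kept when its softmax weight is at least a fraction `tau` of the largest weight in the row. The names `mean_pool`, `select_blocks`, and `tau` are hypothetical, and causal masking is omitted for brevity.

```python
import math


def mean_pool(block):
    """Average the token vectors in a block into a single vector."""
    dim = len(block[0])
    return [sum(tok[d] for tok in block) / len(block) for d in range(dim)]


def select_blocks(q_block, k_blocks, tau=0.1):
    """Return indices of key blocks to keep for one query block.

    Assumption (not from the source): a key block is kept when its
    softmax weight is at least `tau` times the largest weight in the row.
    The softmax denominator cancels in that ratio, so it is never computed.
    """
    q = mean_pool(q_block)
    scale = 1.0 / math.sqrt(len(q))
    logits = [
        scale * sum(a * b for a, b in zip(q, mean_pool(kb)))
        for kb in k_blocks
    ]
    top = max(logits)
    # Weight of each block relative to the best block; the best block gets 1.0.
    relative = [math.exp(x - top) for x in logits]
    return [i for i, w in enumerate(relative) if w >= tau]


# One query block and three key blocks of 2-dimensional token vectors.
q_block = [[1.0, 0.0], [1.0, 0.0]]
k_blocks = [[[4.0, 0.0]], [[0.0, 4.0]], [[3.9, 0.0]]]
kept = select_blocks(q_block, k_blocks, tau=0.1)
```

In this toy input the second key block is orthogonal to the query, so its relative weight falls below `tau` and it is dropped. Exact attention would then be computed only over the kept blocks. Lowering `tau` retains more of the long tail at the cost of more computation.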