VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
VSPrefill is a lightweight, training-efficient sparse attention mechanism for long-context prefilling. It exploits vertical-slash structural patterns in attention maps, combined with adaptive thresholding for index selection, to achieve linear complexity, delivering a 4.95x speedup while preserving 98.35% of full-attention accuracy at 128k context length.
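The source does not show VSPrefill's implementation, but the vertical-slash idea can be illustrated with a minimal sketch: given cheaply estimated attention scores (e.g. from a pooled low-resolution pass), keep only high-mass vertical stripes (key columns) and slash stripes (fixed diagonal offsets), choosing how many via an adaptive mass-coverage threshold. All names, the `tau` coverage parameters, and the selection heuristic below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def select_by_threshold(mass, tau):
    # Adaptive thresholding (assumed heuristic): pick the smallest index
    # set whose cumulative mass covers a tau fraction of the total.
    order = np.argsort(mass)[::-1]
    cum = np.cumsum(mass[order])
    k = int(np.searchsorted(cum, tau * cum[-1])) + 1
    return order[:k]

def vertical_slash_mask(scores, tau_v=0.9, tau_s=0.9):
    # Build a causal vertical-slash sparsity mask from estimated
    # attention scores of shape (L, L).
    L = scores.shape[0]
    causal = np.tril(np.ones((L, L), dtype=bool))
    masked = np.where(causal, scores, 0.0)

    # Vertical pattern: key columns that attract high total mass.
    col_mass = masked.sum(axis=0)
    v_idx = select_by_threshold(col_mass, tau_v)

    # Slash pattern: diagonals at fixed offsets d = i - j with high mass.
    diag_mass = np.array([np.trace(masked, offset=-d) for d in range(L)])
    s_idx = select_by_threshold(diag_mass, tau_s)

    mask = np.zeros((L, L), dtype=bool)
    mask[:, v_idx] = True                 # keep selected vertical stripes
    for d in s_idx:                       # keep selected slash stripes
        i = np.arange(d, L)
        mask[i, i - d] = True
    return mask & causal
```

Because only the selected stripes are materialized, the number of attended positions grows roughly with the number of stripes rather than quadratically with context length, which is the structural source of the claimed speedup.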