SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications
SwiftEmbed is a production-oriented serving system, written in Rust, for real-time text embedding. Instead of running full transformer inference, it looks up static per-token embeddings from the distilled Potion-base-8M model and mean-pools them, reaching 1.12 ms p50 latency and a throughput of 50,000 RPS. It performs strongly on duplicate detection and semantic similarity tasks, but gives up accuracy relative to full transformer inference on complex classification and retrieval workloads.
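
The core idea, static token lookup followed by mean pooling, can be illustrated with a minimal Rust sketch. This is not SwiftEmbed's actual code: the `StaticEmbedder` type, the whitespace tokenizer, the two-dimensional toy vectors, and the `demo_embedder` helper are all assumptions made for illustration. A real deployment would presumably load the Potion-base-8M table and use that model's own tokenizer.

```rust
use std::collections::HashMap;

/// Hypothetical static embedding table: one fixed vector per token.
/// No neural network runs at query time; embedding is lookup plus averaging.
struct StaticEmbedder {
    table: HashMap<String, Vec<f32>>,
    dim: usize,
}

impl StaticEmbedder {
    fn new(dim: usize) -> Self {
        StaticEmbedder { table: HashMap::new(), dim }
    }

    /// Register a token vector; panics if the dimension does not match.
    fn insert(&mut self, token: &str, vector: Vec<f32>) {
        assert_eq!(vector.len(), self.dim, "vector dimension mismatch");
        self.table.insert(token.to_string(), vector);
    }

    /// Embed a text by averaging the vectors of its known tokens.
    /// Assumption: lowercase whitespace tokenization stands in for the
    /// model's real tokenizer. Unknown tokens are skipped; if no token
    /// is known, the zero vector is returned.
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut sum = vec![0.0f32; self.dim];
        let mut count = 0usize;
        for token in text.split_whitespace() {
            if let Some(vector) = self.table.get(&token.to_lowercase()) {
                for (s, x) in sum.iter_mut().zip(vector) {
                    *s += *x;
                }
                count += 1;
            }
        }
        if count > 0 {
            let n = count as f32;
            for s in sum.iter_mut() {
                *s /= n;
            }
        }
        sum
    }
}

/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Toy table with made-up 2-dimensional vectors (illustrative values only).
fn demo_embedder() -> StaticEmbedder {
    let mut embedder = StaticEmbedder::new(2);
    embedder.insert("fast", vec![1.0, 0.0]);
    embedder.insert("quick", vec![0.9, 0.1]);
    embedder.insert("cat", vec![0.0, 1.0]);
    embedder
}
```

With this toy table, `demo_embedder().embed("fast cat")` averages `[1.0, 0.0]` and `[0.0, 1.0]`. Comparing that result with the embedding of "quick cat" via `cosine_similarity` gives a score close to 1, which is the kind of near-duplicate signal the summary describes. Because each request costs only a few hash lookups and vector additions, per-request work grows with the number of tokens rather than with the cost of a transformer forward pass.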