RTP-LLM: High-Performance Alibaba LLM Inference Engine
RTP-LLM is a high-performance, open-source LLM inference engine deployed at Alibaba Group. It achieves higher throughput and lower latency than vLLM and SGLang through integrated optimizations such as Prefill-Decode Disaggregation, hierarchical KV cache management, and modular speculative decoding.