StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving
StepCache is a backend-agnostic system for LLM serving that targets workloads whose requests share structure but differ in localized constraints. It combines step-level reuse, lightweight verification, and selective patching to significantly reduce latency and token usage while guaranteeing output correctness.
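To make the reuse, verify, and patch loop concrete, the following is a minimal Python sketch, not StepCache's actual implementation. It assumes a cached response is stored as a list of text steps, that verification is a cheap per-step predicate, and that patching regenerates a single step. The names `serve_with_reuse`, `verify`, and `patch` are hypothetical and chosen only for illustration; in a real deployment `patch` would presumably call the LLM backend.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures (assumptions, not the StepCache API):
#   verify(step) -> True if the cached step satisfies the new request's constraints
#   patch(index, step) -> a regenerated replacement for a failing step
Verifier = Callable[[str], bool]
Patcher = Callable[[int, str], str]


def serve_with_reuse(
    cached_steps: List[str], verify: Verifier, patch: Patcher
) -> Tuple[List[str], int]:
    """Reuse cached steps, regenerating only the ones that fail verification.

    Returns the final list of steps and the number of steps that were patched.
    """
    result: List[str] = []
    patched = 0
    for index, step in enumerate(cached_steps):
        if verify(step):
            # Step passes the lightweight check: reuse it as-is.
            result.append(step)
        else:
            # Step violates a localized constraint: patch only this step.
            result.append(patch(index, step))
            patched += 1
    return result, patched


# Toy usage: the cached answer used kilometres, the new request wants miles.
cached = ["Parse the route.", "Compute distance in km.", "Report the total."]
steps, num_patched = serve_with_reuse(
    cached,
    verify=lambda s: "km" not in s,
    # Stand-in for a backend call that regenerates one step.
    patch=lambda i, s: s.replace("km", "miles"),
)
print(steps, num_patched)
```

In this toy run only the middle step is regenerated, which is where the latency and token savings would come from under the stated assumptions. A fuller design would likely re-verify each patched step and fall back to full regeneration if it still fails.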