ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs
ArcLight is a lightweight LLM inference architecture designed specifically for many-core CPUs. It overcomes cross-NUMA memory access bottlenecks through efficient memory management, thread scheduling, and controlled tensor parallelism, achieving up to 46% higher throughput than mainstream frameworks while maintaining broad device compatibility.
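
To make the idea of tensor parallelism across NUMA nodes concrete, here is a minimal sketch, not ArcLight's actual implementation. It assumes a row-wise split of a weight matrix into one shard per node, so that each worker reads only its own shard when computing a matrix-vector product. The names `split_rows`, `shard_matvec`, and `tensor_parallel_matvec` are hypothetical. Python threads are not pinned to NUMA nodes here; a real system would additionally bind threads and allocate each shard in node-local memory.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(weight, num_nodes):
    """Partition the rows of `weight` into `num_nodes` contiguous shards."""
    base, extra = divmod(len(weight), num_nodes)
    shards, start = [], 0
    for node in range(num_nodes):
        # The first `extra` shards take one additional row each.
        size = base + (1 if node < extra else 0)
        shards.append(weight[start:start + size])
        start += size
    return shards


def shard_matvec(shard, x):
    """Compute the output rows owned by one shard."""
    return [sum(w * v for w, v in zip(row, x)) for row in shard]


def tensor_parallel_matvec(weight, x, num_nodes):
    """Row-parallel y = W @ x: one worker per (hypothetical) NUMA node."""
    shards = split_rows(weight, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # pool.map preserves shard order, so partial outputs concatenate directly.
        partials = list(pool.map(lambda s: shard_matvec(s, x), shards))
    return [y for part in partials for y in part]


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6]]
    print(tensor_parallel_matvec(W, [1, 1], num_nodes=2))
```

In this layout the input vector `x` is the only data shared by all workers, while each weight shard is touched by a single worker. That is the property a NUMA-aware design would exploit to keep most memory traffic local to a node.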