XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference
XFP is a dynamic, calibration-free weight quantization method for LLM inference. Given a user-specified quality threshold, it determines its codebook parameters automatically and separates sparse outliers from the remaining weights, targeting both high throughput and high accuracy. Its quality-driven "H-Process" iteration enables massive models to fit within constrained memory.
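The general idea of quality-targeted codebook quantization with outlier separation can be illustrated with a small NumPy sketch. This is not XFP's algorithm: the summary above does not specify how the codebook is built or how the "H-Process" iterates, so everything here is an assumption made for illustration. The function names (`split_and_quantize`, `quality_targeted_search`), the quantile-based codebook, the relative Frobenius error as the quality metric, and the fixed outlier fraction are all hypothetical choices.

```python
import numpy as np


def split_and_quantize(w, bits, outlier_frac):
    """Keep the largest-magnitude weights exact; quantize the rest with a codebook.

    Assumption: the codebook is built from quantiles of the inlier weights.
    The source does not describe XFP's actual codebook construction.
    """
    flat = w.ravel().astype(np.float64)
    k = int(round(outlier_frac * flat.size))

    # Sparse outliers: the k entries with the largest absolute value.
    outlier_idx = np.argsort(np.abs(flat))[flat.size - k:]
    inlier_mask = np.ones(flat.size, dtype=bool)
    inlier_mask[outlier_idx] = False
    inliers = flat[inlier_mask]

    # Codebook with 2**bits entries placed at evenly spaced quantiles.
    levels = 2 ** bits
    codebook = np.quantile(inliers, (np.arange(levels) + 0.5) / levels)

    # Nearest-codeword lookup: the codebook is sorted, so midpoints between
    # neighbouring entries act as decision boundaries.
    boundaries = (codebook[1:] + codebook[:-1]) / 2.0
    codes = np.searchsorted(boundaries, inliers)

    recon = flat.copy()              # outliers stay at full precision
    recon[inlier_mask] = codebook[codes]
    return recon.reshape(w.shape)


def relative_error(w, recon):
    """Hypothetical quality metric: relative Frobenius-norm error."""
    return np.linalg.norm(w - recon) / np.linalg.norm(w)


def quality_targeted_search(w, max_rel_err, bit_options=(2, 3, 4, 5, 6, 8),
                            outlier_frac=0.01):
    """Return the smallest bit width whose error meets the quality target.

    A generic stand-in for a quality-driven iteration, not the H-Process.
    Falls back to the largest bit width if no option meets the target.
    """
    for bits in bit_options:
        recon = split_and_quantize(w, bits, outlier_frac)
        err = relative_error(w, recon)
        if err <= max_rel_err:
            break
    return bits, err, recon


# Usage: a random weight matrix with one injected outlier.
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64))
weights[0, 0] = 50.0
chosen_bits, achieved_err, _ = quality_targeted_search(weights, max_rel_err=0.1)
print(f"bits={chosen_bits}, relative error={achieved_err:.4f}")
```

The point of the sketch is the control flow: the user supplies a quality target rather than a bit width, and the bit width is derived by tightening the codebook until the target is met. Storing the outliers separately keeps a handful of extreme values from stretching the codebook range.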