What Is Missing: Interpretable Ratings for Large Language Model Outputs
This paper introduces the "What Is Missing" (WIM) rating system, which uses sentence-embedding similarity to convert natural-language feedback about output deficiencies into interpretable scalar ratings. Compared with traditional discrete numerical ratings, WIM provides a richer preference-learning signal and supports qualitative debugging.
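As a rough illustration of the embedding-similarity idea only (not the paper's actual implementation), the sketch below substitutes a toy bag-of-words embedding for a pretrained sentence encoder and treats similarity between feedback and a hypothetical set of known deficiency descriptions as the scalar rating signal; all names and the scoring rule are assumptions:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would presumably use
    # a pretrained sentence encoder (assumption, not the paper's method).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def wim_rating(feedback: str, deficiency_anchors: list[str]) -> float:
    # Hypothetical WIM-style scoring: the closer the feedback is to a
    # known deficiency description, the lower (worse) the scalar rating.
    sims = [cosine(embed(feedback), embed(a)) for a in deficiency_anchors]
    return 1.0 - max(sims, default=0.0)

anchors = [
    "the answer is missing key details",
    "the response omits the question's constraints",
]
# Feedback matching a deficiency anchor yields a rating near 0 (deficient);
# unrelated feedback yields a rating near 1 (no known deficiency flagged).
print(wim_rating("the answer is missing key details", anchors))
print(wim_rating("totally unrelated words here", anchors))
```

The anchor set makes the rating interpretable: one can inspect which deficiency description the feedback matched, rather than debugging an opaque numeric score.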