Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
This paper introduces a post-training paradigm that uses knowledge graphs as implicit reward models, guiding large language models to learn compositional reasoning from axiomatic facts. With path-derived supervision, a 14B-parameter model outperforms frontier systems on complex multi-hop medical queries.
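
To make the idea of a path-derived signal concrete, the following is a minimal sketch and not the paper's actual reward. It assumes a knowledge-graph path is a list of (head, relation, tail) triples and scores a response by the fraction of triples whose head and tail entities both appear in the text. The function name `path_reward`, the substring matching rule, and the example path are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumed representation: a KG path is an ordered list of (head, relation, tail) triples.
Triple = Tuple[str, str, str]


def path_reward(response: str, path: List[Triple]) -> float:
    """Hypothetical reward: fraction of path triples covered by the response.

    A triple counts as covered when both its head and tail entity names
    occur (case-insensitively) as substrings of the response.
    """
    if not path:
        return 0.0
    text = response.lower()
    covered = sum(
        1
        for head, _relation, tail in path
        if head.lower() in text and tail.lower() in text
    )
    return covered / len(path)


# Illustrative two-hop path (made up for this sketch, not taken from the paper).
path = [
    ("metformin", "treats", "type 2 diabetes"),
    ("type 2 diabetes", "risk_factor_for", "neuropathy"),
]

full = "Metformin treats type 2 diabetes, and type 2 diabetes is a risk factor for neuropathy."
partial = "Metformin is used for type 2 diabetes."

print(path_reward(full, path))     # 1.0
print(path_reward(partial, path))  # 0.5
```

A response that walks the full path scores higher than one that stops after the first hop, which is the sense in which the graph can act as an implicit reward model: no separate learned scorer is needed, only the path itself.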