Optimizing Language Models for Crosslingual Knowledge Consistency
This paper introduces Direct Consistency Optimization (DCO), a reinforcement-learning-inspired method that improves crosslingual knowledge consistency in large language models. DCO derives a structured reward function directly from the model itself, eliminating the need for an explicit reward model, and outperforms existing approaches.