Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation
This paper proposes CAPL, a framework that mitigates multi-image hallucinations in large vision-language models. CAPL introduces a selectable image-token interaction mechanism for fine-grained cross-image alignment, together with a preference-learning strategy that trains the model to ground its answers in genuine visual evidence rather than textual priors.
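To make the "selectable image token interaction" idea concrete, here is a minimal toy sketch, not the paper's actual method: a query's attention is masked so that it can only attend to tokens from a selected subset of images, redistributing attention mass across genuine cross-image evidence. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def selectable_cross_image_attention(q, kv_tokens, image_ids, allowed_images):
    """Toy sketch (not CAPL itself): restrict a query's attention
    to tokens coming from a selected subset of images.

    q              : (d,)  query vector
    kv_tokens      : (T, d) concatenated tokens from several images
    image_ids      : (T,)  which image each token came from
    allowed_images : set of image ids the query may attend to
    """
    # Scaled dot-product scores over all concatenated image tokens.
    scores = kv_tokens @ q / np.sqrt(q.shape[0])
    # Mask out tokens from non-selected images before the softmax,
    # so attention mass is redistributed over allowed images only.
    mask = np.isin(image_ids, list(allowed_images))
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores[mask].max())
    weights = weights / weights.sum()
    # Attended representation plus the calibrated attention weights.
    return weights @ kv_tokens, weights
```

Masking before the softmax (rather than zeroing weights afterward) keeps the attention distribution properly normalized over the selected images.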