MJ1: Multimodal Judgment via Grounded Verification
The paper introduces MJ1, a 3B-parameter multimodal judge trained with reinforcement learning using a structured grounded verification chain and counterfactual consistency rewards. By grounding its decisions in visual evidence, MJ1 achieves state-of-the-art accuracy on MMRB2 and outperforms significantly larger models.