Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning

This paper introduces a counterfactual evaluation framework for medical VQA. It finds that reinforcement learning with verifiable rewards improves benchmark accuracy but often degrades genuine visual grounding, because models can rely on text shortcuts and hallucinate visual reasoning. The authors conclude that new evaluation metrics and training objectives are needed to explicitly enforce visual dependence.

Anas Zafar, Leema Krishna Murali, Ashish Vashist · 2026-03-05 · cs

Geographically-Weighted Weakly Supervised Bayesian High-Resolution Transformer for 200m Resolution Pan-Arctic Sea Ice Concentration Mapping and Uncertainty Estimation using Sentinel-1, RCM, and AMSR2 Data

This study proposes a Geographically-Weighted Weakly Supervised Bayesian High-Resolution Transformer that fuses Sentinel-1, RCM, and AMSR2 data to generate 200 m resolution pan-Arctic sea ice concentration maps with reliable uncertainty estimates, addressing the challenges of subtle feature extraction, inexact labels, and data heterogeneity.

Mabel Heffring, Lincoln Linlin Xu · 2026-03-05 · cs.LG

PinCLIP: Large-scale Foundational Multimodal Representation at Pinterest

This paper introduces PinCLIP, a large-scale foundational multimodal representation model for Pinterest. It uses a hybrid Vision Transformer architecture and neighbor alignment objectives to overcome VLM integration challenges, and reports significant improvements in multimodal retrieval accuracy, cold-start content distribution, and overall user engagement.

Josh Beal, Eric Kim, Jinfeng Rao + 3 more · 2026-03-05 · cs

Parallax to Align Them All: An OmniParallax Attention Mechanism for Distributed Multi-View Image Compression

The paper proposes ParaHydra, a novel distributed multi-view image compression framework featuring an OmniParallax Attention Mechanism and a Parallax Multi Information Fusion Module that adaptively aligns and integrates inter-view correlations, enabling it to significantly outperform state-of-the-art multi-view codecs in both bitrate efficiency and computational speed.

Haotian Zhang, Feiyue Long, Yixin Yu + 7 more · 2026-03-05 · cs