LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings

This paper introduces LEMON, a large-scale endoscopic monocular dataset comprising 938 hours of high-resolution surgical footage, and LemonFM, a foundation model pretrained on this data using self-supervised augmented knowledge distillation that significantly outperforms existing models across multiple surgical perception tasks.

Chengan Che, Chao Wang, Tom Vercauteren, Sophia Tsoka, Luis C. Garcia-Peraza-Herrera | 2026-03-24 | cs

All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning

This paper proposes the Panoptic Patch Learning (PPL) framework to overcome the "Few-Patch Bias" in AI-generated image detection by enforcing the utilization of artifacts across all image patches through random patch replacement and patch-wise contrastive learning, thereby significantly enhancing detection robustness and generalization.

Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, Zequn Qin, Xi Li | 2026-03-24 | cs

Tiny Neural Networks for Multi-Object Tracking in a Modular Kalman Framework

This paper introduces a modular, production-ready multi-object tracking framework for embedded automotive systems that integrates three compact, task-specific neural networks (SPENT, SANT, and MANTa) into a Kalman filter pipeline to significantly improve prediction accuracy and association performance while maintaining real-time suitability, interpretability, and drop-in compatibility.

Christian Alexander Holz, Christian Bader, Markus Enzweiler, Matthias Drüppel | 2026-03-24 | cs.LG

Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner

This paper introduces Patho-R1, a multimodal reinforcement learning-based pathology expert that leverages high-quality, reasoning-oriented datasets derived from textbooks and experts, and is trained through a three-stage pipeline of knowledge infusion, supervised fine-tuning, and reinforcement learning to significantly improve diagnostic accuracy and reasoning plausibility across various pathology tasks.

Wenchuan Zhang, Penghao Zhang, Jingru Guo, Tao Cheng, Jie Chen, Shuwan Zhang, Zhang Zhang, Yuhao Yi, Hong Bu | 2026-03-24 | cs.AI

CompBench: Benchmarking Complex Instruction-guided Image Editing

This paper introduces CompBench, a large-scale benchmark featuring fine-grained instructions and an MLLM-human collaborative framework to rigorously evaluate and expose the limitations of current models in complex, instruction-guided image editing tasks.

Bohan Jia, Wenxuan Huang, Yuntian Tang, Junbo Qiao, Jincheng Liao, Shaosheng Cao, Fei Zhao, Zhaopeng Feng, Zhouhong Gu, Zhenfei Yin, Lei Bai, Wanli Ouyang, Lin Chen, Fei Zhao, Yao Hu, Zihan Wang, Yuan (…) | 2026-03-24 | cs

Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models

This paper introduces Foresight Diffusion (ForeDiff), a framework that enhances sampling consistency in predictive diffusion models by decoupling condition understanding from target denoising through a separate deterministic predictive stream, thereby improving both accuracy and consistency in robot video and scientific spatiotemporal forecasting tasks.

Yu Zhang, Xingzhuo Guo, Haoran Xu, Jialong Wu, Mingsheng Long | 2026-03-24 | cs