VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning

VideoChat-M1 introduces a novel multi-agent system for video understanding that employs a learnable Collaborative Policy Planning paradigm, where multiple agents dynamically generate, execute, and refine tool invocation strategies through interaction and multi-agent reinforcement learning to achieve state-of-the-art performance across diverse video benchmarks.

Boyu Chen, Zikang Wang, Zhengrong Yue + 9 more · 2026-03-05 · cs

Tracing 3D Anatomy in 2D Strokes: A Multi-Stage Projection Driven Approach to Cervical Spine Fracture Identification

This paper presents an automated, multi-stage pipeline that identifies cervical spine fractures by fusing orthogonal 2D segmentations to estimate 3D volumes of interest, which are then analyzed using a 2.5D CNN-Transformer ensemble to achieve diagnostic performance comparable to expert radiologists while reducing computational dimensionality.

Fabi Nahian Madhurja, Rusab Sarmun, Muhammad E. H. Chowdhury + 3 more · 2026-03-05 · cs.AI

When Safety Collides: Resolving Multi-Category Harmful Conflicts in Text-to-Image Diffusion via Adaptive Safety Guidance

This paper proposes Conflict-aware Adaptive Safety Guidance (CASG), a training-free framework that dynamically identifies and applies category-specific safety directions to resolve harmful conflicts in text-to-image diffusion models, thereby significantly reducing overall harmful output rates compared to existing methods.

Yongli Xiang, Ziming Hong, Zhaoqing Wang + 3 more · 2026-03-05 · cs

Momentum Memory for Knowledge Distillation in Computational Pathology

The paper proposes Momentum Memory Knowledge Distillation (MoMKD), a cross-modal framework that uses a momentum-updated memory to aggregate genomic and histopathology information across batches and decouples branch gradients, overcoming the limitations of batch-local alignment and enabling robust, generalizable cancer diagnosis from histology-only inference.
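The core of a momentum-updated memory is an exponential moving average that accumulates feature statistics across batches rather than aligning modalities within a single batch. A minimal sketch of that update rule, assuming a simple mean-feature memory and a decay coefficient `beta` (the function name, shapes, and hyperparameters here are illustrative assumptions, not MoMKD's actual implementation):

```python
import numpy as np

def momentum_update(memory, batch_features, beta=0.99):
    """EMA-style momentum update of a cross-batch memory bank.

    memory:         (d,) running memory vector
    batch_features: (b, d) features from the current batch
    beta:           decay; higher beta = slower, more stable memory
    """
    return beta * memory + (1.0 - beta) * batch_features.mean(axis=0)

# Toy run: memory drifts toward the running feature mean across batches.
memory = np.zeros(4)
for _ in range(100):
    batch = np.ones((8, 4))  # stand-in batch features
    memory = momentum_update(memory, batch)
```

Because the memory persists across batches, alignment targets are no longer limited to whatever samples happen to co-occur in one batch, which is the batch-local limitation the summary refers to.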

Yongxin Guo, Hao Lu, Onur C. Koyun + 3 more2026-03-05💻 cs

Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models

This paper introduces Spatial Credit Redistribution (SCR), a training-free inference-time method that mitigates hallucinations in Vision-Language Models by redistributing suppressed visual attention from dominant patches to their spatial neighbors, thereby significantly reducing hallucination rates across multiple benchmarks while preserving generation quality and maintaining negligible latency.
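The redistribution idea can be illustrated with a toy attention map over a patch grid: a fraction of the mass concentrated on dominant patches is moved to their 4-connected spatial neighbors, keeping total attention constant. This is a hedged sketch of the general mechanism only; the function name, `top_k`/`alpha` parameters, and neighborhood choice are assumptions, not SCR's published formulation:

```python
import numpy as np

def redistribute_attention(attn, grid, top_k=2, alpha=0.5):
    """Move a fraction alpha of the attention mass on the top_k
    dominant patches to their 4-neighbors on an (h, w) patch grid.
    Total attention mass is preserved."""
    h, w = grid
    attn = attn.copy()
    dominant = np.argsort(attn)[-top_k:]  # indices of dominant patches
    for idx in dominant:
        r, c = divmod(idx, w)
        neighbors = [(r + dr, c + dc)
                     for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                     if 0 <= r + dr < h and 0 <= c + dc < w]
        share = alpha * attn[idx] / len(neighbors)
        attn[idx] *= (1 - alpha)          # suppress the dominant patch
        for nr, nc in neighbors:
            attn[nr * w + nc] += share    # credit its spatial neighbors
    return attn

# Usage: uniform attention over a 3x3 patch grid.
attn = np.full(9, 1 / 9)
out = redistribute_attention(attn, grid=(3, 3))
```

Preserving the total mass is what lets such a method run at inference time without retraining: it reshapes where visual attention lands rather than how much of it there is.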

Niamul Hassan Samin, Md Arifur Rahman, Abdullah Ibne Hanif Arean + 2 more · 2026-03-05 · cs.AI