FedARKS: Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration for Person Re-identification

FedARKS is a federated learning framework for person re-identification that overcomes reliance on global features and uniform averaging by introducing Robust Knowledge and Knowledge Selection mechanisms, which capture subtle domain-invariant details and prioritize high-quality client contributions for improved domain generalization.

Xin Xu, Binchang Ma, Zhixi Yu, Wei Liu · 2026-03-09 · cs

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion

Place-it-R1 is an end-to-end framework that leverages Multimodal Large Language Models (MLLMs) with Chain-of-Thought reasoning to orchestrate video diffusion via a "Think-then-Place" paradigm, ensuring physically consistent and environment-aware video object insertion through iterative refinement and user-controllable plausibility-fidelity trade-offs.

Bohai Gu, Taiyi Wu, Dazhao Du, Jian Liu, Shuai Yang, Xiaotong Zhao, Alan Zhao, Song Guo · 2026-03-09 · cs.AI

Longitudinal NSCLC Treatment Progression via Multimodal Generative Models

This paper introduces a Virtual Treatment (VT) framework that utilizes dose-aware multimodal conditional image-to-image translation, specifically leveraging diffusion-based models, to synthesize plausible longitudinal CT scans of non-small cell lung cancer (NSCLC) tumor evolution under radiotherapy, thereby supporting in-silico treatment monitoring and adaptive radiotherapy research.

Massimiliano Mantegna, Elena Mulero Ayllón, Alice Natalina Caragliano, Francesco Di Feola, Claudia Tacconi, Michele Fiore, Edy Ippolito, Carlo Greco, Sara Ramella, Philippe C. Cattin, Paolo Soda, Matteo Tortora, Valerio Guarrasi · 2026-03-09 · cs

VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models

This paper introduces VLM-RobustBench, a comprehensive benchmark evaluating the robustness of four vision-language model families across 133 corruption settings, revealing that current models are semantically strong but spatially fragile, with low-severity geometric distortions causing significantly larger performance drops than visually severe photometric corruptions.

Rohit Saxena, Alessandro Suglia, Pasquale Minervini · 2026-03-09 · cs.AI

A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement

This paper proposes a semi-supervised framework for breast ultrasound segmentation that leverages training-free, appearance-based prompts in vision-language models to generate structurally consistent pseudo-labels, which are then refined through a dual-teacher mechanism and contrastive learning to achieve fully supervised-level performance with only 2.5% labeled data.

Ruili Li, Jiayi Ding, Ruiyu Li, Yilun Jin, Shiwen Ge, Yuwen Zeng, Xiaoyong Zhang, Eichi Takaya, Jan Vrba, Noriyasu Homma · 2026-03-09 · cs

Making Training-Free Diffusion Segmentors Scale with the Generative Power

This paper addresses the scalability limitations of training-free diffusion segmentors by identifying and bridging two gaps, in attention map aggregation and in token score imbalance, through two proposed techniques, auto aggregation and per-pixel rescaling, thereby enabling better utilization of powerful generative models for semantic segmentation.

Benyuan Meng, Qianqian Xu, Zitai Wang, Xiaochun Cao, Longtao Huang, Qingming Huang · 2026-03-09 · cs
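The per-pixel rescaling idea, balancing per-class score maps so that no class dominates a pixel purely by raw scale before the per-pixel argmax, can be sketched generically. The function name `per_pixel_rescale` and the min-max-then-softmax scheme below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def per_pixel_rescale(score_maps):
    """Illustrative per-pixel rescaling of class score maps.

    score_maps: array of shape (C, H, W), one raw score map per class
    (e.g. aggregated attention). Each class map is first min-max rescaled
    to [0, 1] so classes are comparable in scale, then scores are
    softmax-normalised across classes independently at every pixel.
    """
    lo = score_maps.min(axis=(1, 2), keepdims=True)
    hi = score_maps.max(axis=(1, 2), keepdims=True)
    scaled = (score_maps - lo) / np.maximum(hi - lo, 1e-8)
    # numerically stable softmax over the class axis, per pixel
    e = np.exp(scaled - scaled.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

# A segmentation map is then the argmax over classes at every pixel.
maps = np.random.default_rng(0).normal(size=(3, 4, 4))
probs = per_pixel_rescale(maps)
seg = probs.argmax(axis=0)
```

The per-class min-max step is one simple way to make differently scaled score maps commensurable; the paper's actual rescaling may differ.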

Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning

This paper proposes a two-stage framework that first trains a contrastive encoder on labeled invented alphabets and then uses teacher-student distillation to learn unsupervised, deformation-invariant embeddings for historically attested scripts, effectively bridging supervised discriminative learning with unsupervised discovery of latent cross-script similarities without requiring ground-truth evolutionary relationships.

Claire Roman, Philippe Meyer · 2026-03-09 · cs.AI
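The two-stage pattern this summary describes, contrastive pretraining on labeled pairs followed by teacher-student distillation, can be sketched generically. The NT-Xent contrastive objective and MSE distillation loss below are common illustrative choices, not the paper's actual objectives:

```python
import numpy as np

def ntxent_loss(z1, z2, tau=0.5):
    """Stage 1 (illustrative): NT-Xent contrastive loss over two views.

    z1, z2: (N, d) embeddings of two views of the same N items; row i of
    z1 and row i of z2 form a positive pair, all other rows are negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)          # (2N, d)
    sim = z @ z.T / tau                           # pairwise similarities
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    # positive for row i is row i+n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

def distill_loss(teacher_emb, student_emb):
    """Stage 2 (illustrative): match student embeddings to a frozen teacher."""
    return np.mean((teacher_emb - student_emb) ** 2)
```

In the two-stage pattern, the stage-1 encoder trained with a contrastive loss becomes the frozen teacher, and an unlabeled-data student is fit by minimising the distillation loss against its outputs.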

Towards Motion Turing Test: Evaluating Human-Likeness in Humanoid Robots

This paper introduces the Motion Turing Test framework and the HHMotion dataset to evaluate human-likeness in humanoid robots by analyzing kinematic data, revealing current motion deviations and demonstrating that a specialized baseline model outperforms multimodal large language models in automatically predicting human-likeness scores.

Mingzhe Li, Mengyin Liu, Zekai Wu, Xincheng Lin, Junsheng Zhang, Ming Yan, Zengye Xie, Changwang Zhang, Chenglu Wen, Lan Xu, Siqi Shen, Cheng Wang · 2026-03-09 · cs

CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation

This paper introduces CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that leverages patient context, guideline-based severity weighting, and a comprehensive error taxonomy to achieve superior alignment with radiologist judgments compared to existing metrics.

Mohammed Baharoon, Thibault Heintz, Siavash Raissi, Mahmoud Alabbad, Mona Alhammad, Hassan AlOmaish, Sung Eun Kim, Oishi Banerjee, Pranav Rajpurkar · 2026-03-09 · cs.AI