Local-Global Prompt Learning via Sparse Optimal Transport

The paper proposes SOT-GLP, a novel few-shot adaptation method for vision-language models that employs shared sparse optimal transport to partition visual regions among class-specific local prompts while maintaining global alignment, thereby achieving state-of-the-art performance in both classification accuracy and out-of-distribution detection by preserving the native feature geometry.

Deniz Kizaroğlu, Ülku Tuncer Küçüktas, Emre Çakmakyurdu, Alptekin Temizel 2026-03-10 💻 cs

ΔVLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

This paper introduces ΔVLA, a prior-guided framework that enhances robotic manipulation by modeling discrete world-knowledge variations relative to an explicit current state prior, rather than predicting absolute future states, thereby achieving state-of-the-art performance and efficiency through its novel components: the Prior-Guided World Knowledge Extractor, Latent World Variation Quantization, and Conditional Variation Attention.

Yijie Zhu, Jie He, Rui Shao, Kaishen Yuan, Tao Tan, Xiaochen Yuan, Zitong Yu 2026-03-10 💻 cs

This Looks Distinctly Like That: Grounding Interpretable Recognition in Stiefel Geometry against Neural Collapse

This paper introduces Adaptive Manifold Prototypes (AMP), a framework that leverages Stiefel manifold optimization to represent class prototypes as orthonormal bases, thereby preventing prototype collapse caused by Neural Collapse while achieving state-of-the-art accuracy and improved causal faithfulness in fine-grained recognition.

Junhao Jia, Jiaqi Wang, Yunyou Liu, Haodong Jing, Yueyi Wu, Xian Wu, Yefeng Zheng 2026-03-10 💻 cs

Rectified flow-based prediction of post-treatment brain MRI from pre-radiotherapy priors for patients with glioma

This study presents a rectified flow-based AI model that generates realistic post-treatment brain MRIs from pre-radiotherapy priors and dose maps for glioma patients, achieving high structural fidelity and significantly faster inference than diffusion models to support adaptive treatment planning.

Selena Huisman, Nordin Belkacemi, Vera Keil, Joost Verhoeff, Szabolcs David 2026-03-10 💻 cs

AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition

AULLM++ is a structural reasoning framework that leverages Large Language Models to enhance micro-expression Action Unit detection by fusing multi-granularity visual features with learned AU correlations through a three-stage process of evidence construction, structure modeling, and deduction-based prediction, achieving state-of-the-art performance and superior cross-domain generalization.

Zhishu Liu, Kaishen Yuan, Bo Zhao, Hui Ma, Zitong Yu 2026-03-10 💻 cs

SPIRAL: A Closed-Loop Framework for Self-Improving Action World Models via Reflective Planning Agents

SPIRAL is a closed-loop framework that enhances controllable long-horizon video generation by integrating a reflective planning process with iterative action world modeling, enabling self-improvement through explicit planning, object-centric decomposition, and feedback-driven refinement.

Yu Yang, Yue Liao, Jianbiao Mei, Baisen Wang, Xuemeng Yang, Licheng Wen, Jiangning Zhang, Xiangtai Li, Hanlin Chen, Botian Shi, Yong Liu, Shuicheng Yan, Gim Hee Lee 2026-03-10 💻 cs

Information Maximization for Long-Tailed Semi-Supervised Domain Generalization

This paper proposes IMaX, a simple yet effective objective based on the InfoMax principle that maximizes mutual information between learned features and latent labels while mitigating class-balance bias through an α-entropic term, thereby significantly improving the performance of state-of-the-art semi-supervised domain generalization methods in long-tailed distribution scenarios.

Leo Fillioux, Omprakash Chakraborty, Quentin Gopée, Pierre Marza, Paul-Henry Cournède, Stergios Christodoulidis, Maria Vakalopoulou, Ismail Ben Ayed, Jose Dolz 2026-03-10 💻 cs

Alfa: Attentive Low-Rank Filter Adaptation for Structure-Aware Cross-Domain Personalized Gaze Estimation

The paper proposes Alfa, an attentive low-rank filter adaptation method that reweights pre-trained semantic features via singular value decomposition and attention mechanisms to achieve sample-efficient test-time personalization for cross-domain gaze estimation, outperforming existing methods while demonstrating applicability beyond computer vision.

He-Yen Hsieh, Wei-Te Mark Ting, H. T. Kung 2026-03-10 💻 cs

Spherical-GOF: Geometry-Aware Panoramic Gaussian Opacity Fields for 3D Scene Reconstruction

Spherical-GOF is a novel geometry-aware panoramic rendering framework that extends Gaussian Opacity Fields to spherical ray space, achieving superior geometric consistency and photometric quality in 3D scene reconstruction by introducing efficient spherical culling and adaptive filtering to overcome the limitations of existing perspective-based adaptations.

Zhe Yang, Guoqiang Zhao, Sheng Wu, Kai Luo, Kailun Yang 2026-03-10 💻 cs

OccTrack360: 4D Panoptic Occupancy Tracking from Surround-View Fisheye Cameras

This paper introduces OccTrack360, a new benchmark for 4D panoptic occupancy tracking from surround-view fisheye cameras featuring long, diverse sequences and principled voxel visibility annotations, alongside the proposed Focus on Sphere Occ (FoSOcc) framework that effectively addresses fisheye distortion and localization challenges to establish a strong baseline for future research.

Yongzhi Lin, Kai Luo, Yuanfan Zheng, Hao Shi, Mengfei Duan, Yang Liu, Kailun Yang 2026-03-10 💻 cs

Interactive World Simulator for Robot Policy Training and Evaluation

This paper presents the Interactive World Simulator, a fast and physically consistent framework leveraging consistency models to generate high-fidelity long-horizon video predictions that enable scalable robot policy training and reliable real-world evaluation using solely simulated data.

Yixuan Wang, Rhythm Syed, Fangyu Wu, Mengchao Zhang, Aykut Onol, Jose Barreiros, Hooshang Nayyeri, Tony Dear, Huan Zhang, Yunzhu Li 2026-03-10 🤖 cs.LG

DualFlexKAN: Dual-stage Kolmogorov-Arnold Networks with Independent Function Control

The paper introduces DualFlexKAN, a flexible dual-stage Kolmogorov-Arnold Network architecture that decouples input transformations and output activations to support diverse basis functions and regularization, achieving superior accuracy and convergence with significantly fewer parameters than standard KANs while mitigating their scalability limitations.

Andrés Ortiz, Nicolás J. Gallego-Molina, Carmen Jiménez-Mesa, Juan M. Górriz, Javier Ramírez 2026-03-10 🤖 cs.LG

PRISM: Streaming Human Motion Generation with Per-Joint Latent Decomposition

PRISM introduces a streaming human motion generation framework that employs a joint-factorized latent space and noise-free condition injection within a single foundation model to overcome representation entanglement and error accumulation, thereby unifying text-to-motion, pose-conditioned, and long-horizon sequential synthesis with state-of-the-art performance.

Zeyu Ling, Qing Shuai, Teng Zhang, Shiyang Li, Bo Han, Changqing Zou 2026-03-10 💻 cs