Seeing Through Uncertainty: A Free-Energy Approach for Real-Time Perceptual Adaptation in Robust Visual Navigation

This paper introduces FEP-Nav, a biologically inspired framework that enables robust real-time visual navigation by minimizing Variational Free Energy through a dual-mechanism architecture of top-down decoding and adaptive normalization, allowing autonomous agents to maintain performance under noisy and shifting sensory conditions without gradient-based updates.

Maytus Piriyajitakonkij, Rishabh Dev Yadav, Mingfei Sun + 2 more · 2026-03-06 · 💻 cs

EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation

EasyAnimate is a high-performance video generation framework that leverages diffusion transformers enhanced by Hybrid Window Attention for improved efficiency, reward backpropagation for better quality alignment, and additional optimizations like token-length training and multimodal text encoding to achieve state-of-the-art results.

Jiaqi Xu, Kunzhe Huang, Xinyi Zou + 5 more · 2026-03-06 · 💻 cs

Flatness Guided Test-Time Adaptation for Vision-Language Models

This paper proposes Flatness-Guided Adaptation (FGA), a novel framework for Vision-Language Models that unifies training and test-time procedures by leveraging sharpness-aware prompt tuning to identify flat minima and a sharpness-based sample selection strategy to align them with test data, thereby achieving superior performance with reduced computational overhead compared to existing test-time adaptation methods.

Aodi Li, Liansheng Zhuang, Xiao Long + 2 more · 2026-03-06 · 💻 cs

MedFuncta: A Unified Framework for Learning Efficient Medical Neural Fields

This paper introduces MedFuncta, a unified meta-learning framework that encodes diverse medical images into compact 1D latent vectors to train shared, continuous neural fields at scale, while optimizing training efficiency through sparse supervision and a novel frequency schedule, and releases the accompanying MedNF dataset with over 500,000 latent vectors to advance large-scale medical neural field research.

Paul Friedrich, Florentin Bieder, Julian McGinnis + 3 more · 2026-03-06 · 💻 cs

Collaborative Learning of Local 3D Occupancy Prediction and Versatile Global Occupancy Mapping

This paper proposes LMPOcc, a plug-and-play framework that leverages a lightweight fusion module to integrate global occupancy priors into local 3D semantic prediction while simultaneously updating global maps via multi-vehicle crowdsourcing, thereby achieving state-of-the-art performance and enabling scalable, open-vocabulary 3D scene understanding.

Shanshuai Yuan, Julong Wei, Muer Tie + 3 more · 2026-03-06 · 💻 cs

RESAR-BEV: An Explainable Progressive Residual Autoregressive Approach for Camera-Radar Fusion in BEV Segmentation

RESAR-BEV is an explainable, progressive residual autoregressive framework for camera-radar fusion in Bird's-Eye-View segmentation that achieves state-of-the-art performance (54.0% mIoU) and real-time speed (14.6 FPS) on the nuScenes dataset by employing a coarse-to-fine Drive-Transformer and Modifier-Transformer architecture, robust dual-path voxel encoding, and decoupled supervision to overcome multi-modal misalignment and sensor noise.

Zhiwen Zeng, Yunfei Yin, Zheng Yuan + 2 more · 2026-03-06 · 💻 cs

DHECA-SuperGaze: Dual Head-Eye Cross-Attention and Super-Resolution for Unconstrained Gaze Estimation

This paper introduces DHECA-SuperGaze, a deep learning framework that enhances unconstrained gaze estimation by integrating super-resolution for low-quality images and a dual head-eye cross-attention module to model head-eye interactions, while also correcting annotation errors in the Gaze360 dataset to achieve state-of-the-art accuracy and robust generalization.

Franko Šikić, Donik Vršnak, Sven Lončarić · 2026-03-06 · 💻 cs

EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models

This paper proposes EDITOR, an effective and interpretable prompt inversion technique for text-to-image diffusion models that combines pre-trained captioning initialization, latent space refinement, and embedding-to-text conversion to outperform existing methods in image similarity, textual alignment, and generalizability while enabling diverse downstream applications.

Mingzhe Li, Kejing Xia, Gehao Zhang + 5 more · 2026-03-06 · 💻 cs