From Data Statistics to Feature Geometry: How Correlations Shape Superposition

This paper challenges the standard view of superposition in neural networks by demonstrating that, unlike in idealized uncorrelated settings where interference is merely noise, realistic feature correlations allow models to arrange features so that interference becomes constructive, thereby naturally forming the semantic clusters and cyclical structures observed in real language models.

Lucas Prieto, Edward Stevinson, Melih Barsbey, Tolga Birdal, Pedro A. M. Mediano · Wed, 11 Ma · cs.AI
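The interference mechanism behind this claim can be seen in a minimal toy sketch (in the spirit of standard superposition toy models, not this paper's experiments; all names below are illustrative): two sparse features packed antipodally into one hidden dimension interfere only when both are active, so anticorrelated features can be superposed losslessly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def recon_error(x1, x2):
    # Antipodal superposition: both features share one hidden unit,
    # feature 1 with weight +1, feature 2 with weight -1.
    h = x1 - x2
    x1_hat = np.maximum(h, 0.0)   # ReLU readout for feature 1
    x2_hat = np.maximum(-h, 0.0)  # ReLU readout for feature 2
    return np.mean((x1_hat - x1) ** 2 + (x2_hat - x2) ** 2)

# Independent sparse features: they occasionally co-occur and interfere.
x1 = (rng.random(n) < 0.1).astype(float)
x2 = (rng.random(n) < 0.1).astype(float)

# Anticorrelated features: never active together, so the antipodal
# arrangement reconstructs both perfectly.
z = rng.random(n)
y1 = (z < 0.1).astype(float)
y2 = (z > 0.9).astype(float)

print(recon_error(x1, x2))  # positive: co-occurrence causes interference
print(recon_error(y1, y2))  # exactly 0.0: correlation structure removes it
```

The toy only shows why correlation statistics matter for how features can share dimensions; the paper's contribution concerns what geometry emerges in realistic correlated settings.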

Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People

This paper presents a study of a large language model-powered "sighted guide" for blind and low vision users in social virtual reality, revealing that participants adapt their interaction from a tool-based approach when alone to a companionable relationship in the presence of others, thereby offering key design recommendations for future accessible VR guides.

Jazmin Collins, Sharon Y Lin, Tianqi Liu, Andrea Stevenson Won, Shiri Azenkot · Wed, 11 Ma · cs.AI

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

This paper introduces the Overfitting-Underfitting Indicator (OUI) as an efficient, early-stage metric based on hidden neuron activation patterns to distinguish optimal learning rates in PPO actor-critic training, demonstrating its superior ability to prune unpromising runs compared to traditional criteria by revealing distinct structural signatures in actor and critic networks.

Alberto Fernández-Hernández, Cristian Pérez-Corral, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí · Wed, 11 Ma · cs.AI
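The OUI formula itself is not reproduced in this digest; as a hypothetical illustration of what a "hidden neuron activation pattern" signal can look like (assumed names and statistics, not the paper's definition or API), one can summarize a layer by the fraction of active ReLU units and the diversity of binary activation patterns across a batch:

```python
import numpy as np

def activation_signature(preacts):
    """Summarize hidden-layer pre-activations of shape (batch, units) with
    two structural statistics: mean fraction of active ReLU units, and the
    fraction of distinct binary activation patterns in the batch."""
    active = preacts > 0
    frac_active = float(active.mean())
    distinct = len({row.tobytes() for row in active})
    return frac_active, distinct / active.shape[0]

rng = np.random.default_rng(1)
# A healthy layer: varied activations produce diverse binary patterns.
varied = rng.standard_normal((256, 32))
# A collapsed layer: one pattern repeated, so diversity is minimal.
collapsed = np.tile(rng.standard_normal((1, 32)), (256, 1))

print(activation_signature(varied))
print(activation_signature(collapsed))
```

Statistics of this kind are cheap to compute during training, which is what makes activation-pattern indicators attractive for pruning unpromising runs early.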

No Image, No Problem: End-to-End Multi-Task Cardiac Analysis from Undersampled k-Space

The paper proposes k-MTR, a novel framework that bypasses the traditional image reconstruction step by directly learning multi-task cardiac diagnostic features from undersampled k-space data through a shared semantic manifold, thereby eliminating reconstruction artifacts and achieving competitive performance across regression, classification, and segmentation tasks.

Yundi Zhang, Sevgi Gokce Kafali, Niklas Bubeck, Daniel Rueckert, Jiazhen Pan · Wed, 11 Ma · cs.AI
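As background for the undersampling setup above (a generic sketch of retrospective k-space undersampling, not the k-MTR pipeline; the toy image and sampling rate are assumptions): keeping only a fraction of k-space lines discards measurements that an image-domain pipeline would first have to reconstruct, typically with aliasing artifacts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D array standing in for a cardiac MR slice (illustrative only).
img = rng.standard_normal((64, 64))

# Full k-space is the 2-D Fourier transform of the image.
k_full = np.fft.fft2(img)

# Undersample: keep ~25% of phase-encode lines (rows), zero out the rest.
keep = rng.random(64) < 0.25
k_under = k_full * keep[:, None]

# A conventional pipeline would reconstruct an aliased image from k_under;
# approaches in the spirit of k-MTR instead learn task features from the
# undersampled k-space directly, skipping this step.
aliased = np.fft.ifft2(k_under).real
print(keep.mean(), aliased.shape)
```

The sketch only fixes terminology; how the shared semantic manifold maps k-space to regression, classification, and segmentation heads is the paper's contribution.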

Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation

The paper introduces ACADiff, an adaptive clinical-aware latent diffusion framework that synthesizes missing multimodal brain imaging data (sMRI, FDG-PET, and AV45-PET) by integrating imaging observations with GPT-4o-encoded clinical metadata, achieving superior generation quality and robust diagnostic performance even when up to 80% of modalities are missing.

Rong Zhou, Houliang Zhou, Yao Su, Brian Y. Chen, Yu Zhang, Lifang He, Alzheimer's Disease Neuroimaging Initiative · Wed, 11 Ma · cs.AI

Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning

This paper introduces the Dynamics-Aware Policy Learning (DAPL) framework, which leverages explicit world modeling to learn contact-induced dynamics, enabling robots to achieve robust extrinsic dexterity in cluttered environments without hand-crafted heuristics and significantly outperforming existing manipulation methods in both simulation and real-world deployments.

Yixin Zheng, Jiangran Lyu, Yifan Zhang, Jiayi Chen, Mi Yan, Yuntian Deng, Xuesong Shi, Xiaoguang Zhao, Yizhou Wang, Zhizheng Zhang, He Wang · Wed, 11 Ma · cs.AI

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

This paper introduces MA-EgoQA, a novel benchmark and dataset featuring 1,700 questions across five categories designed to evaluate the ability of AI models to understand and reason over multiple long-horizon egocentric videos from embodied agents, alongside a proposed baseline model named EgoMAS that highlights current limitations in system-level multi-agent understanding.

Kangsan Kim, Yanlai Yang, Suji Kim, Woongyeong Yeo, Youngwan Lee, Mengye Ren, Sung Ju Hwang · Wed, 11 Ma · cs.AI

Ego: Embedding-Guided Personalization of Vision-Language Models

The paper proposes "Ego," an efficient personalization method for vision-language models that extracts visual tokens representing target concepts via internal attention mechanisms to serve as memory, enabling strong performance across single-concept, multi-concept, and video personalization tasks without requiring additional training stages or external modules.

Soroush Seifi, Simon Gardier, Vaggelis Dorovatas, Daniel Olmeda Reino, Rahaf Aljundi · Wed, 11 Ma · cs.AI

EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning

This paper introduces EXPLORE-Bench, a benchmark derived from real first-person videos to evaluate the ability of multimodal large language models to perform long-horizon egocentric scene prediction, revealing significant performance gaps compared to humans and demonstrating that stepwise reasoning offers partial improvements at a computational cost.

Chengjun Yu, Xuhan Zhu, Chaoqun Du, Pengfei Yu, Wei Zhai, Yang Cao, Zheng-Jun Zha · Wed, 11 Ma · cs.AI