Almost-Optimal Upper and Lower Bounds for Clustering in Low Dimensional Euclidean Spaces

This paper improves the running time of $(1+\varepsilon)$-approximation algorithms for $k$-median and $k$-means clustering in low-dimensional Euclidean spaces to $2^{\tilde{O}(1/\varepsilon)^{d-1}} \cdot n \cdot \text{polylog}(n)$ and establishes an almost matching lower bound under the Gap Exponential Time Hypothesis, demonstrating that this dependence on $1/\varepsilon$ and the dimension $d$ is essentially optimal.

Vincent Cohen-Addad, Karthik C. S., David Saulpic, Chris Schwiegelshohn · Wed, 11 Ma · cs
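For reference, both objectives charge each point to its nearest center: $k$-median sums the Euclidean distances and $k$-means sums their squares. A $(1+\varepsilon)$-approximation returns centers whose cost is at most $(1+\varepsilon)$ times the optimum. The minimal Python sketch below only evaluates that cost for a given set of centers (the function name `clustering_cost` and the exponent parameter `z` are illustrative choices); it is not the paper's algorithm.

```python
import numpy as np


def clustering_cost(points: np.ndarray, centers: np.ndarray, z: int) -> float:
    """Sum over all points of (distance to the nearest center) ** z.

    z = 1 gives the k-median objective, z = 2 gives the k-means objective.
    """
    # Pairwise Euclidean distances, shape (n_points, k).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # Each point is charged only to its closest center.
    return float(np.sum(dists.min(axis=1) ** z))


pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
ctr = np.array([[0.0, 0.0], [10.0, 0.0]])
print(clustering_cost(pts, ctr, z=1))  # 2.0 (k-median)
print(clustering_cost(pts, ctr, z=2))  # 4.0 (k-means)
```

The same point set yields different costs under the two objectives because squaring penalizes far points more heavily.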

VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models

This paper introduces VLM-Loc, a framework that leverages large vision-language models to achieve precise text-to-point-cloud localization by transforming 3D maps into bird's-eye-view images and scene graphs for enhanced spatial reasoning, alongside the release of the CityLoc benchmark for systematic evaluation.

Shuhao Kang, Youqi Liao, Peijie Wang, Wenlong Liao, Qilin Zhang, Benjamin Busam, Xieyuanli Chen, Yun Liu · Wed, 11 Ma · cs

BrainSTR: Spatio-Temporal Contrastive Learning for Interpretable Dynamic Brain Network Modeling

BrainSTR is a spatio-temporal contrastive learning framework that enhances the interpretability of dynamic brain network modeling for neuropsychiatric diagnosis by adaptively partitioning brain states, identifying critical phases, and extracting sparse, disease-specific connectivity patterns to construct a discriminative semantic space validated across ASD, BD, and MDD datasets.

Guiliang Guo, Guangqi Wen, Lingwen Liu, Ruoxian Song, Peng Cao, Jinzhu Yang, Fei Wang, Xiaoli Liu, Osmar R. Zaiane · Wed, 11 Ma · cs
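A common contrastive objective is the InfoNCE loss, which pulls matched embedding pairs together and pushes the other pairs in a batch apart. The NumPy sketch below shows only that generic loss; the summary does not say which contrastive loss BrainSTR uses or how it builds positive and negative pairs from brain states, so the pairing scheme and the `temperature` value here are assumptions.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss.

    Row i of `positives` is the positive for row i of `anchors`; every other
    row in the batch serves as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (batch, batch) similarity matrix
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability assigned to the matching (diagonal) pair.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=x.shape))
shuffled = info_nce(x, x[::-1])  # every anchor paired with the wrong sample
print(aligned < shuffled)
```

Correctly paired embeddings give a much lower loss than mismatched ones, which is the signal a contrastive framework optimizes.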

EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions

This paper introduces EmoSURA, a novel evaluation framework that improves the assessment of long-form emotional speech captions by decomposing them into atomic perceptual units for audio-grounded verification, addressing the limitations of traditional metrics and LLM judges while providing the standardized SURABench resource.

Xin Jing, Andreas Triantafyllopoulos, Jiadong Wang, Shahin Amiriparian, Jun Luo, Björn Schuller · Wed, 11 Ma · cs

ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

ConfCtrl is a confidence-aware video interpolation framework that enables precise camera control in video diffusion for novel view synthesis by combining confidence-weighted point cloud projections with a Kalman-inspired predict-update mechanism to balance pose guidance and geometric consistency while reconstructing unseen regions.

Liudi Yang, George Eskandar, Fengyi Shen, Mohammad Altillawi, Yang Bai, Chi Zhang, Ziyuan Liu, Abhinav Valada · Wed, 11 Ma · cs
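The summary describes ConfCtrl's mechanism only as Kalman-inspired. As a reference point, a textbook scalar Kalman filter alternates a predict step, which propagates the estimate and inflates its variance, with an update step, whose gain weighs a new measurement by its relative confidence. The Python sketch below is that generic filter with hypothetical names (`Gaussian1D`, `predict`, `update`); reading it as pose guidance versus geometric cues would be an analogy, not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass
class Gaussian1D:
    mean: float
    var: float


def predict(state: Gaussian1D, motion: float, process_var: float) -> Gaussian1D:
    # Predict: shift the estimate by the motion model and grow its uncertainty.
    return Gaussian1D(state.mean + motion, state.var + process_var)


def update(prior: Gaussian1D, measurement: float, meas_var: float) -> Gaussian1D:
    # Update: the Kalman gain is near 1 for a confident (low-variance) measurement
    # and near 0 for an unreliable one.
    gain = prior.var / (prior.var + meas_var)
    mean = prior.mean + gain * (measurement - prior.mean)
    return Gaussian1D(mean, (1.0 - gain) * prior.var)


prior = predict(Gaussian1D(mean=0.0, var=1.0), motion=1.0, process_var=1.0)
post = update(prior, measurement=3.0, meas_var=2.0)
print(post)  # Gaussian1D(mean=2.0, var=1.0)
```

With equal prior and measurement variances the gain is 0.5, so the posterior lands halfway between prediction and measurement with half the variance.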

RA-SSU: Towards Fine-Grained Audio-Visual Learning with Region-Aware Sound Source Understanding

This paper introduces a new fine-grained Audio-Visual Learning task called Region-Aware Sound Source Understanding (RA-SSU), supported by two novel datasets (f-Music and f-Lifescene) and a state-of-the-art model named SSUFormer, which utilizes specialized modules to achieve precise sound source segmentation and detailed frame-level textual descriptions.

Muyi Sun, Yixuan Wang, Hong Wang, Chen Su, Man Zhang, Xingqun Qi, Qi Li, Zhenan Sun · Wed, 11 Ma · cs

Expressive Power of Property Graph Constraint Languages

This paper presents the first systematic study of the expressive power of the PG-Keys language by establishing a unifying framework to compare it with Graph Functional Dependencies (GFD) and Graph Generating Dependencies (GGD), ultimately revealing a strict hierarchy of expressiveness that clarifies PG-Keys' capabilities within the context of the upcoming GQL standard.

Stefania Dumbrava, Nadime Francis, Victor Marsault, Steven Sailly · Wed, 11 Ma · cs
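GFDs extend classical functional dependencies to graph patterns. As a reminder of the base notion only, a dependency $X \rightarrow Y$ holds over a set of records when any two records that agree on the attributes $X$ also agree on $Y$. The Python sketch below checks this over flat property records (the record layout and the function name `satisfies_fd` are hypothetical); it ignores the graph patterns these constraint languages are built on.

```python
def satisfies_fd(records: list[dict], lhs: tuple[str, ...], rhs: tuple[str, ...]) -> bool:
    """Check the functional dependency lhs -> rhs over a list of property records.

    Missing properties are treated as None, which is a simplifying assumption.
    """
    seen: dict[tuple, tuple] = {}
    for record in records:
        key = tuple(record.get(attr) for attr in lhs)
        value = tuple(record.get(attr) for attr in rhs)
        # A violation is two records with equal lhs values but different rhs values.
        if seen.setdefault(key, value) != value:
            return False
    return True


people = [
    {"email": "a@example.org", "name": "Ada"},
    {"email": "b@example.org", "name": "Bob"},
    {"email": "a@example.org", "name": "Ada"},
]
print(satisfies_fd(people, ("email",), ("name",)))  # True
people.append({"email": "b@example.org", "name": "Robert"})
print(satisfies_fd(people, ("email",), ("name",)))  # False
```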

Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency

This paper introduces Test-time Ego-Exo Adaptation for Action Anticipation (TE$^{2}$A$^{3}$), a novel task addressed by the Dual-Clue enhanced Prototype Growing Network (DCPGN) which utilizes a Multi-Label Prototype Growing Module and a Dual-Clue Consistency Module to effectively bridge the inter-view gap and adapt models online without target-view training data.

Zhaofeng Shi, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Fanman Meng, Lili Pan, Hongliang Li · Wed, 11 Ma · cs

TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions

This paper introduces TIMID, a weakly supervised video anomaly detection framework that leverages task and mistake prompts to detect complex, time-dependent errors in robot executions, addressing the limitations of existing models and out-of-the-box VLMs through a novel multi-robot simulation dataset for zero-shot evaluation.

Nerea Gallego (University of Zaragoza), Fernando Salanova (University of Zaragoza), Claudio Mannarano (University of Zaragoza, University of Torino), Cristian Mahulea (University of Zaragoza), Eduardo Montijano (University of Zaragoza) · Wed, 11 Ma · cs

Deblurring structural edges in variable thickness topology optimization via density-gradient-informed projection

This paper introduces a density-gradient-informed (DGI) projection method combined with a robust penalization strategy to effectively eliminate low-thickness regions and deblur structural edges in variable thickness topology optimization, achieving sharp solid-void transitions with negligible impact on structural compliance.

Gabriel Stankiewicz, Chaitanya Dev, Paul Steinmann · Wed, 11 Ma · cs
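For context, density-based topology optimization commonly sharpens intermediate densities with the smoothed Heaviside (tanh) projection, where $\beta$ controls steepness and $\eta$ the threshold:

$$\bar\rho = \frac{\tanh(\beta\eta) + \tanh(\beta(\rho - \eta))}{\tanh(\beta\eta) + \tanh(\beta(1 - \eta))}$$

The Python sketch below implements only this standard projection; the paper's density-gradient-informed (DGI) variant and its penalization strategy are not reproduced, and the default `beta` and `eta` values are illustrative.

```python
import numpy as np


def tanh_projection(rho, beta: float = 8.0, eta: float = 0.5) -> np.ndarray:
    """Smoothed Heaviside projection of densities rho in [0, 1].

    Maps 0 to 0 and 1 to 1; larger beta pushes values toward 0 or 1.
    """
    rho = np.asarray(rho, dtype=float)
    num = np.tanh(beta * eta) + np.tanh(beta * (rho - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
    return num / den


print(tanh_projection([0.0, 0.5, 1.0]))
print(tanh_projection(0.6, beta=4.0), tanh_projection(0.6, beta=32.0))
```

Raising `beta` drives a density of 0.6 toward 1, which is the mechanism projection schemes use to obtain crisp solid-void transitions.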

MuxGel: Simultaneous Dual-Modal Visuo-Tactile Sensing via Spatially Multiplexing and Deep Reconstruction

MuxGel is a spatially multiplexed visuo-tactile sensor that overcomes the opacity trade-off in existing GelSight-style devices by using a checkerboard coating to simultaneously capture pre-contact vision and post-contact tactile signals through a single camera, with high-fidelity reconstruction achieved via a deep learning framework.

Zhixian Hu, Zhengtong Xu, Sheeraz Athar, Juan Wachs, Yu She · Wed, 11 Ma · cs
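The checkerboard coating means one camera frame carries two signals in alternating regions. A minimal NumPy sketch of splitting such a frame into two interleaved sub-images follows, assuming a grayscale frame and square cells of `cell` pixels (both assumptions; the actual coating layout is not given here). The zeroed holes in each sub-image are what a reconstruction model would have to fill; MuxGel's deep reconstruction framework is not reproduced.

```python
import numpy as np


def split_checkerboard(frame: np.ndarray, cell: int = 1):
    """Split a 2-D grayscale frame into two complementary checkerboard sub-images.

    Pixels outside each sub-image's squares are set to 0.
    """
    h, w = frame.shape
    yy, xx = np.indices((h, w))
    mask_a = ((yy // cell) + (xx // cell)) % 2 == 0  # one color of the checkerboard
    channel_a = np.where(mask_a, frame, 0)
    channel_b = np.where(mask_a, 0, frame)
    return channel_a, channel_b, mask_a


frame = np.arange(1, 17).reshape(4, 4)
a, b, mask = split_checkerboard(frame, cell=2)
print(mask.astype(int))
```

Because the two masks are complementary, adding the sub-images recovers the original frame exactly.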

Epistemic Closure: Autonomous Mechanism Completion for Physically Consistent Simulation

This paper introduces a Neuro-Symbolic Generative Agent that overcomes the "Implicit Context" problem in scientific discovery by autonomously validating and completing physical mechanisms through dimensionless scaling analysis, thereby preventing physical hallucinations and ensuring thermodynamically consistent simulations.

Yue Wua, Tianhao Su, Rui Hu, Mingchuan Zhao, Shunbo Hu, Deng Pan, Jizhong Huang · Wed, 11 Ma · cs
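The summary names dimensionless scaling analysis as the validation step. As a generic illustration (not the paper's agent), a candidate group of quantities can be checked for dimensional consistency by summing base-dimension exponents over mass M, length L and time T; the quantity table and the Reynolds-number example below are assumptions chosen for illustration.

```python
from collections import Counter

# Exponents of base dimensions M (mass), L (length), T (time) for each quantity.
DIMS = {
    "density": {"M": 1, "L": -3},
    "velocity": {"L": 1, "T": -1},
    "length": {"L": 1},
    "dynamic_viscosity": {"M": 1, "L": -1, "T": -1},  # Pa*s
}


def group_dimension(powers: dict[str, int]) -> dict[str, int]:
    """Net base-dimension exponents of a product of quantities raised to powers."""
    total = Counter()
    for name, p in powers.items():
        for base, exponent in DIMS[name].items():
            total[base] += p * exponent
    return {base: e for base, e in total.items() if e != 0}


def is_dimensionless(powers: dict[str, int]) -> bool:
    return not group_dimension(powers)


# Reynolds number rho * v * L / mu is dimensionless; dropping mu is not.
reynolds = {"density": 1, "velocity": 1, "length": 1, "dynamic_viscosity": -1}
print(is_dimensionless(reynolds))  # True
print(group_dimension({"density": 1, "velocity": 1, "length": 1}))  # {'M': 1, 'L': -1, 'T': -1}
```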

The Richest Paradigm You're Not Using: Commercial Videogames at the Intersection of Human-Computer Interaction and Cognitive Science

This paper argues that commercial videogames serve as a powerful, underutilized research environment at the intersection of human-computer interaction and cognitive science, offering ecologically valid contexts to study perception, attention, and executive functioning through a systematic framework that maps game affordances to cognitive demands.

Jaap Munneke, Jennifer E. Corbett · Wed, 11 Ma · cs