LiM-YOLO: Less is More with Pyramid Level Shift and Normalized Auxiliary Branch for Ship Detection in Optical Remote Sensing Imagery

LiM-YOLO is a streamlined ship detection model for optical remote sensing imagery that achieves state-of-the-art accuracy with fewer parameters. It shifts the detection pyramid from P3-P5 to P2-P4 to better resolve small vessels and employs Group Normalization to stabilize training on high-resolution inputs.

Seon-Hoon Kim, Hyeji Sim, Youeyun Jung, Ok-Chul Jung, Yerin Kim | Wed, 11 Ma | eess

From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies

This paper introduces Path-Consistent Safety Filtering (PACS), a novel approach that ensures formal safety guarantees for diffusion policies in dynamic environments while preserving task success rates by applying set-based reachability analysis to brake trajectories in a manner consistent with the policy's training distribution.

Ralf Römer, Julian Balletshofer, Jakob Thumm, Marco Pavone, Angela P. Schoellig, Matthias Althoff | Wed, 11 Ma | eess

Rethinking Discrete Speech Representation Tokens for Accent Generation

This paper presents the first systematic investigation into how accent information is encoded in Discrete Speech Representation Tokens (DSRTs), introducing a unified evaluation framework that reveals layer selection is the most critical factor for retaining accents, while ASR supervision significantly diminishes them and naive codebook reduction fails to disentangle accent from phonetic and speaker information.

Jinzuomu Zhong, Yi Wang, Korin Richmond, Peter Bell | Wed, 11 Ma | eess

Randomized Space-Time Stacked Intelligent Metasurfaces for Massive Multiuser Downlink Connectivity

This paper proposes a novel randomized space-time stacked intelligent metasurface (ST-SIM) architecture that integrates a time-varying input layer to exploit multiuser diversity and enable scalable massive downlink connectivity while significantly reducing channel state information acquisition and feedback overhead through a partial-CSIT-based beamforming scheme.

Donatella Darsena, Ivan Iudice, Vincenzo Galdi, Francesco Verde | Wed, 11 Ma | eess

Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets

This paper comprehensively evaluates 17 pretrained speech embedding systems across six heterogeneous datasets for dysarthria detection, revealing significant variability in within-dataset performance and limited cross-dataset generalization, which raises critical questions about the clinical validity of models trained and tested on the same data.

Lovisa Wihlborg, Jemima Goodall, David Wheatley, Jacob J. Webber, Johnny Tam, Christine Weaver, Suvankar Pal, Siddharthan Chandran, Sohan Seth, Oliver Watts, Cassia Valentini-Botinhao | Wed, 11 Ma | eess

Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks

This paper introduces a systematic paradigm for benchmarking humans and machines on multilingual speech understanding tasks, revealing that while speech-based large language models match or exceed human performance in clean, single-speaker conditions, humans significantly outperform them in selectively attending to target speakers within complex, mixed-channel acoustic scenes, particularly in non-native languages.

Sai Samrat Kankanala, Ram Chandra, Sriram Ganapathy | Wed, 11 Ma | eess

Remote Tracking with State-Dependent Sensing in Pull-Based Systems: A POMDP Framework

This paper proposes a POMDP framework for remote tracking of Markov sources via multiple heterogeneous sensors with state-dependent accuracy, with the goal of minimizing long-term weighted distortion and transmission costs. It introduces truncation-based and discounted reformulation methods to solve the resulting infinite-state belief-MDP, and shows that these methods outperform low-complexity baselines while revealing structural insights.

Jiapei Tian, Abolfazl Zakeri, Marian Codreanu, David Gundlegård | Wed, 11 Ma | eess