Activation Steering for Accent Adaptation in Speech Foundation Models

This paper proposes a parameter-free activation steering method for speech foundation models: it locates accent information in a specific band of middle encoder layers and corrects accent-induced representation shifts at inference time, significantly reducing word error rates across diverse accents without fine-tuning the model.

Jinuo Sun, Yang Xiao, Sung Kyun Chung, Qiuchi Hu, Gongping Huang, Eun-Jung Holden, Ting Dang · Mon, 09 Ma · eess
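A minimal sketch of the general idea of band-limited activation steering, not the authors' method: it assumes the accent shift at each layer can be summarized as a mean difference vector between accented and reference activations, and that steering means subtracting that vector only inside a chosen layer band. The function names `estimate_steering_vectors` and `steer`, the band `range(2, 4)`, and the synthetic data are all hypothetical.

```python
import numpy as np

def estimate_steering_vectors(accented_layers, reference_layers):
    # Per-layer mean shift E[h | accented] - E[h | reference].
    # Each argument is a list over layers of (num_frames, hidden_dim) arrays.
    return [a.mean(axis=0) - r.mean(axis=0)
            for a, r in zip(accented_layers, reference_layers)]

def steer(hidden_layers, vectors, band, alpha=1.0):
    # Subtract the scaled shift only at layers inside `band`; layers outside
    # the band are returned unchanged and no model weights are touched.
    return [h - alpha * v if i in band else h
            for i, (h, v) in enumerate(zip(hidden_layers, vectors))]

# Synthetic demo: the "accent" offsets only layers 2 and 3 of 6.
rng = np.random.default_rng(0)
num_layers, dim = 6, 8
band = range(2, 4)  # hypothetical middle-layer band
reference = [rng.normal(size=(300, dim)) for _ in range(num_layers)]
accented = [rng.normal(size=(300, dim)) + (0.5 if i in band else 0.0)
            for i in range(num_layers)]

vectors = estimate_steering_vectors(accented, reference)
corrected = steer(accented, vectors, band)
for i in band:
    gap = np.abs(corrected[i].mean(axis=0) - reference[i].mean(axis=0)).max()
    print(f"layer {i}: max mean gap after steering = {gap:.2e}")
```

In a real setting the vectors would be estimated once on calibration audio and then applied to unseen utterances; the demo reuses the same data only to show that the mean shift inside the band is removed while other layers pass through untouched.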

In-Wave Computation Aided Stacked Intelligent Metasurfaces in Next-Generation Networks: Challenges and Opportunities

This article reviews the state of the art in stacked intelligent metasurfaces (SIMs) for in-wave computation in next-generation networks, highlighting their potential to jointly optimize communication, sensing, and processing, and discusses open challenges in scalability, controllability, and robustness.

Mengbing Liu, Chau Yuen, Dusit Niyato, Bruno Clerckx, Lajos Hanzo · Mon, 09 Ma · eess

Ill-Posedness Analysis of CSI-Based Electromagnetic Inverse Scattering for Material Reconstruction in ISAC Systems

This paper analyzes the ill-posedness of CSI-based electromagnetic inverse scattering in ISAC systems by characterizing the coherence of the scattering operators. It proves that restricting the reconstruction to a Region of Interest (ROI) significantly improves conditioning and tightens the estimation bounds, and validates a corresponding ROI-constrained optimization framework through simulations.

Yubin Luo, Li Yu, Takumi Takahashi, Shaoyi Liu, Yuxiang Zhang, Jianhua Zhang, Hideki Ochiai · Mon, 09 Ma · eess
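A toy illustration of why restricting reconstruction to an ROI can help, assuming a generic linear forward model y = A x with one column per candidate cell rather than the paper's actual scattering operator. For a tall matrix with full column rank, keeping only a subset of columns can never increase the 2-norm condition number (singular-value interlacing), and it can only lower the mutual coherence. The out-of-ROI cells with near-duplicate responses and the helper `mutual_coherence` are hypothetical.

```python
import numpy as np

def mutual_coherence(A):
    # Largest |cosine similarity| between two distinct columns of A.
    cols = A / np.linalg.norm(A, axis=0)
    gram = np.abs(cols.conj().T @ cols)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

rng = np.random.default_rng(1)
m, n_roi = 64, 20

def cgauss(*shape):
    # Complex Gaussian entries, standing in for measured channel responses.
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Toy forward model: one column per candidate cell.
A_roi = cgauss(m, n_roi)                  # cells inside the ROI
A_out = A_roi + 0.01 * cgauss(m, n_roi)   # hypothetical out-of-ROI cells, nearly indistinguishable
A_full = np.hstack([A_roi, A_out])        # 64 x 40, still full column rank

for name, A in [("full domain", A_full), ("ROI only", A_roi)]:
    print(f"{name}: cond = {np.linalg.cond(A):.1f}, "
          f"coherence = {mutual_coherence(A):.3f}")
```

Near-duplicate columns make the full-domain system almost singular, while the ROI-only system stays well conditioned.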

Channel Estimation for Reconfigurable Intelligent Surface Assisted Upper Mid-Band MIMO Systems

This paper proposes a conditioning-aware channel estimation framework for RIS-assisted upper mid-band MIMO systems. To counter the severe spatial correlation and ill-conditioning of near-field propagation, it decomposes the high-dimensional estimation problem into well-conditioned subproblems via greedy column grouping and piecewise phase design, achieving robust performance without relying on sparsity assumptions.

Jeongjae Lee, Chanwon Kim, Songnam Hong · Mon, 09 Ma · eess
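One way to make "greedy column grouping" concrete is sketched below; the grouping rule is a hypothetical stand-in, not the paper's algorithm. A column joins the first group in which every member's summed |cosine| with the other members stays below a threshold `tau`. By the Gershgorin circle theorem, each group's normalized Gram matrix then has eigenvalues in (1 - tau, 1 + tau), so each subproblem's condition number is below sqrt((1 + tau) / (1 - tau)). Far-field uniform-linear-array steering vectors on random, closely spaced directions are used as an assumed, simplified correlated dictionary.

```python
import numpy as np

def greedy_column_grouping(A, tau=0.5):
    # Hypothetical Gershgorin-based rule: keep every member's summed |cosine|
    # with the rest of its group below tau.
    cols = A / np.linalg.norm(A, axis=0)
    G = np.abs(cols.conj().T @ cols)   # |cosine| between every column pair
    groups, row_sums = [], []          # row_sums[g][i]: sum of G[i, k] over other members k
    for j in range(A.shape[1]):
        for g, rs in zip(groups, row_sums):
            s_new = sum(G[i, j] for i in g)
            if s_new < tau and all(rs[i] + G[i, j] < tau for i in g):
                for i in g:
                    rs[i] += G[i, j]
                rs[j] = s_new
                g.append(j)
                break
        else:
            # No existing group can take column j: open a new one.
            groups.append([j])
            row_sums.append({j: 0.0})
    return groups

rng = np.random.default_rng(2)
N, K = 16, 48
u = np.sort(rng.uniform(-1.0, 1.0, size=K))            # closely spaced spatial frequencies
n = np.arange(N)[:, None]
A = np.exp(1j * np.pi * n * u[None, :]) / np.sqrt(N)   # unit-norm steering vectors

groups = greedy_column_grouping(A, tau=0.5)
print(f"{len(groups)} groups, sizes {[len(g) for g in groups]}")
print(f"worst group cond = {max(np.linalg.cond(A[:, g]) for g in groups):.3f}")
```

With `tau = 0.5`, every group's condition number is provably below sqrt(3); a smaller `tau` tightens that bound at the cost of more, smaller subproblems.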

A Retrieval-Assisted Framework for Wireless Localization

This paper proposes a unified retrieval-assisted framework for wireless localization that combines channel charting, for efficient low-dimensional retrieval of reference points, with a graph attention network that models inter-sample correlations, achieving higher accuracy and better scalability than existing similarity-based and learning-based methods.

Haoyu Huang, Guangjin Pan, Kaixuan Huang, Shunqing Zhang, Yuhao Zhang, Musa Furkan Keskin, Zheng Xing, Henk Wymeersch · Mon, 09 Ma · eess
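A sketch of the retrieval step only, under loudly stated simplifications: the channel chart is assumed to be already learned (here it is idealized as a rotated, scaled copy of the true positions), and the graph attention network is replaced by plain inverse-distance weighting of the k retrieved reference positions. The function `retrieve_and_localize` and all data are hypothetical.

```python
import numpy as np

def retrieve_and_localize(chart_refs, pos_refs, chart_query, k=4, eps=1e-9):
    # Retrieve the k reference points closest to the query in chart space...
    d = np.linalg.norm(chart_refs - chart_query, axis=1)
    idx = np.argsort(d)[:k]
    # ...and fuse their known positions with inverse-distance weights
    # (a simple stand-in for the learned graph-attention fusion).
    w = 1.0 / (d[idx] + eps)
    return idx, (w[:, None] * pos_refs[idx]).sum(axis=0) / w.sum()

# Reference points on a 10 x 10 m grid with known positions.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
pos_refs = np.column_stack([xs.ravel(), ys.ravel()])

# Idealized chart: a rotated, scaled copy of the true positions (assumption).
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
chart_refs = 0.1 * pos_refs @ R.T
query_pos = np.array([3.5, 4.5])
chart_query = 0.1 * R @ query_pos

idx, estimate = retrieve_and_localize(chart_refs, pos_refs, chart_query)
print("retrieved:", pos_refs[idx].tolist())
print("estimate:", estimate)
```

Retrieval in the low-dimensional chart keeps the nearest-neighbor search cheap; the fusion step is where a learned model such as a graph attention network would replace the fixed weights used here.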

MAD: A Multimodal and Multi-perspective Affective Dataset with Hierarchical Annotations

The paper introduces MAD, a novel multimodal affective dataset featuring synchronized physiological signals (EEG, ECG, EOG, EMG, PPG, BCG) and tri-view RGB-D facial videos from 18 participants, accompanied by a hierarchical three-level annotation framework to facilitate comprehensive research in emotion recognition and cross-modal affective analysis.

Shengwei Guo, Yunqing Qiao, Wenzhan Zhang, Bo Liu, Yong Wang, Guobing Sun · Mon, 09 Ma · eess

Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak

This cross-linguistic study demonstrates that while certain acoustic-prosodic markers of autism in children's speech generalize across Finnish, French, and Slovak, robust classification performance requires language-specific modeling due to significant variations in feature importance and transferability across typologically distinct languages.

Sofoklis Kakouros, Ida-Lotta Myllylä · Mon, 09 Ma · eess

Cross-linguistic Prosodic Analysis of Autistic and Non-autistic Child Speech in Finnish, French and Slovak

This study analyzes a multilingual corpus of Finnish, French, and Slovak child speech to demonstrate that autistic speakers exhibit a distinct, cross-linguistic prosodic profile characterized by increased intensity variability, clearer voice quality, and reduced temporal dynamics, thereby challenging deficiency-based models in favor of a complex, language-independent acoustic signature.

Ida-Lotta Myllylä, Sofoklis Kakouros · Mon, 09 Ma · eess

Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction

This paper presents an open-source cascaded system that combines EEND-VC speaker diarization with a fine-tuned Qwen3 ASR model and placed first in the DISPLACE-M challenge, extracting medical conditions from overlapping, code-switched Hinglish clinical dialogues.

Séverin Baroudi, Yanis Labrak, Shashi Kumar, Joonas Kalda, Sergio Burdisso, Pawel Cyrta, Juan Ignacio Alvarez-Trejos, Petr Motlicek, Hervé Bredin, Ricard Marxer · Mon, 09 Ma · eess

A Unified Multicarrier Waveform Framework for Next-generation Wireless Networks: Principles, Performance, and Challenges

This paper proposes a unified multicarrier waveform framework for 6G networks by systematically characterizing and comparing state-of-the-art one-dimensional and two-dimensional modulation schemes to guide their selection based on channel conditions and performance requirements.

Xingyao Zhang, Haoran Yin, Yanqun Tang, Yao Ge, Yong Zeng, Miaowen Wen, Zilong Liu, Yong Liang Guan, Hüseyin Arslan, Giuseppe Caire · Mon, 09 Ma · eess

Can LLMs Help Localize Fake Words in Partially Fake Speech?

This paper investigates adapting a text-trained large language model to speech in order to localize fake words in partially edited audio. It finds that the model identifies edits effectively by exploiting specific training patterns, such as word-level polarity substitutions, but struggles to generalize to unseen editing styles.

Lin Zhang, Thomas Thebaud, Zexin Cai, Sanjeev Khudanpur, Daniel Povey, Leibny Paola García-Perera, Matthew Wiesner, Nicholas Andrews · Fri, 13 Ma · eess