Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings

This paper addresses the underexplored role of temporal pooling in training-free anomalous sound detection by proposing and evaluating adaptive strategies, specifically Relative Deviation Pooling (RDP) and a hybrid approach, which achieve state-of-the-art performance across multiple benchmarks and outperform previously reported trained systems.

Kevin Wilkinghoff, Sarthak Yadav, Zheng-Hua Tan | 2026-03-06 | 💻 cs

A Large-Scale Probing Analysis of Speaker-Specific Attributes in Self-Supervised Speech Representations

This study probes 11 self-supervised speech models at scale and finds a hierarchical encoding of speaker attributes. It challenges the assumption that final layers are purely linguistic: larger models recover speaker identity in deep layers, and intermediate representations capture dynamic prosody better than specialized embeddings do.

Aemon Yat Fei Chiu, Kei Ching Fung, Roger Tsz Yeung Li + 2 more | 2026-03-06 | 💻 cs

Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks

This study demonstrates that recent self-supervised audio models with superior performance on diverse downstream tasks exhibit stronger alignment with human auditory cortex activity, suggesting that brain-like representations emerge naturally as a byproduct of learning to reconstruct naturalistic audio data.

Leonardo Pepino, Pablo Riera, Juan Kamienkowski + 1 more | 2026-03-05 | 🤖 cs.LG