Global Minimizers of Sigmoid Contrastive Loss

This paper theoretically characterizes the global minimizers of sigmoid contrastive loss as $(\mathsf{m}, \mathsf{b}_{\mathsf{rel}})$-Constellations, providing a rigorous explanation for the success of SigLIP models, the origin of the modality gap, and the necessary dimensionality for high-quality representations while proposing an improved reparameterization for training dynamics.

Kiril Bangachev, Guy Bresler, Iliyas Noman, Yury Polyanskiy · 2026-03-12 · 🤖 cs.LG

Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy

This study demonstrates that deep learning architectures, specifically U-Net and Spectral Channel Attention Networks, significantly outperform conventional machine learning methods in accurately segmenting clouds and cloud shadows for high-resolution MethaneSAT and MethaneAIR imagery, thereby improving the reliability of atmospheric methane concentration retrievals.

Manuel Perez-Carrasco, Maya Nasr, Sebastien Roche + 12 more · 2026-03-12 · 🤖 cs.LG

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional

This paper presents a large-scale empirical study across 23 visual question-answering benchmarks that reveals significant variations in intra- and inter-modality dependencies. It finds that many benchmarks inadvertently amplify image-only reliance while showing limited true multi-modal interaction, and it proposes a quantitative framework for principled dataset design and evaluation.

Divyam Madaan, Varshan Muhunthan, Kyunghyun Cho, Sumit Chopra · 2026-03-12 · 💬 cs.CL

One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning

The paper proposes SMoPE, a novel framework for prompt-based continual learning that integrates task-specific and shared prompt strategies via a sparse Mixture of Experts architecture to mitigate knowledge interference, balance expert utilization, and significantly reduce computational costs while achieving state-of-the-art performance.

Minh Le, Bao-Ngoc Dao, Huy Nguyen, Quyen Tran, Anh Nguyen, Nhat Ho · 2026-03-12 · 🤖 cs.LG

Composer: A Search Framework for Hybrid Neural Architecture Design

The paper introduces Composer, a principled search framework that efficiently discovers hybrid neural architectures by exploring small-scale designs and extrapolating them to larger scales, resulting in models that outperform Llama 3.2 in accuracy, validation loss, and efficiency.

Bilge Acun, Prasoon Sinha, Newsha Ardalani, Sangmin Bae, Alicia Golden, Chien-Yu Lin, Meghana Madhyastha, Fei Sun, Neeraja J. Yadwadkar, Carole-Jean Wu · 2026-03-12 · 🤖 cs.LG

Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion

This paper introduces MIG-Vis, a method combining variational autoencoders with mutual information-guided diffusion models to directly visualize and validate that neural populations in the macaque inferior temporal cortex are organized into structured, semantically meaningful latent groups encoding specific visual features like object pose and category transformations.

Yule Wang, Joseph Yu, Chengrui Li, Weihan Li, Anqi Wu · 2026-03-12 · 🧬 q-bio

A Systematic Evaluation of Self-Supervised Learning for Label-Efficient Sleep Staging with Wearable EEG

This paper presents the first systematic evaluation of self-supervised learning for label-efficient sleep staging using wearable EEG, demonstrating that a specialized SSL pipeline significantly outperforms supervised baselines and general-purpose foundation models by achieving clinical-grade accuracy with only 5–10% of labeled data.

Emilio Estevan, María Sierra-Torralba, Eduardo López-Larraz, Luis Montesano · 2026-03-12 · 🤖 cs.AI

Geopolitics, Geoeconomics, and Sovereign Risk: Different Shocks, Different Channels

This paper distinguishes between geopolitical and geoeconomic shocks by demonstrating that while geopolitical risks directly reprice sovereign default risk, geoeconomic shocks transmit through monetary policy and the global financial cycle, creating a "scissors pattern" in sovereign CDS spreads that necessitates different policy responses for liquidity provision versus persistent risk premia.

Alvaro Ortiz, Tomasa Rodrigo, Pablo Saborido · 2026-03-12 · 📊 stat

HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection

The paper proposes HyWA, a novel Personalized Voice Activity Detection (PVAD) approach that utilizes a hypernetwork to generate personalized weights for selected layers of a standard VAD model, demonstrating consistent performance improvements and enhanced deployment flexibility compared to existing speaker-conditioning methods.

Mahsa Ghazvini Nejad, Hamed Jafarzadeh Asl, Amin Edraki, Mohammadreza Sadeghi, Masoud Asgharian, Yuanhao Yu, Vahid Partovi Nia · 2026-03-12 · ⚡ eess

Predicting kernel regression learning curves from only raw data statistics

This paper introduces the Hermite eigenstructure ansatz (HEA), a theoretical framework that accurately predicts kernel regression learning curves on real datasets using only the empirical data covariance and target function decomposition, by approximating kernel eigenstructures as Hermite polynomials. It also demonstrates that MLPs in the feature-learning regime follow similar learning patterns.

Dhruva Karkada, Joseph Turnbull, Yuxi Liu, James B. Simon · 2026-03-12 · 🤖 cs.LG