cs.LG papers | Gist.Science

Zero-Shot Transferable Solution Method for Parametric Optimal Control Problems

This paper introduces a zero-shot transferable solution method for parametric optimal control problems that utilizes function encoder policies to learn reusable neural basis functions offline, enabling efficient online adaptation to varying objectives with minimal computational overhead and near-optimal performance.

Xingjian Li, Kelvin Kan, Deepanshu Verma, Krishna Kumar, Stanley Osher, Ján Drgona · 2026-03-12 · 🤖 cs.LG

Global Minimizers of Sigmoid Contrastive Loss

This paper theoretically characterizes the global minimizers of sigmoid contrastive loss as $(\mathsf{m}, \mathsf{b}_{\mathsf{rel}})$-Constellations, providing a rigorous explanation for the success of SigLIP models, the origin of the modality gap, and the necessary dimensionality for high-quality representations while proposing an improved reparameterization for training dynamics.

Kiril Bangachev, Guy Bresler, Iliyas Noman, Yury Polyanskiy · 2026-03-12 · 🤖 cs.LG

Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy

This study demonstrates that deep learning architectures, specifically U-Net and Spectral Channel Attention Networks, significantly outperform conventional machine learning methods in accurately segmenting clouds and cloud shadows for high-resolution MethaneSAT and MethaneAIR imagery, thereby improving the reliability of atmospheric methane concentration retrievals.

Manuel Perez-Carrasco, Maya Nasr, Sebastien Roche + 12 more · 2026-03-12 · 🤖 cs.LG

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional

This paper presents a large-scale empirical study across 23 visual question-answering benchmarks that reveals significant variations in intra- and inter-modality dependencies. It finds that many benchmarks inadvertently amplify image-only reliance while showing limited true multi-modal interaction, and it proposes a quantitative framework for principled dataset design and evaluation.

Divyam Madaan, Varshan Muhunthan, Kyunghyun Cho, Sumit Chopra · 2026-03-12 · 💬 cs.CL

Proposing a Framework for Machine Learning Adoption on Legacy Systems

This paper proposes a pragmatic, API-based framework that decouples machine learning model lifecycles from legacy production systems via a lightweight, browser-based interface, enabling small and medium-sized enterprises to adopt ML without costly hardware upgrades or operational downtime while empowering domain experts through interactive, human-in-the-loop control.

Ashiqur Rahman, Hamed Alhoori · 2026-03-12 · 🤖 cs.LG

One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning

The paper proposes SMoPE, a novel framework for prompt-based continual learning that integrates task-specific and shared prompt strategies via a sparse Mixture of Experts architecture to mitigate knowledge interference, balance expert utilization, and significantly reduce computational costs while achieving state-of-the-art performance.

Minh Le, Bao-Ngoc Dao, Huy Nguyen, Quyen Tran, Anh Nguyen, Nhat Ho · 2026-03-12 · 🤖 cs.LG

RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs

RADAR is a lightweight, interpretable routing framework that optimizes the performance-cost tradeoff for reasoning LLMs by leveraging psychometric-inspired item response modeling to dynamically match query difficulties with appropriate model-budget pairs across diverse benchmarks.

Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew S. Lan, Zichao Wang · 2026-03-12 · 🤖 cs.AI

Composer: A Search Framework for Hybrid Neural Architecture Design

The paper introduces Composer, a principled search framework that efficiently discovers hybrid neural architectures by exploring small-scale designs and extrapolating them to larger scales, resulting in models that outperform Llama 3.2 in accuracy, validation loss, and efficiency.

Bilge Acun, Prasoon Sinha, Newsha Ardalani, Sangmin Bae, Alicia Golden, Chien-Yu Lin, Meghana Madhyastha, Fei Sun, Neeraja J. Yadwadkar, Carole-Jean Wu · 2026-03-12 · 🤖 cs.LG

Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion

This paper introduces MIG-Vis, a method combining variational autoencoders with mutual information-guided diffusion models to directly visualize and validate that neural populations in the macaque inferior temporal cortex are organized into structured, semantically meaningful latent groups encoding specific visual features like object pose and category transformations.

Yule Wang, Joseph Yu, Chengrui Li, Weihan Li, Anqi Wu · 2026-03-12 · 🧬 q-bio

Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches

This paper demonstrates that in multi-agent LLM systems, simple communication protocols are a more robust mechanism for achieving cooperation in social dilemmas than curriculum learning, which can inadvertently induce "learned pessimism" and reduce alignment depending on the sequence of training games.

Hachem Madmoun, Salem Lahlou · 2026-03-12 · 🤖 cs.LG

A Systematic Evaluation of Self-Supervised Learning for Label-Efficient Sleep Staging with Wearable EEG

This paper presents the first systematic evaluation of self-supervised learning for label-efficient sleep staging using wearable EEG, demonstrating that a specialized SSL pipeline significantly outperforms supervised baselines and general-purpose foundation models by achieving clinical-grade accuracy with only 5–10% of labeled data.

Emilio Estevan, María Sierra-Torralba, Eduardo López-Larraz, Luis Montesano · 2026-03-12 · 🤖 cs.AI

Geopolitics, Geoeconomics, and Sovereign Risk: Different Shocks, Different Channels

This paper distinguishes between geopolitical and geoeconomic shocks by demonstrating that while geopolitical risks directly reprice sovereign default risk, geoeconomic shocks transmit through monetary policy and the global financial cycle, creating a "scissors pattern" in sovereign CDS spreads that necessitates different policy responses for liquidity provision versus persistent risk premia.

Alvaro Ortiz, Tomasa Rodrigo, Pablo Saborido · 2026-03-12 · 📊 stat

HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection

The paper proposes HyWA, a novel Personalized Voice Activity Detection (PVAD) approach that utilizes a hypernetwork to generate personalized weights for selected layers of a standard VAD model, demonstrating consistent performance improvements and enhanced deployment flexibility compared to existing speaker-conditioning methods.

Mahsa Ghazvini Nejad, Hamed Jafarzadeh Asl, Amin Edraki, Mohammadreza Sadeghi, Masoud Asgharian, Yuanhao Yu, Vahid Partovi Nia · 2026-03-12 · ⚡ eess

Reveal-to-Revise: Explainable Bias-Aware Generative Modeling with Multimodal Attention

This paper introduces "Reveal-to-Revise," an explainable, bias-aware generative framework that unifies cross-modal attention, Grad-CAM++ attribution, and iterative feedback to achieve state-of-the-art performance and fairness in multimodal image generation and text classification tasks.

Noor Islam S. Mohammad, Md Muntaqim Meherab · 2026-03-12 · 🤖 cs.LG

Absolute indices for determining compactness, separability and number of clusters

This paper introduces novel absolute cluster validity indices that quantify the compactness and separability of clusters to determine the true number of clusters, demonstrating their effectiveness across synthetic and real-world datasets compared to existing relative indices.

Adil M. Bagirov, Ramiz M. Aliguliyev, Nargiz Sultanova, Sona Taheri · 2026-03-12 · 📊 stat

Predicting kernel regression learning curves from only raw data statistics

This paper introduces the Hermite eigenstructure ansatz (HEA), a theoretical framework that accurately predicts kernel regression learning curves on real datasets using only the empirical data covariance and target function decomposition, by approximating kernel eigenstructures as Hermite polynomials and demonstrating that MLPs in the feature-learning regime follow similar learning patterns.

Dhruva Karkada, Joseph Turnbull, Yuxi Liu, James B. Simon · 2026-03-12 · 🤖 cs.LG

Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases

This paper presents a unified geometry-based analysis demonstrating that Value Iteration achieves geometric convergence in both discounted and average-reward settings under a unique unichain optimal policy assumption, thereby resolving the discrepancy between classical theoretical bounds and observed empirical performance.

Arsenii Mustafin, Xinyi Sheng, Dominik Baumann · 2026-03-12 · 🤖 cs.LG

KV Cache Transform Coding for Compact Storage in LLM Inference

KVTC is a lightweight, model-agnostic transform coder that achieves up to $20\times$ (or higher) compression of Key-Value caches for large language models by combining PCA-based decorrelation, adaptive quantization, and entropy coding, thereby enabling memory-efficient serving with reusable caches while maintaining high reasoning and long-context accuracy.

Konrad Staniszewski, Adrian Łancucki · 2026-03-12 · 💬 cs.CL

Causal Regime Detection in Energy Markets With Augmented Time Series Structural Causal Models

This paper introduces Augmented Time Series Causal Models (ATSCM), a novel framework that integrates neural causal discovery with counterfactual reasoning to dynamically model complex, time-varying causal relationships in energy markets and enable interpretable scenario analysis for electricity price formation.

Dennis Thumm · 2026-03-12 · 📊 stat

Towards Causal Market Simulators

This paper proposes the Time-series Neural Causal Model VAE (TNCM-VAE), a novel framework that integrates variational autoencoders with structural causal models to generate synthetic financial time series that preserve both temporal dependencies and causal relationships, thereby enabling robust counterfactual analysis and risk assessment.

Dennis Thumm, Luis Ontaneda Mijares · 2026-03-12 · 📊 stat
