cs.LG papers | Gist.Science

The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology

By employing an interventional approach that modifies Transformer architecture, this paper demonstrates that enforcing spherical topology and uniform attention routing eliminates the delayed generalization phenomenon known as grokking in modular addition tasks, provided these architectural priors align with the task's intrinsic symmetries.

Alper Yıldırım | 2026-03-06 | cs.AI

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards

This paper introduces ASR-TRA, a novel test-time reinforcement learning framework that leverages audio-text semantic rewards and causal intervention to overcome confirmation bias in existing adaptation methods, thereby significantly improving ASR robustness and accuracy in noisy and accented environments without ground-truth labels.

Linghan Fang, Tianxin Xie, Li Liu | 2026-03-06 | cs.AI

SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity

SlideSparse is the first system to enable Sparse Tensor Core acceleration for accuracy-preserving $(2N-2):2N$ structured sparsity on commodity GPUs by employing a sliding window decomposition and activation lifting technique, achieving near-theoretical speedups for LLMs without hardware modifications.

Hanyong Shao, Yingbo Hao, Ting Song + 10 more | 2026-03-06 | cs.LG

Recursive Inference Machines for Neural Reasoning

This paper introduces Recursive Inference Machines (RIMs), a neural reasoning framework that unifies neural backbones with classical recursive inference mechanisms to enhance the performance of models like Tiny Recursive Models on complex reasoning benchmarks such as ARC-AGI and Sudoku Extreme, as well as tabular data classification tasks.

Mieszko Komisarczyk, Saurabh Mathur, Maurice Kraus + 2 more | 2026-03-06 | cs.AI

A Behaviour-Aware Federated Forecasting Framework for Distributed Stand-Alone Wind Turbines

This paper proposes a privacy-preserving, two-stage federated learning framework that clusters distributed wind turbines based on long-term behavioral statistics using Double Roulette Selection and recursive Auto-split refinement to train localized LSTM models, achieving competitive short-term forecasting accuracy that outperforms geographic partitioning while maintaining data locality.

Bowen Li, Xiufeng Liu, Maria Sinziiana Astefanoaei | 2026-03-06 | cs.LG

Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

This paper proposes a robust auditing framework for automatic speech recognition systems that moves beyond traditional Word Error Rate by introducing the Sample Difficulty Index and semantic metrics to quantify and mitigate the "diversity tax" disproportionately affecting marginalized speakers.

Ting-Hui Cheng, Line H. Clemmensen, Sneha Das | 2026-03-06 | cs.LG

Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

This paper introduces "Whisperer," a sample-efficient visual prompting framework that bootstraps frozen OCR models by using a four-stage behavioral cloning curriculum to learn diffusion-based preprocessors that enhance degraded text inputs, achieving an 8% absolute reduction in Character Error Rate without modifying the downstream model's weights.

Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov + 1 more | 2026-03-06 | cs.AI

Layer by layer, module by module: Choose both for optimal OOD probing of ViT

This paper demonstrates that distribution shift is the primary cause of performance degradation in deeper layers of Vision Transformers and reveals that optimal out-of-distribution probing requires selecting between feedforward network activations and normalized self-attention outputs depending on the severity of the shift.

Ambroise Odonnat, Vasilii Feofanov, Laetitia Chapel + 2 more | 2026-03-06 | cs.LG

Bayesian Supervised Causal Clustering

This paper proposes Bayesian Supervised Causal Clustering (BSCC), a novel framework that identifies homogeneous patient subgroups by simultaneously clustering individuals based on their covariate profiles and treatment effects, and validates its practical utility through simulations and real-world data from the International Stroke Trial.

Luwei Wang, Nazir Lone, Sohan Seth | 2026-03-06 | cs.LG

Knowledge Divergence and the Value of Debate for Scalable Oversight

This paper establishes a formal geometric framework linking AI debate and RLAIF by demonstrating that the value of debate scales with knowledge divergence between models, transitioning from negligible benefit to essential oversight as representations diverge, while identifying specific regimes where debate unlocks inaccessible outcomes or risks coordination failure.

Robin Young | 2026-03-06 | cs.LG

Latent Policy Steering through One-Step Flow Policies

The paper proposes Latent Policy Steering (LPS), a robust offline reinforcement learning method that achieves state-of-the-art performance by using a differentiable one-step MeanFlow policy to backpropagate original-action-space Q-gradients directly to a latent actor, thereby eliminating the need for proxy latent critics and sensitive hyperparameter tuning while ensuring policies remain within dataset support.

Hokyun Im, Andrey Kolobov, Jianlong Fu + 1 more | 2026-03-06 | cs.LG

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

WavSLM is a single-stream speech language model that achieves competitive speech generation and consistency without text supervision by quantizing and distilling WavLM representations into a single codebook for autoregressive next-chunk prediction.

Luca Della Libera, Cem Subakan, Mirco Ravanelli | 2026-03-06 | cs.AI

How important are the genes to explain the outcome - the asymmetric Shapley value as an honest importance metric for high-dimensional features

This paper proposes using asymmetric Shapley values as a superior metric for quantifying the importance of high-dimensional genomic features in clinical prediction models, addressing limitations of traditional approaches by accounting for collinearity and known causal directions, and provides efficient algorithms validated through a colorectal cancer progression study.

Mark A. van de Wiel, Jeroen Goedhart, Martin Jullum + 1 more | 2026-03-06 | cs.LG

GALACTIC: Global and Local Agnostic Counterfactuals for Time-series Clustering

This paper introduces GALACTIC, a unified framework that bridges local and global counterfactual explainability for unsupervised time-series clustering by generating minimal perturbations to cross cluster boundaries and employing a provably efficient submodular optimization algorithm to derive concise, non-redundant global summaries of these transitions.

Christos Fragkathoulas, Eleni Psaroudaki, Themis Palpanas + 1 more | 2026-03-06 | cs.AI

FairFinGAN: Fairness-aware Synthetic Financial Data Generation

The paper proposes FairFinGAN, a WGAN-based framework that integrates fairness constraints via a classifier to generate synthetic financial data that effectively mitigates bias against protected attributes while maintaining high utility for downstream predictive tasks.

Tai Le Quy, Dung Nguyen Tuan, Trung Nguyen Thanh + 3 more | 2026-03-06 | cs.LG

Bayes with No Shame: Admissibility Geometries of Predictive Inference

This paper demonstrates that predictive inference is governed by four distinct, pairwise non-nested admissibility geometries—Blackwell risk dominance, anytime-valid supermartingales, marginal coverage, and Cesàro approachability—each offering a unique certificate of optimality and proving that admissibility is irreducibly relative to the chosen criterion rather than a universal property.

Nicholas G. Polson, Daniel Zantedeschi | 2026-03-06 | math

On the Statistical Optimality of Optimal Decision Trees

This paper establishes a comprehensive statistical theory for globally optimal empirical risk minimization decision trees by deriving sharp oracle inequalities and minimax optimal rates over a novel piecewise sparse heterogeneous anisotropic Besov space, thereby providing rigorous theoretical guarantees for their performance in high-dimensional regression and classification under both sub-Gaussian and heavy-tailed noise settings.

Zineng Xu, Subhroshekhar Ghosh, Yan Shuo Tan | 2026-03-06 | math

Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs

This paper proposes Geometric-Aware Quantization (GAQ), a framework that enables efficient, low-bit inference for SO(3)-equivariant Graph Neural Networks by decoupling magnitude and direction to rigorously preserve continuous symmetry, thereby achieving significant speedups and memory reductions on molecular simulation benchmarks without compromising physical consistency.

Haoyu Zhou, Ping Xue, Hao Zhang + 1 more | 2026-03-06 | cs.LG

InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context

This paper proposes InfoFlow KV, an information-flow-aware method that uses attention-norm signals and global positional reordering to selectively recompute key-value caches, thereby improving the efficiency and accuracy of retrieval-augmented generation for long-context tasks.

Xin Teng, Canyu Zhang, Shaoyi Zheng + 3 more | 2026-03-06 | cs.LG

Learning Causal Structure of Time Series using Best Order Score Search

This paper introduces TS-BOSS, a scalable, score-based algorithm for learning causal structures in multivariate time series that extends the Best Order Score Search framework with dynamic Bayesian networks and grow-shrink trees, demonstrating superior performance in high auto-correlation regimes compared to standard constraint-based methods.

Irene Gema Castillo Mansilla, Urmi Ninad | 2026-03-06 | cs.AI
