stat.ML papers | Gist.Science

On-Average Stability of Multipass Preconditioned SGD and Effective Dimension

This paper establishes a new on-average stability analysis for multipass Preconditioned SGD to derive generalization bounds dependent on effective dimension, revealing how mismatches between population risk curvature and gradient noise geometry can lead to suboptimal performance if preconditioning is improperly chosen.

Simon Vary, Tyler Farghly, Ilja Kuzborskij, Patrick Rebeschini | Fri, 13 Ma | 📊 stat

BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

This paper introduces BTZSC, a comprehensive benchmark of 22 datasets designed to systematically evaluate and compare the zero-shot text classification capabilities of NLI cross-encoders, embedding models, rerankers, and instruction-tuned LLMs, revealing that modern rerankers currently achieve state-of-the-art performance while embedding models offer the best accuracy-latency trade-off.

Ilias Aarab | Fri, 13 Ma | 💬 cs.CL

Chemical Reaction Networks Learn Better than Spiking Neural Networks

This paper mathematically proves and numerically demonstrates that chemical reaction networks without hidden layers can learn classification tasks more accurately and efficiently than spiking neural networks requiring hidden layers, offering a theoretical basis for the superior learning capabilities of biochemical systems.

Sophie Jaffard, Ivo F. Sbalzarini | Fri, 13 Ma | 📊 stat

Wasserstein Gradient Flows for Batch Bayesian Optimal Experimental Design

This paper proposes a novel approach to batch Bayesian optimal experimental design by lifting the optimization problem to the space of probability measures, characterizing the optimal design as a Gibbs distribution, and developing scalable particle-based algorithms via Wasserstein gradient flows to efficiently explore complex, non-convex utility landscapes.

Louis Sharrock | Fri, 13 Ma | 📊 stat

A Quantitative Characterization of Forgetting in Post-Training

This paper provides a theoretical framework for quantifying forgetting in continual post-training of generative models by demonstrating how the choice of divergence objective (forward vs. reverse KL), replay strategies, and geometric overlap between old and new task distributions determine whether models suffer from mass forgetting or controlled component drift.

Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan | Fri, 13 Ma | 📊 stat

SSRCA: a novel machine learning pipeline to perform sensitivity analysis for agent-based models

This paper introduces SSRCA, a machine learning pipeline for sensitivity analysis of complex agent-based models that identifies sensitive parameters, reveals common output patterns, and determines the specific input values that generate them; in a tumor spheroid growth model, it proves more robust than the Sobol' method.

Edward H. Rohr, John T. Nardini | 2026-03-11 | 🧬 q-bio

Accounting for shared covariates in semi-parametric Bayesian additive regression trees

This paper proposes a novel extension to semi-parametric Bayesian additive regression trees (BART) that resolves non-identifiability and bias issues by modifying tree-generation moves to allow shared covariates between linear and non-parametric components, thereby enabling the modeling of complex interactions while maintaining competitive performance across simulation and real-world applications.

Estevão B. Prado, Andrew C. Parnell, Keefe Murphy + 3 more | 2026-03-10 | 🤖 cs.LG

Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization

This paper establishes the asymptotic convergence and $\widetilde{O}(\epsilon^{-2})$ iteration complexity of block majorization-minimization algorithms for smooth nonconvex optimization problems with block constraints on Riemannian manifolds, demonstrating their broad applicability and superior performance over standard Euclidean approaches.

Yuchen Li, Laura Balzano, Deanna Needell + 1 more | 2026-03-10 | 📊 stat

Zeroth-Order primal-dual Alternating Projection Gradient Algorithms for Nonconvex Minimax Problems with Coupled linear Constraints

This paper proposes two novel single-loop zeroth-order primal-dual algorithms, ZO-PDAPG and ZO-RMPDPG, that achieve state-of-the-art iteration complexity guarantees for solving nonconvex-(strongly) concave minimax problems with coupled linear constraints under both deterministic and stochastic settings.

Huiling Zhang, Zi Xu, Yuhong Dai | 2026-03-06 | 🔢 math

Towards a Fairer Non-negative Matrix Factorization

This paper proposes a min-max formulation for Non-negative Matrix Factorization (NMF) to mitigate group bias, deriving specific optimization algorithms and demonstrating through experiments that while this approach can improve fairness, it may increase individual error, necessitating application-specific trade-off considerations.

Lara Kassab, Erin George, Deanna Needell + 3 more | 2026-03-06 | 💻 cs

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

This paper presents a comprehensive experimental study demonstrating that fairness-aware machine learning models achieve a superior balance between predictive accuracy and fairness compared to traditional classification models in the context of credit scoring.

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban + 1 more | 2026-03-06 | 💻 cs

Curse of Dimensionality in Neural Network Optimization

This paper demonstrates that training shallow neural networks with Lipschitz continuous activation functions to approximate smooth target functions suffers from the curse of dimensionality, as the population risk decays at a rate bounded by a power of time that depends inversely on the input dimension, regardless of whether the optimization is analyzed via empirical or population risk or through 2-Wasserstein gradient flow dynamics.

Sanghoon Na, Haizhao Yang | 2026-03-06 | 🔢 math

Generalization Bounds for Markov Algorithms through Entropy Flow Computations

This paper extends entropy flow-based generalization bounds from specific noisy algorithms to all learning processes governed by time-homogeneous Markov dynamics by introducing a new exact entropy flow formula and linking generalization error to ergodic properties via modified logarithmic Sobolev inequalities.

Benjamin Dupuis, Maxime Haddouche, George Deligiannidis + 1 more | 2026-03-06 | 💻 cs

Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy

This paper introduces Clip21-SGD2M, a novel federated learning algorithm that combines clipping, heavy-ball momentum, and error feedback to achieve both optimal convergence rates for non-convex problems with heterogeneous data and near-optimal differential privacy guarantees without restrictive assumptions.

Rustem Islamov, Samuel Horvath, Aurelien Lucchi + 2 more | 2026-03-06 | 🔢 math

Variational Formulation of Particle Flow

This paper presents a variational inference formulation of log-homotopy particle flow as a Fisher-Rao gradient flow, deriving Gaussian and Gaussian mixture approximations that recover the Exact Daum and Huang flow under linear Gaussian assumptions while enhancing expressiveness for multi-modal estimation.

Yinzhuang Yi, Jorge Cortés, Nikolay Atanasov | 2026-03-06 | 💻 cs

Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference

This paper introduces CausalPitfalls, a comprehensive benchmark designed to rigorously evaluate and expose the significant limitations of large language models in handling statistical causal inference pitfalls, such as Simpson's paradox, through both direct and code-assisted prompting protocols.

Jin Du, Li Chen, Xun Xian + 6 more | 2026-03-06 | 💻 cs

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

This paper introduces a novel framework that enables direct finetuning of large language models using multi-kernel Boolean parameters without latent weights, significantly reducing complexity while outperforming existing ultra low-bit quantization and binarization techniques.

Ba-Hien Tran, Van Minh Nguyen | 2026-03-06 | 💻 cs

Enabling stratified sampling in high dimensions via nonlinear dimensionality reduction

This paper proposes a method to enable effective stratified sampling in high-dimensional spaces by using neural active manifolds to identify a one-dimensional latent space that captures model variability, allowing for the creation of input partitions that align with model level sets to significantly reduce variance in uncertainty propagation.

Gianluca Geraci, Daniele E. Schiavazzi, Andrea Zanoni | 2026-03-06 | 🔢 math

Bures-Wasserstein Flow Matching for Graph Generation

This paper introduces BWFlow, a graph generation framework that overcomes the limitations of independent node-edge modeling by utilizing Bures-Wasserstein optimal transport on Markov random fields to construct a smooth, theoretically grounded probability path for the joint evolution of graph components, resulting in improved training convergence and sampling efficiency.

Keyue Jiang, Jiahao Cui, Xiaowen Dong + 1 more | 2026-03-06 | 💻 cs

Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings

This paper introduces a fast method to evaluate the robustness of LLM rankings, revealing that top model positions in crowdsourced platforms like Chatbot Arena are surprisingly sensitive to the removal of a tiny fraction of preference data, whereas rankings from expert-annotated benchmarks like MT-bench remain more stable.

Jenny Y. Huang, Yunyi Shen, Dennis Wei + 1 more | 2026-03-06 | 💻 cs
