Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

This paper introduces a particle filtering framework for rigorously analyzing the accuracy-cost tradeoffs of parallel inference methods in large language models. It establishes theoretical guarantees, identifies fundamental limits, and demonstrates that sampling error alone does not fully predict final model accuracy.

Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, Akshay Krishnamurthy · Tue, 10 Ma · cs.LG
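The "reject, resample, repeat" loop in the title builds on standard particle-filtering machinery: maintain a population of candidates with importance weights, then resample in proportion to those weights. A minimal sketch of the generic multinomial resampling step (an illustration of the classical technique, not the paper's specific algorithm):

```python
import random

def resample(particles, weights):
    """Multinomial resampling: draw len(particles) new particles with
    probability proportional to their weights, then reset all weights
    to uniform. Low-weight candidates tend to be dropped; high-weight
    ones are duplicated."""
    n = len(particles)
    total = sum(weights)
    probs = [w / total for w in weights]
    new_particles = random.choices(particles, weights=probs, k=n)
    return new_particles, [1.0 / n] * n
```

In an LLM-inference setting, the "particles" would be partial reasoning traces and the weights would come from a scoring or verifier model; repeating the weight-then-resample step concentrates compute on promising traces.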

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

This paper establishes finite-sample guarantees for cost-driven state representation learning in infinite-horizon, time-invariant Linear Quadratic Gaussian (LQG) control. It analyzes two approaches, explicit latent modeling and implicit MuZero-like dynamics, and introduces a key technical ingredient: a proof of persistency of excitation for a novel stochastic process arising from quadratic regression.

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra · Tue, 10 Ma · cs.LG

Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts

This paper introduces MSKernelBench, a comprehensive benchmark covering diverse multi-scenario GPU kernels, and CUDAMaster, a multi-agent, hardware-aware optimization system built on it. CUDAMaster achieves significant speedups, often matching or surpassing closed-source libraries such as cuBLAS, advancing general-purpose automated CUDA kernel optimization beyond current ML-focused methods.

Yuxuan Han, Meng-Hao Guo, Zhengning Liu, Wenguang Chen, Shi-Min Hu · Tue, 10 Ma · cs.LG

Sampling via Stochastic Interpolants by Langevin-based Velocity and Initialization Estimation in Flow ODEs

This paper proposes a novel sampling method for unnormalized Boltzmann densities. It uses a sequence of Langevin samplers to efficiently simulate a probability flow ODE derived from linear stochastic interpolants, generating intermediate samples and robustly estimating the velocity field. The authors provide theoretical convergence guarantees and demonstrate effectiveness on challenging multimodal distributions and Bayesian inference tasks.

Chenguang Duan, Yuling Jiao, Gabriele Steidl, Christian Wald, Jerry Zhijian Yang, Ruizhe Zhang · Thu, 12 Ma · stat
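The Langevin samplers the summary mentions are built from a simple update rule: move along the gradient of the log-density and add Gaussian noise. A minimal sketch of one unadjusted Langevin step (the generic building block, not the paper's full interpolant-based scheme):

```python
import math
import random

def langevin_step(x, grad_log_density, step_size):
    """One unadjusted Langevin update in 1D:
        x' = x + eps * grad log p(x) + sqrt(2 * eps) * z,  z ~ N(0, 1).
    Iterating this produces approximate samples from the density p."""
    noise = random.gauss(0.0, 1.0)
    return x + step_size * grad_log_density(x) + math.sqrt(2 * step_size) * noise
```

For a standard Gaussian target, `grad_log_density` is simply `lambda v: -v`; running the chain for many steps yields samples whose mean and variance approach 0 and 1 (up to a discretization bias of order `step_size`). The paper's method runs such samplers at intermediate times of a stochastic interpolant to estimate the velocity field of the flow ODE.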