Why the Brain Consolidates: Predictive Forgetting for Optimal Generalisation

This paper proposes that memory consolidation serves a computational role beyond mere stabilization, utilizing "predictive forgetting" to compress stored representations into a form that optimizes generalization by selectively retaining information that predicts future outcomes, a process necessitated by high-capacity encoding constraints and validated through simulations across diverse neural and transformer models.

Zafeirios Fountas, Adnan Oomerjee, Haitham Bou-Ammar + 2 more · 2026-03-06 · 💻 cs

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning

This paper presents the first structural-assumption-free causal discovery method for linear non-Gaussian latent-variable cyclic models by establishing a graphical criterion for distributional equivalence, introducing edge rank constraints, and providing an algorithm to recover models up to this equivalence class.

Haoyue Dai, Immanuel Albrecht, Peter Spirtes + 1 more · 2026-03-06 · 💻 cs

The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

This paper demonstrates that the architectural inductive biases of locality and weight sharing in convolutional neural networks fundamentally alter implicit regularization by coupling learned filters to low-dimensional patch manifolds, thereby enabling generalization on high-dimensional spherical data where fully connected networks provably fail.

Tongtong Liang, Esha Singh, Rahul Parhi + 2 more · 2026-03-06 · 💻 cs

How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

This paper demonstrates that for high-dimensional random data, gradient descent on shallow ReLU networks exhibits an implicit bias that approximates the minimum $L_2$-norm solution with high probability, bridging the gap between worst-case non-existence and exact orthogonality results through a novel primal-dual analysis.

Kuo-Wei Lai, Guanghui Wang, Molei Tao + 1 more · 2026-03-06 · 🔢 math
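The minimum-norm implicit bias has a classical linear analogue that is easy to verify numerically: on an overparameterized least-squares problem, gradient descent started from zero converges to the minimum $L_2$-norm interpolant. A minimal NumPy sketch of that linear intuition (not the paper's ReLU analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                        # overparameterized: more features than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on 0.5*||Xw - y||^2, started at zero.
w = np.zeros(d)
lr = 0.9 / np.linalg.norm(X, 2) ** 2  # safe step size, below 1/sigma_max^2
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)

# The minimum L2-norm interpolant, given by the pseudoinverse.
w_min = np.linalg.pinv(X) @ y

print(np.linalg.norm(X @ w - y))   # ~0: GD interpolates the data
print(np.linalg.norm(w - w_min))   # ~0: and lands on the min-norm solution
```

The mechanism: starting from zero, every gradient step stays in the row space of X, so the interpolant GD converges to is exactly the pseudoinverse (minimum-norm) solution. The paper asks when an analogous bias survives the ReLU nonlinearity.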

How important are the genes to explain the outcome - the asymmetric Shapley value as an honest importance metric for high-dimensional features

This paper proposes using asymmetric Shapley values as a superior metric for quantifying the importance of high-dimensional genomic features in clinical prediction models, addressing limitations of traditional approaches by accounting for collinearity and known causal directions, and provides efficient algorithms validated through a colorectal cancer progression study.

Mark A. van de Wiel, Jeroen Goedhart, Martin Jullum + 1 more · 2026-03-06 · 🤖 cs.LG
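The asymmetric Shapley value can be illustrated by restricting the usual permutation average to orderings consistent with a known causal direction. A minimal sketch with a hypothetical two-feature value function in which x0 causes x1 and the two are perfectly collinear (the numbers and predicate are illustrative, not the paper's algorithms):

```python
from itertools import permutations

def shapley(value, n, allowed=None):
    """Average marginal contributions over feature orderings.

    value:   coalition (set) -> float
    allowed: optional predicate on permutations; restricting to causally
             consistent orderings yields asymmetric Shapley values.
    """
    perms = [p for p in permutations(range(n)) if allowed is None or allowed(p)]
    phi = [0.0] * n
    for p in perms:
        seen = set()
        for i in p:
            phi[i] += value(seen | {i}) - value(seen)
            seen.add(i)
    return [x / len(perms) for x in phi]

# Toy value function: either feature alone explains all the signal (collinearity).
v = {frozenset(): 0.0, frozenset({0}): 0.5, frozenset({1}): 0.5,
     frozenset({0, 1}): 0.5}
value = lambda S: v[frozenset(S)]

sym = shapley(value, 2)                                        # splits credit: [0.25, 0.25]
asym = shapley(value, 2, allowed=lambda p: p.index(0) < p.index(1))
print(sym, asym)                                               # asym credits the cause: [0.5, 0.0]
```

Under collinearity the symmetric value splits credit evenly, while conditioning on the causal order gives all credit to the upstream feature, which is the "honesty" property the paper exploits for genomic features.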

Bayes with No Shame: Admissibility Geometries of Predictive Inference

This paper demonstrates that predictive inference is governed by four distinct, pairwise non-nested admissibility geometries—Blackwell risk dominance, anytime-valid supermartingales, marginal coverage, and Cesàro approachability—each offering a unique certificate of optimality and proving that admissibility is irreducibly relative to the chosen criterion rather than a universal property.

Nicholas G. Polson, Daniel Zantedeschi · 2026-03-06 · 🔢 math

On the Statistical Optimality of Optimal Decision Trees

This paper establishes a comprehensive statistical theory for globally optimal empirical risk minimization decision trees by deriving sharp oracle inequalities and minimax optimal rates over a novel piecewise sparse heterogeneous anisotropic Besov space, thereby providing rigorous theoretical guarantees for their performance in high-dimensional regression and classification under both sub-Gaussian and heavy-tailed noise settings.

Zineng Xu, Subhroshekhar Ghosh, Yan Shuo Tan · 2026-03-06 · 🔢 math
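The object of study, a globally ERM-optimal tree, can be made concrete at depth one: exhaustively searching every split of a single feature yields the exact empirical risk minimizer over regression stumps. A hedged sketch of that principle (general-depth optimal trees require combinatorial search, which is what the paper's guarantees concern):

```python
import numpy as np

def optimal_stump(x, y):
    """Globally ERM-optimal depth-1 regression tree on 1-D data:
    try every split point, predict the leaf means, keep the best."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None, None, None)
    for k in range(1, len(xs)):
        left, right = ys[:k], ys[k:]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, (xs[k - 1] + xs[k]) / 2, left.mean(), right.mean())
    return best

x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
err, thresh, left_mean, right_mean = optimal_stump(x, y)
print(err, thresh, left_mean, right_mean)   # 0.0 6.0 1.0 5.0
```

Unlike greedy CART-style growing, this search is exact over its hypothesis class; the paper's oracle inequalities apply to exact ERM of this kind at general depth.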

Thermodynamic Response Functions in Singular Bayesian Models

This paper establishes a unified thermodynamic response framework for singular Bayesian models, demonstrating that posterior tempering induces a hierarchy of observables that naturally interpret complex learning-theoretic quantities like the real log canonical threshold and WAIC as free-energy derivatives, thereby revealing phase-transition-like structural reorganizations in models such as neural networks and Gaussian mixtures.

Sean Plummer · 2026-03-06 · 🔢 math

Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity

This paper presents a sample-optimal, locally differentially private algorithm for hypothesis selection that achieves the information-theoretic lower bound of $\Theta(k/(\alpha^2 \min\{\varepsilon^2, 1\}))$ using only $O(\log \log k)$ rounds of interaction, thereby demonstrating the provable power of interactivity to overcome the $\Omega(k \log k)$ sample complexity barrier inherent in non-interactive approaches.

Alireza F. Pour, Hassan Ashtiani, Shahab Asoodeh · 2026-03-05 · 🤖 cs.LG