cs.LG papers | Gist.Science

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

This paper employs interpretability techniques on the off-by-one addition task to reveal that large language models achieve task-level generalization through a reusable "function induction" mechanism, where multiple attention heads collaboratively learn and compose abstract functions to solve unseen problems.

Qinyuan Ye, Robin Jia, Xiang Ren | 2026-03-05 | 🤖 cs.AI

Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

This paper establishes Gaussian approximation bounds for the finite-dimensional distributions of deep neural networks with randomly initialized weights and Lipschitz activations, proving convergence to a Gaussian limit as layer widths grow and deriving specific convergence rates that depend on the network depth.

Krishnakumar Balasubramanian, Nathan Ross | 2026-03-05 | 🤖 cs.LG

Self-Supervised Inductive Logic Programming

This paper introduces "Poker," a new self-supervised Inductive Logic Programming system that learns recursive logic programs from positive examples and a general second-order background theory by automatically generating and labeling synthetic negative examples, thereby overcoming the need for expert-curated negative data and task-specific background theories that limit existing methods like Louise.

Stassa Patsantzis | 2026-03-05 | 🤖 cs.AI

Effective Sample Size and Generalization Bounds for Temporal Networks

This paper proposes a dependence-aware evaluation methodology for Temporal Convolutional Networks that controls for effective sample size rather than raw sequence length, providing new generalization bounds and demonstrating that stronger temporal dependence can paradoxically reduce generalization gaps when properly accounted for.

Barak Gahtan, Alex M. Bronstein | 2026-03-05 | 🤖 cs.AI

ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering

This paper introduces ObfusQAte and the ObfusQA framework, a benchmark that evaluates the robustness of large language models on factual questions subjected to multi-tiered obfuscation, revealing that models often fail or hallucinate when faced with nuanced linguistic variations.

Shubhra Ghosh, Abhilekh Borah, Aditya Kumar Guru + 1 more | 2026-03-05 | 🤖 cs.AI

Subsampling Factorization Machine Annealing

This paper introduces Subsampling Factorization Machine Annealing (SFMA), an enhanced optimization algorithm that utilizes probabilistic training on subsampled datasets to achieve a superior balance between exploration and exploitation, thereby outperforming standard Factorization Machine Annealing in speed, accuracy, and scalability for large-scale black-box optimization problems.

Yusuke Hama, Tadashi Kadowaki | 2026-03-05 | ⚛️ quant-ph

On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators

This paper demonstrates through numerical experiments and analytical derivation that Quantum Generative Adversarial Networks (QGANs) with pure-state generators fail to generalize beyond the average training data representation, a limitation theoretically explained by a fidelity-based lower bound on discriminator quality.

Jasmin Frkatovic, Akash Malemath, Ivan Kankeu + 7 more | 2026-03-05 | ⚛️ quant-ph

Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification Tasks

This paper introduces zono-conformal prediction, a data-efficient method that utilizes zonotopes to generate less conservative, statistically valid uncertainty sets for both regression and classification tasks by directly embedding these sets into the base predictor via a single linear program.

Laura Lützow, Michael Eichelbeck, Mykel J. Kochenderfer + 1 more | 2026-03-05 | 🤖 cs.AI

Adaptive Quantized Planetary Crater Detection System for Autonomous Space Exploration

This foundational concept paper proposes the Adaptive Quantized Planetary Crater Detection System (AQ-PCDSys), an architecture integrating quantized neural networks and adaptive multi-sensor fusion to enable real-time, high-fidelity crater detection on resource-constrained, radiation-hardened space exploration hardware.

Aditri Paul, Archan Paul | 2026-03-05 | 🤖 cs.AI

Performance Assessment Strategies for Generative AI Applications in Healthcare

This paper discusses the limitations of current quantitative benchmarks for evaluating Generative AI in healthcare and advocates for comprehensive assessment strategies that incorporate clinical context, human expertise, and cost-effective computational models to ensure generalizability in real-world medical environments.

Victor Garcia, Mariia Sidulova, Aldo Badano | 2026-03-05 | 🤖 cs.AI

QDFlow: A Python package for physics simulations of quantum dot devices

QDFlow is an open-source Python package that simulates realistic quantum dot device data with ground-truth labels by combining a self-consistent Thomas-Fermi solver, dynamic capacitance modeling, and customizable noise modules to address the scarcity of experimental datasets needed for machine learning applications.

Donovan L. Buterakos, Sandesh S. Kalantre, Joshua Ziegler + 2 more | 2026-03-05 | ⚛️ quant-ph

Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence

This paper addresses the unexplored challenge of label noise in action-based video object segmentation by introducing the ActiSeg-NL benchmark, analyzing the impact of textual and mask annotation noise, and proposing a Parallel Mask Head Mechanism to enhance robustness for embodied intelligence applications.

Wenxin Li, Kunyu Peng, Di Wen + 4 more | 2026-03-05 | 🤖 cs.LG

Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models

This paper introduces Nested Subspace Networks (NSNs), a novel architectural paradigm that re-parameterizes linear layers to enable a single large language model to be dynamically adjusted across a continuous spectrum of compute budgets at inference time, achieving a smooth and predictable trade-off between efficiency and performance without requiring retraining or multiple specialist models.

Paulius Rauba, Mihaela van der Schaar | 2026-03-05 | 🤖 cs.LG

Bridging Computational Social Science and Deep Learning: Cultural Dissemination-Inspired Graph Neural Networks

This paper introduces AxelGNN, a novel Graph Neural Network architecture inspired by Axelrod's cultural dissemination model that utilizes similarity-gated interactions, segment-wise feature copying, and global polarization to effectively address oversmoothing and heterophily challenges while achieving competitive performance across diverse graph types.

Asela Hevapathige | 2026-03-05 | 🤖 cs.AI

Best-of-$\infty$ -- Asymptotic Performance of Test-Time LLM Ensembling

This paper analyzes the asymptotic performance of best-of-$N$ LLM ensembling via majority voting as $N \to \infty$, proposing an adaptive generation scheme to efficiently allocate inference budgets and an optimal weighted ensemble method formulated as a mixed-integer linear program to outperform individual models.

Junpei Komiyama, Daisuke Oba, Masafumi Oyamada | 2026-03-05 | 🤖 cs.AI

CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization

The paper proposes CAD-Tokenizer, a framework that employs modality-specific tokenization via a sequence-based VQ-VAE to overcome the limitations of standard LLM tokenizers, thereby significantly enhancing the quality and instruction-following capabilities of unified text-guided CAD prototyping.

Ruiyu Wang, Shizhao Sun, Weijian Ma + 1 more | 2026-03-05 | 🤖 cs.LG

Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

This paper proposes a lightweight, interpretable approach where reasoning-capable LLMs act as agents to induce decision trees for small tabular datasets, achieving competitive performance with state-of-the-art black-box models while offering human-readable reasoning traces and the ability to incorporate fairness and monotonicity constraints.

George Yakushev, Alina Shutova, Ivan Rubachev + 3 more | 2026-03-05 | 🤖 cs.LG

Scalable Second-order Riemannian Optimization for $K$-means Clustering

This paper proposes a scalable second-order cubic-regularized Riemannian Newton algorithm for $K$-means clustering that reformulates the problem as a smooth unconstrained optimization on a product manifold, enabling linear-time subproblem solutions and achieving faster convergence with optimal statistical accuracy compared to state-of-the-art first-order methods.

Peng Xu, Chun-Ying Hou, Xiaohui Chen + 1 more | 2026-03-05 | 🤖 cs.LG

Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

This paper introduces Ssiuu, a novel unlearning method that employs attribution-guided regularization to eliminate spurious neurons and ensure the faithful, robust removal of sensitive knowledge from large language models, thereby preventing its resurfacing during subsequent retraining.

Nakyeong Yang, Dong-Kyum Kim, Jea Kwon + 3 more | 2026-03-05 | 🤖 cs.LG

The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?

This paper argues that mainstream Class Incremental Learning evaluation protocols are biased due to insufficient sequence sampling, and proposes EDGE, a new protocol that leverages inter-task similarity to identify extreme sequences for accurately characterizing the full performance distribution.

Guannan Lai, Da-Wei Zhou, Xin Yang + 1 more | 2026-03-05 | 🤖 cs.LG
