AMiD: Knowledge Distillation for LLMs with α-mixture Assistant Distribution
This paper introduces AMiD, a unified framework for knowledge distillation in large language models. AMiD employs a novel α-mixture assistant distribution that systematically generalizes both the interpolation path between teacher and student and the choice of divergence, thereby overcoming the training instability of prior, fragmented approaches and achieving superior performance.
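For intuition, one natural parametrization of an α-mixture assistant is the α-power-mean interpolation between the teacher distribution p_T and the student distribution p_S (a minimal sketch of the general idea; the exact form and normalization used in AMiD may differ):

    q_{α,λ}(y) ∝ ( (1 − λ) · p_T(y)^α + λ · p_S(y)^α )^{1/α},   λ ∈ [0, 1],

where λ controls the position along the interpolation path and α selects its shape: α = 1 recovers the arithmetic mixture, while the limit α → 0 yields the normalized geometric mixture, so a single α parameter sweeps through a family of assistant distributions that previously required separate, ad hoc formulations.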