cs.LG papers | Gist.Science

EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages

The paper introduces EsoLang-Bench, a benchmark that uses esoteric programming languages to test whether large language models genuinely reason. It reveals a dramatic gap between the models' high scores on standard benchmarks and their near-zero accuracy on tasks that require learning a new language from documentation and experimentation rather than from memorization.

Aman Sharma, Paras Chopra | 2026-03-11 | 🤖 cs.AI

On Catastrophic Forgetting in Low-Rank Decomposition-Based Parameter-Efficient Fine-Tuning

This paper empirically demonstrates that catastrophic forgetting in low-rank decomposition-based parameter-efficient fine-tuning is primarily driven by update subspace geometry, revealing that tensor-based and structurally aligned methods outperform traditional shared matrix approaches in sequential learning scenarios.

Muhammad Ahmad, Jingjing Zheng, Yankai Cao | 2026-03-11 | 🤖 cs.LG

ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

The paper introduces ActiveUltraFeedback, an efficient active learning pipeline that leverages uncertainty estimates and novel selection strategies like Double Reverse Thompson Sampling to generate high-quality preference data, enabling Large Language Models to achieve superior alignment performance with as little as one-sixth of the annotated data required by static baselines.

Davit Melikidze, Marian Schneider, Jessica Lam, Martin Wertich, Ido Hakimi, Barna Pásztor, Andreas Krause | 2026-03-11 | 🤖 cs.AI

Physics-informed neural operator for predictive parametric phase-field modelling

This paper introduces PF-PINO, a physics-informed neural operator framework that embeds phase-field governing equation residuals into the training loss to significantly improve the accuracy, generalization, and long-term stability of predictive parametric phase-field modelling compared to conventional methods.

Nanxi Chen, Airong Chen, Rujin Ma | 2026-03-11 | 🔬 cond-mat.mtrl-sci

Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning

Mousse is a novel optimizer that improves upon the Muon algorithm by integrating Shampoo's Kronecker-factored preconditioning to adaptively handle the heavy-tailed curvature of deep neural networks, thereby achieving faster training convergence with negligible computational overhead.

Yechen Zhang, Shuhao Xing, Junhao Huang, Kai Lv, Yunhua Zhou, Xipeng Qiu, Qipeng Guo, Kai Chen | 2026-03-11 | 🤖 cs.AI

A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System

This paper proposes Multi-Prototype-Guided Federated Knowledge Distillation (MP-FedKD) for AI-RAN enabled Multi-Access Edge Computing systems. The approach addresses non-IID data and mitigates the information loss of single-prototype averaging by combining self-knowledge distillation, a conditional hierarchical agglomerative clustering strategy, and a novel loss function, and it outperforms state-of-the-art baselines in accuracy and error metrics.

Luyao Zou, Hayoung Oh, Chu Myaet Thwal, Apurba Adhikary, Seohyeon Hong, Zhu Han | 2026-03-11 | 🤖 cs.LG

Upper Generalization Bounds for Neural Oscillators

This paper derives upper PAC generalization bounds for neural oscillators based on second-order ODEs and MLPs, demonstrating that their estimation errors grow polynomially with model size and time while showing that constraining MLP Lipschitz constants via regularization enhances generalization performance in modeling nonlinear structural systems.

Zifeng Huang, Konstantin M. Zuev, Yong Xia, Michael Beer | 2026-03-11 | 🤖 cs.LG

Global universality via discrete-time signatures

This paper establishes global universal approximation theorems for path-dependent functionals on spaces of piecewise linear paths using linear functionals of discrete-time signatures, demonstrating their applicability to Brownian motion-driven systems such as random and stochastic ordinary differential equations.

Mihriban Ceylan, David J. Prömel | 2026-03-11 | 🤖 cs.LG

What is Missing? Explaining Neurons Activated by Absent Concepts

This paper shows that deep neural networks frequently encode the absence of concepts to drive neuron activation, a phenomenon largely overlooked by standard explainable AI methods. It proposes simple extensions to attribution and feature visualization techniques that reveal these "missing" concepts and use them for better model interpretation and debiasing.

Robin Hesse, Simone Schaub-Meyer, Janina Hesse, Bernt Schiele, Stefan Roth | 2026-03-11 | 🤖 cs.LG

A Hybrid Quantum-Classical Framework for Financial Volatility Forecasting Based on Quantum Circuit Born Machines

This paper proposes and validates a hybrid quantum-classical framework that integrates a Long Short-Term Memory (LSTM) network with a Quantum Circuit Born Machine (QCBM) to significantly improve financial volatility forecasting accuracy on high-frequency stock market data compared to traditional classical models.

Yixiong Chen | 2026-03-11 | ⚛️ quant-ph

Exploiting Label-Aware Channel Scoring for Adaptive Channel Pruning in Split Learning

This paper proposes ACP-SL, an adaptive channel pruning scheme for Split Learning that utilizes a label-aware channel importance scoring module to compress smashed data, thereby significantly reducing communication overhead while improving test accuracy and training efficiency.

Jialei Tan, Zheng Lin, Xiangming Cai, Ruoxi Zhu, Zihan Fang, Pingping Chen, Wei Ni | 2026-03-11 | 🤖 cs.AI

Information Theoretic Bayesian Optimization over the Probability Simplex

This paper introduces $\alpha$-GaBO, a novel family of Bayesian optimization algorithms that leverages information geometry to construct Matérn kernels and geometric optimizers tailored for the probability simplex, demonstrating superior performance over constrained Euclidean approaches in optimizing mixtures and robotic control tasks.

Federico Pavesi, Antonio Candelieri, Noémie Jaquier | 2026-03-11 | 🤖 cs.LG

Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning

This paper introduces In-Context RLVR, a method that leverages a model's own in-context learning ability to measure "Demonstration Utility" via Evidence Gain, thereby implicitly reweighting rewards to prioritize high-quality reasoning traces over merely correct but flawed solutions during Reinforcement Learning with Verifiable Rewards training.

Tiehua Mei, Minxuan Lv, Leiyu Pan, Zhenpeng Su, Hongru Hou, Hengrui Chen, Ao Xu, Deqing Yang | 2026-03-11 | 🤖 cs.LG

Correction of Transformer-Based Models with Smoothing Pseudo-Projector

This paper introduces the smoothing pseudo-projector, a lightweight, multigrid-inspired module that corrects hidden representations in transformer-based models to suppress noise from label-irrelevant inputs, thereby improving training dynamics and robustness without altering the core architecture.

Vitaly Bulgakov | 2026-03-11 | 🤖 cs.AI

A Unified Hierarchical Multi-Task Multi-Fidelity Framework for Data-Efficient Surrogate Modeling in Manufacturing

This paper proposes a novel hierarchical multi-task multi-fidelity (H-MT-MF) framework for Gaussian process-based surrogate modeling that unifies inter-task information sharing and fidelity-dependent uncertainty handling to significantly improve prediction accuracy and data efficiency in manufacturing systems with heterogeneous data sources.

Manan Mehta, Zhiqiao Dong, Yuhang Yang, Chenhui Shao | 2026-03-11 | 🤖 cs.LG

A Graph-Based Approach to Spectrum Demand Prediction Using Hierarchical Attention Networks

This paper introduces HR-GAT, a hierarchical resolution graph attention network that leverages geospatial data to predict spectrum demand with 21% higher accuracy than baseline models, effectively addressing spatial autocorrelation challenges to enable more efficient spectrum sharing and policy-making.

Mohamad Alkadamani, Halim Yanikomeroglu, Amir Ghasemi | 2026-03-11 | 🤖 cs.AI

GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection

The paper proposes GAST, a novel Parameter-Efficient Fine-Tuning method that unifies data-layer selection and layer-sparse strategies to adaptively match impactful data points with specific model layers, thereby overcoming the limitations of existing single-dimension approaches and achieving superior performance.

Kai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, Penglei Gao | 2026-03-11 | 🤖 cs.LG

CarbonBench: A Global Benchmark for Upscaling of Carbon Fluxes Using Zero-Shot Learning

The paper introduces CarbonBench, the first standardized benchmark comprising over 1.3 million global observations from 567 sites, designed to rigorously evaluate and compare zero-shot spatial transfer learning methods for upscaling terrestrial carbon fluxes across diverse, unseen ecosystems and climate regimes.

Aleksei Rozanov, Arvind Renganathan, Yimeng Zhang, Vipin Kumar | 2026-03-11 | 🤖 cs.LG

MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning

The paper proposes MSSR, a memory-aware adaptive replay framework that estimates sample-level memory strength to dynamically schedule rehearsal intervals, effectively mitigating catastrophic forgetting while maintaining fast adaptation in continual LLM fine-tuning.

Yiyang Lu, Yu He, Jianlong Chen, Hongyuan Zha | 2026-03-11 | 🤖 cs.AI

OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

This paper introduces OptEMA, a novel adaptive Exponential Moving Average optimizer that achieves nearly optimal convergence rates in both stochastic and zero-noise regimes without requiring prior knowledge of Lipschitz constants or manual hyperparameter tuning.

Ganzhao Yuan | 2026-03-11 | 🤖 cs.LG
