cs.LG papers | Gist.Science

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

This paper proposes CORA, a cooperative game-theoretic credit assignment method that utilizes core allocation and coalition sampling to effectively distribute global advantages among agents in multi-agent reinforcement learning, thereby overcoming the limitations of uniform sharing and enhancing coordinated optimal behavior.

Mengda Ji, Genjiu Xu, Keke Jia, Zekun Duan, Yong Qiu, Jianjun Ge, Mingqiang LiWed, 11 Ma🤖 cs.AI

Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment

The paper proposes TSRating, a novel meta-learning framework that leverages LLMs to generate quality comparisons across diverse domains and trains an efficient TSRater model to accurately and adaptively evaluate time series data quality without requiring extensive hypergradient computations.

Shunyu Wu, Dan Li, Wenjie Feng, Haozheng Ye, Jian Lou, See-Kiong NgWed, 11 Ma🤖 cs.AI

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

The paper introduces Saturn, a reinforcement learning framework that leverages Boolean Satisfiability (SAT) problems to overcome scalability, verifiability, and difficulty control limitations in training large language models, resulting in significant reasoning improvements across SAT, math, and programming benchmarks.

Huanyu Liu, Ge Li, Jia Li, Hao Zhu, Kechi Zhang, Yihong DongWed, 11 Ma🤖 cs.AI

UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models

The paper introduces UltraEdit, a training-, subject-, and memory-free approach for lifelong language model editing that achieves unprecedented scalability and efficiency by computing parameter shifts in a single step, enabling 7B models to be edited on consumer GPUs with over 2 million updates while outperforming existing methods in speed, memory usage, and accuracy.

Xiaojie Gu, Ziying Huang, Jia-Chen Gu, Kai ZhangWed, 11 Ma🤖 cs.AI

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

This paper introduces Stepwise Guided Policy Optimization (SGPO), a framework that enhances Group Relative Policy Optimization (GRPO) by utilizing a step-wise judge model to provide learning signals from all-negative sample groups, thereby enabling large language models to learn from incorrect reasoning and improving performance across various reasoning benchmarks.

Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi LinWed, 11 Ma🤖 cs.AI

A Consequentialist Critique of Binary Classification Evaluation: Theory, Practice, and Tools

This paper critiques the prevalent reliance on fixed-threshold metrics in machine learning evaluation by advocating for a consequentialist framework that prioritizes proper scoring rules like the Brier score, supported by a new decision-theoretic mapping, a practical Python package called `briertools`, and a clipped Brier score variant to bridge the gap between theoretical utility and current practices.

Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A Fukuyama, Ashia C. WilsonWed, 11 Ma🤖 cs.AI

HyConEx: Hypernetwork classifier with counterfactual explanations for tabular data

The paper introduces HyConEx, a novel deep hypernetwork-based classifier for tabular data that uniquely integrates prediction and explanation by simultaneously generating class labels and local counterfactual examples to interpret model decisions.

Patryk Marszałek, Kamil Ksi\k{a}\.zek, Oleksii Furman, Ulvi Movsum-zada, Przemysław Spurek, Marek SmiejaWed, 11 Ma🤖 cs.AI

DRUPI: Dataset Reduction Using Privileged Information

The paper introduces DRUPI (Dataset Condensation using Privileged Information), a framework that enhances dataset condensation by synthesizing auxiliary privileged information, such as feature or attention labels, alongside reduced data to significantly improve model training performance across various benchmarks.

Shaobo Wang, Youxin Jiang, Tianle Niu, Yantai Yang, Ruiji Zhang, Shuhao Hu, Shuaiyu Zhang, Chenghao Sun, Weiya Li, Conghui He, Xuming Hu, Linfeng ZhangWed, 11 Ma🤖 cs.AI

Robust Training of Neural Networks at Arbitrary Precision and Sparsity

This paper introduces a unified framework that models quantization and sparsification as additive noise to derive a principled, noise-corrective gradient path, enabling the stable training of neural networks at arbitrary low precisions and sparsity levels without relying on heuristic estimators like the Straight-Through Estimator.

Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Li Zhang, Mark Sandler, Andrew HowardWed, 11 Ma🤖 cs.AI

Sparse Variational Student-t Processes for Heavy-tailed Modeling

This paper introduces Sparse Variational Student-t Processes (SVTP), a scalable framework that extends sparse inducing point methods to Student-t processes via novel inference algorithms and natural gradient optimization, achieving superior robustness to outliers and heavy-tailed data with significantly faster convergence and lower prediction error compared to sparse Gaussian processes on large datasets.

Jian Xu, Delu Zeng, John PaisleyWed, 11 Ma🤖 cs.AI

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

The paper introduces FinTexTS, a large-scale financial text-paired time-series dataset constructed via a novel semantic-based and multi-level pairing framework that overcomes the limitations of simple keyword matching by leveraging LLMs to align news articles with stock prices across macro, sector, related company, and target-company levels, thereby significantly improving stock price forecasting performance.

Jaehoon Lee, Suhwan Park, Tae Yoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae Kim, Sungdong Yoo, SoonYoung Lee, Yongjae Lee, Wonbin AhnWed, 11 Ma🤖 cs.AI

AlphaApollo: A System for Deep Agentic Reasoning

AlphaApollo is an agentic reasoning system that enhances foundation models' performance on complex, long-horizon tasks by orchestrating multi-turn agentic reasoning, turn-level reinforcement learning for tool-use optimization, and a propose-judge-update evolution loop with verification.

Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Tian Cheng, Jianghangfan Zhang, Tangyu Jiang, Linrui Xu, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo HanWed, 11 Ma🤖 cs.AI

On the Impact of the Utility in Semivalue-based Data Valuation

This paper addresses the sensitivity of semivalue-based data valuation to utility selection by introducing a "spatial signature" framework that embeds data points into a lower-dimensional space, enabling a practical methodology to quantify and ensure the robustness of valuation results against utility variations.

Mélissa Tamine, Benjamin Heymann, Maxime Vono, Patrick LoiseauWed, 11 Ma🤖 cs.AI

From Data Statistics to Feature Geometry: How Correlations Shape Superposition

This paper challenges the standard view of superposition in neural networks by demonstrating that, unlike in idealized uncorrelated settings where interference is merely noise, realistic feature correlations allow models to arrange features so that interference becomes constructive, thereby naturally forming the semantic clusters and cyclical structures observed in real language models.

Lucas Prieto, Edward Stevinson, Melih Barsbey, Tolga Birdal, Pedro A. M. MedianoWed, 11 Ma🤖 cs.AI

Towards a Neural Debugger for Python

This paper introduces "neural debuggers," a new class of language models that emulate traditional debugging interactions like setting breakpoints and stepping through code to enable both forward and inverse execution prediction, thereby laying the foundation for more powerful agentic coding systems and automated debugging.

Maximilian Beck, Jonas Gehring, Jannik Kossen, Gabriel SynnaeveWed, 11 Ma🤖 cs.AI

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

This paper introduces the Overfitting-Underfitting Indicator (OUI) as an efficient, early-stage metric based on hidden neuron activation patterns to distinguish optimal learning rates in PPO actor-critic training, demonstrating its superior ability to prune unpromising runs compared to traditional criteria by revealing distinct structural signatures in actor and critic networks.

Alberto Fernández-Hernández, Cristian Pérez-Corral, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-OrtíWed, 11 Ma🤖 cs.AI

MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning

The paper proposes MSSR, a memory-aware adaptive replay framework that estimates sample-level memory strength to dynamically schedule rehearsal intervals, effectively mitigating catastrophic forgetting while maintaining fast adaptation in continual LLM fine-tuning.

Yiyang Lu, Yu He, Jianlong Chen, Hongyuan ZhaWed, 11 Ma🤖 cs.AI

A Graph-Based Approach to Spectrum Demand Prediction Using Hierarchical Attention Networks

This paper introduces HR-GAT, a hierarchical resolution graph attention network that leverages geospatial data to predict spectrum demand with 21% higher accuracy than baseline models, effectively addressing spatial autocorrelation challenges to enable more efficient spectrum sharing and policy-making.

Mohamad Alkadamani, Halim Yanikomeroglu, Amir GhasemiWed, 11 Ma🤖 cs.AI

Correction of Transformer-Based Models with Smoothing Pseudo-Projector

This paper introduces the smoothing pseudo-projector, a lightweight, multigrid-inspired module that corrects hidden representations in transformer-based models to suppress noise from label-irrelevant inputs, thereby improving training dynamics and robustness without altering the core architecture.

Vitaly BulgakovWed, 11 Ma🤖 cs.AI

Exploiting Label-Aware Channel Scoring for Adaptive Channel Pruning in Split Learning

This paper proposes ACP-SL, an adaptive channel pruning scheme for Split Learning that utilizes a label-aware channel importance scoring module to compress smashed data, thereby significantly reducing communication overhead while improving test accuracy and training efficiency.

Jialei Tan, Zheng Lin, Xiangming Cai, Ruoxi Zhu, Zihan Fang, Pingping Chen, Wei NiWed, 11 Ma🤖 cs.AI

← Previous Next →