cs.LG papers | Gist.Science

RePo: Language Models with Context Re-Positioning

This paper introduces RePo, a novel mechanism that leverages a differentiable module to dynamically re-position tokens based on contextual dependencies rather than fixed linear indices, thereby reducing extraneous cognitive load and enhancing LLM performance on tasks involving noisy contexts, structured data, and long-range dependencies.

Huayang Li, Tianyu Zhao, Deng Cai + 1 more2026-03-06💻 cs

Agentic Multi-Persona Framework for Evidence-Aware Fake News Detection

The paper introduces AMPEND-LS, an agentic multi-persona framework that synergizes Large Language Models and Small Language Models with evidence-grounded reasoning to achieve state-of-the-art, explainable, and robust multimodal fake news detection.

Roopa Bukke, Soumya Pandey, Suraj Kumar + 2 more2026-03-06💻 cs

Parallel Token Prediction for Language Models

This paper introduces Parallel Token Prediction (PTP), a framework that accelerates language model inference by predicting multiple tokens in a single forward pass through deterministic functions of random input variables, achieving a 2.4x speedup over autoregressive decoding.

Felix Draxler, Justus Will, Farrin Marouf Sofian + 3 more2026-03-06💻 cs

Uncertainty-Aware Flow Field Reconstruction Using SVGP Kolmogorov-Arnold Networks

This paper introduces an SVGP-KAN framework that reconstructs time-resolved flow fields from sparse velocimetry data with accuracy comparable to classical methods while providing well-calibrated epistemic uncertainty estimates to guide experimental design.

Y. Sungtaek Ju2026-03-06🔬 physics

Prediction of Cellular Malignancy Using Electrical Impedance Signatures and Supervised Machine Learning

This study demonstrates that integrating cellular bioelectrical impedance signatures with supervised machine learning, particularly Random Forest, enables highly accurate (~90%) prediction of cellular malignancy, offering a promising foundation for future real-time diagnostic tools.

Shadeeb Hossain2026-03-06💻 cs

Controlled LLM Training on Spectral Sphere

The paper introduces the Spectral Sphere Optimizer (SSO), a novel parallel training algorithm that enforces strict module-wise spectral constraints on both weights and updates to achieve full Maximal Update Parametrization alignment, resulting in superior convergence, stability, and performance across diverse large-scale architectures compared to AdamW and Muon.

Tian Xie, Haoming Luo, Haoyu Tang + 9 more2026-03-06💻 cs

BPE: Behavioral Profiling Ensemble

The paper proposes the Behavioral Profiling Ensemble (BPE) framework, a model-centric approach that constructs intrinsic behavioral profiles for base learners to dynamically derive aggregation weights, demonstrating superior predictive accuracy and efficiency over state-of-the-art Dynamic Ensemble Selection methods across 42 real-world datasets.

Yanxin Liu, Yunqi Zhang2026-03-06💻 cs

EmboTeam: Grounding LLM Reasoning into Reactive Behavior Trees via PDDL for Embodied Multi-Robot Collaboration

EmboTeam is a novel framework that enhances embodied multi-robot collaboration by cascading LLM-based instruction parsing into formal PDDL planning and reactive behavior tree execution, achieving significantly higher task success rates on the new MACE-THOR benchmark compared to existing baselines.

Haishan Zeng, Mengna Wang, Peng Li2026-03-06💻 cs

ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits

ButterflyMoE achieves sub-linear memory scaling for Mixture-of-Experts models on edge devices by representing diverse experts as geometric rotations of a shared ternary substrate, enabling a 150 $\times$ memory reduction with negligible accuracy loss.

Aryan Karmore2026-03-06💻 cs

Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM

This paper introduces Yuan3.0 Ultra, an open-source, trillion-parameter Mixture-of-Experts large language model that utilizes a novel Layer-Adaptive Expert Pruning algorithm to significantly improve pre-training efficiency and reduce model size while achieving state-of-the-art performance on enterprise-oriented benchmarks.

YuanLab. ai, :, Shawn Wu + 25 more2026-03-06💻 cs

Agentic Very Long Video Understanding

This paper introduces EGAgent, an agentic framework that leverages entity scene graphs and hybrid search tools to enable state-of-the-art compositional reasoning and recall over continuous, multi-day egocentric video streams, addressing the limitations of existing models in long-horizon video understanding.

Aniket Rege, Arka Sadhu, Yuliang Li + 5 more2026-03-06💻 cs

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

This paper introduces On-Policy Self-Distillation (OPSD), a framework where a single large language model acts as both teacher and student by leveraging privileged reasoning traces to supervise its own weaker policy, thereby achieving superior mathematical reasoning performance and significantly higher token efficiency compared to traditional off-policy distillation and reinforcement learning methods.

Siyan Zhao, Zhihui Xie, Mengchen Liu + 4 more2026-03-06💻 cs

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

This paper proposes a scalable extension of CopulaGNN for link sign prediction that overcomes computational intractability by representing edge correlations via a Gramian of edge embeddings and reformulating conditional probabilities, thereby achieving linear convergence and competitive performance on signed graphs.

Jinkyu Sung, Myunggeum Jee, Joonseok Lee2026-03-06💻 cs

Improved Convergence Rates of Muon Optimizer for Nonconvex Optimization

This paper establishes sharper convergence guarantees for the Muon optimizer by providing a direct, simplified analysis that achieves faster convergence rates under broader problem settings than existing restrictive theoretical frameworks.

Shuntaro Nagashima, Hideaki Iiduka2026-03-06🔢 math

Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators

This paper introduces Latent-IMH, an efficient Bayesian sampling method for inverse problems with expensive operators that leverages cost-effective approximations to shift computational costs to an offline phase, achieving significantly faster performance than state-of-the-art methods like NUTS.

Youguang Chen, George Biros2026-03-06🔢 math

Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement

This paper introduces Mobility-Embedded POIs (ME-POIs), a framework that enhances general-purpose point-of-interest representations by integrating large-scale human mobility data with language model embeddings to capture both place identity and real-world usage functions, thereby outperforming existing text-only and mobility-only baselines across diverse map enrichment tasks.

Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu + 2 more2026-03-06💻 cs

YuriiFormer: A Suite of Nesterov-Accelerated Transformers

This paper proposes a variational framework that interprets transformer layers as optimization iterations, enabling the design of a Nesterov-accelerated transformer architecture that outperforms standard baselines on language modeling tasks.

Aleksandr Zimin, Yury Polyanskiy, Philippe Rigollet2026-03-06🔢 math

MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-k Activations

This paper proposes MiTA attention, a unified framework that efficiently scales fast weights in Transformers by compressing the N-width MLP into a narrower one and constructing deformable experts via a Mixture of Top-k Activations strategy, thereby enabling effective handling of extremely long sequences.

Qishuai Wen, Zhiyuan Huang, Xianghan Meng + 2 more2026-03-06💻 cs

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards

This paper introduces VIP, a Variance-Informed Predictive allocation strategy that dynamically optimizes rollout distribution across training prompts using Gaussian process-based variance estimation to minimize gradient variance and significantly improve sampling efficiency in online reinforcement learning with verifiable rewards.

Hieu Trung Nguyen, Bao Nguyen, Wenao Ma + 3 more2026-03-06💻 cs

Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting

This paper proposes Agentic Time Series Forecasting (ATSF), a paradigm shift from traditional static, model-centric prediction to a dynamic, iterative process driven by agentic workflows that integrate perception, planning, action, reflection, and memory to enable adaptive and continual forecasting.

Mingyue Cheng, Xiaoyu Tao, Qi Liu + 2 more2026-03-06💻 cs

← Previous Next →