cs.LG papers | Gist.Science

Controlled LLM Training on Spectral Sphere

The paper introduces the Spectral Sphere Optimizer (SSO), a novel parallel training algorithm that enforces strict module-wise spectral constraints on both weights and updates to achieve full Maximal Update Parametrization alignment, resulting in superior convergence, stability, and performance across diverse large-scale architectures compared to AdamW and Muon.

Tian Xie, Haoming Luo, Haoyu Tang + 9 more | 2026-03-06 | 💻 cs

BPE: Behavioral Profiling Ensemble

The paper proposes the Behavioral Profiling Ensemble (BPE) framework, a model-centric approach that constructs intrinsic behavioral profiles for base learners to dynamically derive aggregation weights, demonstrating superior predictive accuracy and efficiency over state-of-the-art Dynamic Ensemble Selection methods across 42 real-world datasets.

Yanxin Liu, Yunqi Zhang | 2026-03-06 | 💻 cs

EmboTeam: Grounding LLM Reasoning into Reactive Behavior Trees via PDDL for Embodied Multi-Robot Collaboration

EmboTeam is a novel framework that enhances embodied multi-robot collaboration by cascading LLM-based instruction parsing into formal PDDL planning and reactive behavior tree execution, achieving significantly higher task success rates than existing baselines on the new MACE-THOR benchmark.

Haishan Zeng, Mengna Wang, Peng Li | 2026-03-06 | 💻 cs

ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits

ButterflyMoE achieves sub-linear memory scaling for Mixture-of-Experts models on edge devices by representing diverse experts as geometric rotations of a shared ternary substrate, enabling a 150$\times$ memory reduction with negligible accuracy loss.

Aryan Karmore | 2026-03-06 | 💻 cs

Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM

This paper introduces Yuan3.0 Ultra, an open-source, trillion-parameter Mixture-of-Experts large language model that utilizes a novel Layer-Adaptive Expert Pruning algorithm to significantly improve pre-training efficiency and reduce model size while achieving state-of-the-art performance on enterprise-oriented benchmarks.

YuanLab.ai, Shawn Wu + 25 more | 2026-03-06 | 💻 cs

Agentic Very Long Video Understanding

This paper introduces EGAgent, an agentic framework that leverages entity scene graphs and hybrid search tools to enable state-of-the-art compositional reasoning and recall over continuous, multi-day egocentric video streams, addressing the limitations of existing models in long-horizon video understanding.

Aniket Rege, Arka Sadhu, Yuliang Li + 5 more | 2026-03-06 | 💻 cs

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

This paper introduces On-Policy Self-Distillation (OPSD), a framework where a single large language model acts as both teacher and student by leveraging privileged reasoning traces to supervise its own weaker policy, thereby achieving superior mathematical reasoning performance and significantly higher token efficiency compared to traditional off-policy distillation and reinforcement learning methods.

Siyan Zhao, Zhihui Xie, Mengchen Liu + 4 more | 2026-03-06 | 💻 cs

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

This paper proposes a scalable extension of CopulaGNN for link sign prediction that overcomes computational intractability by representing edge correlations via a Gramian of edge embeddings and reformulating conditional probabilities, thereby achieving linear convergence and competitive performance on signed graphs.

Jinkyu Sung, Myunggeum Jee, Joonseok Lee | 2026-03-06 | 💻 cs

Improved Convergence Rates of Muon Optimizer for Nonconvex Optimization

This paper establishes sharper convergence guarantees for the Muon optimizer by providing a direct, simplified analysis that achieves faster convergence rates under broader problem settings than existing restrictive theoretical frameworks.

Shuntaro Nagashima, Hideaki Iiduka | 2026-03-06 | 🔢 math

Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators

This paper introduces Latent-IMH, an efficient Bayesian sampling method for inverse problems with expensive operators that leverages cost-effective approximations to shift computational costs to an offline phase, achieving significantly faster performance than state-of-the-art methods like NUTS.

Youguang Chen, George Biros | 2026-03-06 | 🔢 math

Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement

This paper introduces Mobility-Embedded POIs (ME-POIs), a framework that enhances general-purpose point-of-interest representations by integrating large-scale human mobility data with language model embeddings to capture both place identity and real-world usage functions, thereby outperforming existing text-only and mobility-only baselines across diverse map enrichment tasks.

Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu + 2 more | 2026-03-06 | 💻 cs

YuriiFormer: A Suite of Nesterov-Accelerated Transformers

This paper proposes a variational framework that interprets transformer layers as optimization iterations, enabling the design of a Nesterov-accelerated transformer architecture that outperforms standard baselines on language modeling tasks.

Aleksandr Zimin, Yury Polyanskiy, Philippe Rigollet | 2026-03-06 | 🔢 math

MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-k Activations

This paper proposes MiTA attention, a unified framework that efficiently scales fast weights in Transformers by compressing the N-width MLP into a narrower one and constructing deformable experts via a Mixture of Top-k Activations strategy, thereby enabling effective handling of extremely long sequences.

Qishuai Wen, Zhiyuan Huang, Xianghan Meng + 2 more | 2026-03-06 | 💻 cs

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards

This paper introduces VIP, a Variance-Informed Predictive allocation strategy that dynamically optimizes rollout distribution across training prompts using Gaussian process-based variance estimation to minimize gradient variance and significantly improve sampling efficiency in online reinforcement learning with verifiable rewards.

Hieu Trung Nguyen, Bao Nguyen, Wenao Ma + 3 more | 2026-03-06 | 💻 cs

Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting

This paper proposes Agentic Time Series Forecasting (ATSF), a paradigm shift from traditional static, model-centric prediction to a dynamic, iterative process driven by agentic workflows that integrate perception, planning, action, reflection, and memory to enable adaptive and continual forecasting.

Mingyue Cheng, Xiaoyu Tao, Qi Liu + 2 more | 2026-03-06 | 💻 cs

On the Non-Identifiability of Steering Vectors in Large Language Models

This paper demonstrates that steering vectors in large language models are fundamentally non-identifiable, as numerous distinct interventions—including orthogonal perturbations—produce behaviorally indistinguishable results, thereby revealing inherent limits in interpreting these vectors as unique internal representations without additional structural constraints.

Sohan Venkatesh, Ashish Mahendran Kurapath | 2026-03-06 | 💻 cs

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

LatentChem introduces a latent reasoning interface that decouples chemical computation from textual generation, enabling models to perform multi-step reasoning in continuous latent space, which spontaneously emerges as a more efficient and accurate alternative to explicit Chain-of-Thought, achieving a 59.88% win rate and a 10.84$\times$ speedup over baselines.

Xinwu Ye, Yicheng Mao, Jia Zhang + 16 more | 2026-03-06 | 🔬 physics

Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning

This paper demonstrates that systematically learning embedding magnitudes, by independently controlling query and document normalization, significantly improves retrieval and RAG performance, particularly in out-of-domain scenarios. It finds that magnitude plays distinct, beneficial roles for queries and documents, and that these roles are lost when magnitude is treated as noise.

Xincan Feng, Taro Watanabe | 2026-03-06 | 💻 cs

Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks

This paper introduces Topology-Aware PINNs (TAPINN), a novel framework that employs supervised metric regularization and alternating optimization to effectively resolve spectral bias and mode collapse in multi-regime physics-informed neural networks, achieving superior convergence stability and accuracy compared to standard and hypernetwork-based baselines.

Enzo Nicolas Spotorno, Josafat Ribeiro Leal, Antonio Augusto Frohlich | 2026-03-06 | 🔬 physics

Empirical Stability Analysis of Kolmogorov-Arnold Networks in Hard-Constrained Recurrent Physics-Informed Discovery

This paper empirically demonstrates that while Kolmogorov-Arnold Networks (KANs) can compete with MLPs on simple univariate residuals in hard-constrained recurrent physics-informed architectures, they suffer from severe hyperparameter fragility, instability in deeper configurations, and consistent failure on multiplicative terms, ultimately revealing limitations in their additive inductive bias for modeling state coupling in oscillatory systems.

Enzo Nicolas Spotorno, Josafat Leal Filho, Antonio Augusto Medeiros Frohlich | 2026-03-06 | 🔬 physics
