cs.AI papers | Gist.Science

Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM

This paper introduces Yuan3.0 Ultra, an open-source, trillion-parameter Mixture-of-Experts (MoE) large language model that uses a novel Layer-Adaptive Expert Pruning algorithm to significantly improve pre-training efficiency and reduce model size while achieving state-of-the-art performance on enterprise-oriented benchmarks.

YuanLab.ai, Shawn Wu + 25 more · 2026-03-06 · 💻 cs

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments

This paper introduces a new dataset derived from football highlight reels to evaluate foundation models' ability to identify contextually important video moments. It finds that current state-of-the-art models perform near chance level because they rely on a single dominant modality and fail to synthesize cross-modal information effectively.

Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle · 2026-03-06 · 💻 cs

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

This paper proposes a scalable extension of CopulaGNN for link sign prediction that overcomes computational intractability by representing edge correlations via a Gramian of edge embeddings and reformulating conditional probabilities, thereby achieving linear convergence and competitive performance on signed graphs.

Jinkyu Sung, Myunggeum Jee, Joonseok Lee · 2026-03-06 · 💻 cs
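A minimal sketch of the general idea of representing edge correlations as a Gramian of edge embeddings. It assumes NumPy and made-up embeddings, and the function name `gramian_correlation` is hypothetical; this is not the authors' implementation. Normalizing a Gram matrix gives a symmetric positive semi-definite matrix with a unit diagonal, which is the form a Gaussian copula requires for its correlation matrix.

```python
import numpy as np

def gramian_correlation(edge_emb: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Turn edge embeddings (m x d) into an m x m correlation matrix via their Gramian."""
    gram = edge_emb @ edge_emb.T                      # Gram matrix: PSD by construction
    scale = np.sqrt(np.clip(np.diag(gram), eps, None))
    return gram / np.outer(scale, scale)              # unit diagonal, entries in [-1, 1]

# Hypothetical example: 6 edges with 3-dimensional embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(6, 3))
R = gramian_correlation(E)
print(np.allclose(np.diag(R), 1.0), np.linalg.eigvalsh(R).min() >= -1e-9)
```

Because this Gram matrix has rank at most d, downstream computations can in principle work with the m x d embedding matrix instead of the full m x m matrix. The sketch builds the full matrix only for readability.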

Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement

This paper introduces Mobility-Embedded POIs (ME-POIs), a framework that enhances general-purpose point-of-interest representations by integrating large-scale human mobility data with language model embeddings to capture both place identity and real-world usage functions, thereby outperforming existing text-only and mobility-only baselines across diverse map enrichment tasks.

Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu + 2 more · 2026-03-06 · 💻 cs

PerfGuard: A Performance-Aware Agent for Visual Content Generation

PerfGuard is a performance-aware agent framework for visual content generation that enhances task planning and execution reliability by systematically modeling tool performance boundaries through Performance-Aware Selection Modeling, Adaptive Preference Update, and Capability-Aligned Planning Optimization.

Zhipeng Chen, Zhongrui Zhang, Chao Zhang + 5 more · 2026-03-06 · 💻 cs

YuriiFormer: A Suite of Nesterov-Accelerated Transformers

This paper proposes a variational framework that interprets transformer layers as optimization iterations, enabling the design of a Nesterov-accelerated transformer architecture that outperforms standard baselines on language modeling tasks.

Aleksandr Zimin, Yury Polyanskiy, Philippe Rigollet · 2026-03-06 · 🔢 math
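For context on the optimizer the architecture is named after, here is a minimal sketch comparing Nesterov's accelerated gradient method with plain gradient descent on an ill-conditioned quadratic. It assumes NumPy and the common (k - 1)/(k + 2) momentum schedule. It illustrates only the optimization method and does not model the YuriiFormer layers.

```python
import numpy as np

A = np.diag([1.0, 0.01])  # ill-conditioned quadratic f(x) = 0.5 * x^T A x, minimum at 0

def f(x: np.ndarray) -> float:
    return 0.5 * x @ A @ x

def grad(x: np.ndarray) -> np.ndarray:
    return A @ x

def gradient_descent(x0: np.ndarray, lr: float, steps: int) -> np.ndarray:
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def nesterov(x0: np.ndarray, lr: float, steps: int) -> np.ndarray:
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, steps + 1):
        x = y - lr * grad(y)                       # gradient step at the look-ahead point
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum extrapolation
        x_prev = x
    return x_prev

x0 = np.array([1.0, 1.0])
lr = 1.0  # step size 1/L, where L = 1 is the largest eigenvalue of A
print(f(gradient_descent(x0, lr, 100)), f(nesterov(x0, lr, 100)))
```

On this problem the slow direction (eigenvalue 0.01) dominates the error, and the momentum term is what lets Nesterov's method make much faster progress along it.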

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

This paper proposes MoR, a federated alignment framework that replaces parameter exchange with preference-based learning, using a Mixture-of-Rewards mechanism and Group Relative Policy Optimization (GRPO) to align heterogeneous Vision-Language Models effectively while preserving data privacy and accommodating diverse client constraints.

Shule Lu, Yujing Wang, Hainan Zhang + 5 more · 2026-03-06 · 💻 cs

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards

This paper introduces VIP, a Variance-Informed Predictive allocation strategy that dynamically optimizes rollout distribution across training prompts using Gaussian process-based variance estimation to minimize gradient variance and significantly improve sampling efficiency in online reinforcement learning with verifiable rewards.

Hieu Trung Nguyen, Bao Nguyen, Wenao Ma + 3 more · 2026-03-06 · 💻 cs

Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy

This paper introduces the Exploratory and Focused Manipulation (EFM) problem to address visual occlusion in robot manipulation, proposing the EFM-10 benchmark and a Bimanual Active Perception (BAP) strategy that effectively leverages dual-arm coordination for active vision and force sensing.

Yuxin He, Ruihao Zhang, Tianao Shen + 2 more · 2026-03-06 · 💻 cs

On the Non-Identifiability of Steering Vectors in Large Language Models

This paper demonstrates that steering vectors in large language models are fundamentally non-identifiable: many distinct interventions, including orthogonal perturbations, produce behaviorally indistinguishable results. This reveals inherent limits on interpreting these vectors as unique internal representations without additional structural constraints.

Sohan Venkatesh, Ashish Mahendran Kurapath · 2026-03-06 · 💻 cs

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

LatentChem introduces a latent reasoning interface that decouples chemical computation from textual generation, letting models perform multi-step reasoning in a continuous latent space. This latent reasoning emerges spontaneously as a more efficient and accurate alternative to explicit Chain-of-Thought, achieving a 59.88% win rate and a 10.84× speedup over baselines.

Xinwu Ye, Yicheng Mao, Jia Zhang + 16 more · 2026-03-06 · 🔬 physics

Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks

This paper introduces Topology-Aware PINNs (TAPINN), a novel framework that employs supervised metric regularization and alternating optimization to effectively resolve spectral bias and mode collapse in multi-regime physics-informed neural networks, achieving superior convergence stability and accuracy compared to standard and hypernetwork-based baselines.

Enzo Nicolas Spotorno, Josafat Ribeiro Leal, Antonio Augusto Frohlich · 2026-03-06 · 🔬 physics

Empirical Stability Analysis of Kolmogorov-Arnold Networks in Hard-Constrained Recurrent Physics-Informed Discovery

This paper empirically demonstrates that while Kolmogorov-Arnold Networks (KANs) can compete with MLPs on simple univariate residuals in hard-constrained recurrent physics-informed architectures, they suffer from severe hyperparameter fragility, instability in deeper configurations, and consistent failure on multiplicative terms, ultimately revealing limitations in their additive inductive bias for modeling state coupling in oscillatory systems.

Enzo Nicolas Spotorno, Josafat Leal Filho, Antonio Augusto Medeiros Frohlich · 2026-03-06 · 🔬 physics

Learning to Select Like Humans: Explainable Active Learning for Medical Imaging

This paper proposes an explainability-guided active learning framework that improves medical image analysis by strategically selecting samples based on both predictive uncertainty and attention misalignment with expert-defined regions, thereby achieving superior data efficiency and clinical interpretability compared to traditional methods.

Ifrat Ikhtear Uddin, Longwei Wang, Xiao Qin + 2 more · 2026-03-06 · 💻 cs

Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search

Pailitao-VL is a unified multi-modal retrieval system that achieves state-of-the-art, real-time industrial search performance by replacing traditional contrastive embeddings with an absolute ID-recognition paradigm and evolving reranking into a compare-and-calibrate listwise policy, thereby overcoming granularity, noise, and latency challenges in large-scale production environments.

Lei Chen, Chen Ju, Xu Chen + 13 more · 2026-03-06 · 💻 cs

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

This paper introduces "Zombie Agents," a persistent black-box attack on self-evolving LLM agents that covertly implants payloads into long-term memory during benign sessions to survive across interactions and trigger unauthorized actions in future sessions, demonstrating that current per-session defenses are insufficient against such memory-based compromises.

Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, Jin Song Dong · 2026-03-06 · 🔒 cs.CR

SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

SubQuad is an end-to-end pipeline that overcomes the computational and data imbalance bottlenecks in large-scale adaptive immune repertoire analysis by integrating near-subquadratic MinHash retrieval, GPU-accelerated affinity kernels, and fairness-constrained clustering to enable scalable, bias-aware discovery of clinically relevant clonotypes.

Rong Fu, Zijian Zhang, Kun Liu + 3 more · 2026-03-06 · 💻 cs
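To make the retrieval step more concrete, here is a minimal, generic MinHash sketch that estimates Jaccard similarity between amino-acid k-mer sets using only Python's standard library. The sequences are made-up CDR3-like strings. The function names and parameters (k = 3, 256 hash functions) are illustrative assumptions. SubQuad's actual pipeline, GPU affinity kernels, and fairness-constrained clustering are not reproduced here.

```python
import hashlib

def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping length-k substrings (shingles) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def _hash(seed: int, item: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(items: set[str], num_hashes: int = 256) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions where the two sets share the same minimum."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def exact_jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)

# Made-up example sequences differing at a single position.
s1, s2 = kmers("CASSLGQAYEQYF"), kmers("CASSLGQGYEQYF")
print(exact_jaccard(s1, s2), estimated_jaccard(minhash_signature(s1), minhash_signature(s2)))
```

In practice, signatures are usually split into bands and bucketed (locality-sensitive hashing), so only sequences that share a bucket are compared. This is how MinHash-based retrieval avoids an all-pairs scan.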

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO

This paper proposes a three-stage curriculum learning framework that uses structure-aware masking and Group Relative Policy Optimization (GRPO) to efficiently distill Chain-of-Thought reasoning into compact student models. The curriculum progressively guides the student from structural understanding to self-optimized brevity and targeted knowledge internalization, yielding significant accuracy gains and shorter outputs on GSM8K.

Bowen Yu, Maolin Wang, Sheng Zhang + 7 more · 2026-03-06 · 💻 cs
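Both this entry and MoR rely on GRPO. Below is a minimal sketch of its group-relative advantage, assuming NumPy. Each sampled response's reward is standardized by the mean and standard deviation of the rewards in its group, meaning the responses sampled for the same prompt. The function name and `eps` are illustrative. The paper's curriculum stages, masking, and reward design are not modeled.

```python
import numpy as np

def group_relative_advantages(rewards, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: standardize rewards within one prompt's group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical group of four sampled answers to one prompt: 1 = correct, 0 = wrong.
print(group_relative_advantages([1, 0, 0, 1]))
```

If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.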

The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol

This paper argues that Schema-Guided Dialogue and the Model Context Protocol converge into a unified paradigm for deterministic LLM-agent interaction, proposing five foundational schema design principles to address critical gaps in failure handling and tool relationships while enabling scalable AI governance.

Andreas Schlapbach · 2026-03-06 · 💻 cs

Give Users the Wheel: Towards Promptable Recommendation Paradigm

This paper proposes Decoupled Promptable Sequential Recommendation (DPR), a model-agnostic framework that lets conventional sequential recommenders steer retrieval dynamically with natural-language prompts. DPR modulates latent user representations using a specialized fusion module, a Mixture-of-Experts architecture, and a three-stage training strategy, achieving superior performance on intent-driven tasks without sacrificing collaborative filtering efficiency.

Fuyuan Lyu, Chenglin Luo, Qiyuan Zhang + 6 more · 2026-03-06 · 💻 cs
