When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark

This paper presents a cost- and latency-aware benchmark demonstrating that while tool-augmented planning significantly improves accuracy for complex knowledge-intensive tasks like Event-QA, it often incurs prohibitive latency costs and offers no benefit—or even degrades performance—for tasks like persuasive response generation where simple one-shot prompting is more efficient.
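To make the trade-off concrete, here is a minimal sketch of cost- and latency-aware method selection in the spirit of the benchmark; the method names, accuracies, latencies, and costs are invented for illustration and are not numbers from the paper.

```python
from dataclasses import dataclass

@dataclass
class Result:
    method: str
    accuracy: float   # task accuracy in [0, 1]
    latency_s: float  # mean wall-clock seconds per query
    cost_usd: float   # mean API cost per query

def best_under_budget(results, max_latency_s):
    """Pick the most accurate method whose mean latency fits the budget."""
    feasible = [r for r in results if r.latency_s <= max_latency_s]
    return max(feasible, key=lambda r: r.accuracy) if feasible else None

# Illustrative numbers only: planning wins on accuracy but pays in latency.
results = [
    Result("one-shot prompt", accuracy=0.62, latency_s=1.1, cost_usd=0.002),
    Result("tool-augmented planning", accuracy=0.78, latency_s=9.4, cost_usd=0.021),
]
```

Under a tight latency budget the cheap one-shot method is selected even though it is less accurate, which is exactly the kind of regime the benchmark is designed to expose.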

Subha Ghoshal, Ali Al-Bustami · 2026-03-06 · cs

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments

This paper introduces a new dataset derived from football highlight reels to evaluate foundation models' ability to identify contextually important video moments, revealing that current state-of-the-art models perform near chance levels due to their reliance on single dominant modalities and failure to effectively synthesize cross-modal information.

Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle · 2026-03-06 · cs

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

This paper introduces On-Policy Self-Distillation (OPSD), a framework where a single large language model acts as both teacher and student by leveraging privileged reasoning traces to supervise its own weaker policy, thereby achieving superior mathematical reasoning performance and significantly higher token efficiency compared to traditional off-policy distillation and reinforcement learning methods.
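A minimal sketch of the self-distillation objective as described above: one model is scored twice on the same on-policy tokens, once as "teacher" (conditioned on a privileged reasoning trace) and once as "student" (without it), and the student is pushed toward the teacher's per-token distribution. This is an illustrative assumption about the training signal, not the paper's actual implementation; all names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def opsd_loss(student_logits, teacher_logits):
    """Mean per-token forward KL(teacher || student) over a rollout.

    student_logits, teacher_logits: arrays of shape (tokens, vocab),
    produced by the SAME model with and without the privileged trace
    in its context.
    """
    p = softmax(teacher_logits)              # teacher: sees the trace
    log_p = np.log(p)
    log_q = np.log(softmax(student_logits))  # student: trace withheld
    return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))
```

Because the tokens being scored are sampled by the student itself, the supervision is on-policy, unlike classic distillation from a separate teacher's off-policy outputs.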

Siyan Zhao, Zhihui Xie, Mengchen Liu + 4 more · 2026-03-06 · cs

Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

This paper introduces a simulation-based clinical red teaming framework that pairs AI psychotherapists with dynamic patient agents to evaluate mental health support systems, revealing critical safety gaps, such as validating delusions and failing to de-escalate suicide risk, in AI agents tested against Alcohol Use Disorder scenarios.

Ian Steenstra, Paola Pedrelli, Weiyan Shi + 2 more · 2026-03-06 · cs

Why Are Linear RNNs More Parallelizable?

This paper establishes a theoretical foundation for the superior parallelizability of linear RNNs by showing that they correspond to log-depth arithmetic circuits ($\mathsf{NC}^1$-complete), whereas nonlinear RNNs can solve $\mathsf{L}$- and $\mathsf{P}$-complete problems and are therefore believed to be inherently sequential, explaining why linear variants can be parallelized as efficiently as transformers while traditional nonlinear RNNs cannot.

William Merrill, Hongjian Jiang, Yanhong Li + 2 more · 2026-03-06 · cs