Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

This paper empirically evaluates the robustness of 13 large language models against five structured chain-of-thought perturbation types, finding that while model scaling substantially mitigates math errors, it offers limited protection against unit-conversion errors, and that vulnerability patterns differ markedly across corruption types.

Ashwath Vaithinathan Aravindan, Mayank Kejriwal · 2026-03-05 · cs.AI

Prompt-Dependent Ranking of Large Language Models with Uncertainty Quantification

This paper proposes a decision-safe framework for ranking large language models that uses a contextual Bradley-Terry-Luce model to construct statistically valid confidence sets for prompt-dependent rankings, addressing the limitations of point estimates by quantifying uncertainty and distinguishing meaningful performance differences from noise.

Angel Rodrigo Avelar Menendez, Yufeng Liu, Xiaowu Dai · 2026-03-05 · cs.LG
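To make the Bradley-Terry-Luce idea concrete: a minimal, illustrative sketch (not the paper's contextual method) that fits BTL skill scores from pairwise win/loss records by gradient ascent on the log-likelihood, then uses a bootstrap over the comparison data to quantify uncertainty in who ranks first. All function names and parameters here are invented for illustration.

```python
import math
import random

def btl_fit(wins, n_models, iters=2000, lr=0.05):
    """Fit Bradley-Terry skill scores theta by gradient ascent on the
    log-likelihood of pairwise outcomes. wins: list of (winner, loser)."""
    theta = [0.0] * n_models
    for _ in range(iters):
        grad = [0.0] * n_models
        for w, l in wins:
            # P(w beats l) under the current scores
            p = 1.0 / (1.0 + math.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        for i in range(n_models):
            theta[i] += lr * grad[i]
        # Center the scores: BTL is identifiable only up to a shift.
        mean = sum(theta) / n_models
        theta = [t - mean for t in theta]
    return theta

def rank_confidence(wins, n_models, n_boot=200, seed=0):
    """Bootstrap the comparison set and report how often each model is
    ranked first, a crude stand-in for a confidence set over rankings."""
    rng = random.Random(seed)
    top_counts = [0] * n_models
    for _ in range(n_boot):
        sample = [wins[rng.randrange(len(wins))] for _ in range(len(wins))]
        theta = btl_fit(sample, n_models, iters=300)
        best = max(range(n_models), key=lambda i: theta[i])
        top_counts[best] += 1
    return [c / n_boot for c in top_counts]
```

If one model wins the large majority of its comparisons, the bootstrap concentrates on it; when two models trade wins evenly, the mass splits, signaling that the ranking difference may be noise rather than a real gap.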

Asymmetric Goal Drift in Coding Agents Under Value Conflict

This paper introduces a framework built on OpenCode to show that agentic coding models exhibit asymmetric goal drift: under environmental pressure and adversarial comments, they violate explicit system-prompt constraints in favor of strongly held learned values such as security and privacy, revealing critical gaps in current alignment approaches for long-horizon autonomous agents.

Magnus Saebo, Spencer Gibson, Tyler Crosse + 3 more · 2026-03-05 · cs.AI

Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer's MLP Budget

This paper demonstrates that a significant portion of transformer MLP nonlinearity is redundant and context-dependent, showing that a lightweight gating mechanism can dynamically replace these computations with linear surrogates to reduce computational waste or, when applied strategically with full retraining, actively improve model performance by eliminating harmful nonlinearities.

Peter Balogh · 2026-03-05 · cs.LG
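The gating idea summarized above can be sketched in a toy form: an MLP block whose nonlinear path (GELU) can be blended with a linear surrogate via a per-token sigmoid gate, so the activation is effectively skipped where the gate goes to zero. This is a minimal illustration under assumed shapes and random weights, not the paper's gating mechanism; the class and parameter names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class GatedMLP:
    """Toy transformer-style MLP block where a learned scalar gate per
    token mixes the nonlinear path with a cheap linear surrogate."""
    def __init__(self, d, hidden):
        self.w1 = rng.normal(0.0, d ** -0.5, (d, hidden))
        self.w2 = rng.normal(0.0, hidden ** -0.5, (hidden, d))
        # Linear surrogate: the same two projections with the activation removed.
        self.lin = self.w1 @ self.w2
        self.gate_w = rng.normal(0.0, d ** -0.5, d)

    def forward(self, x):
        # g in (0, 1): 1 keeps the nonlinearity, 0 falls back to the surrogate
        g = 1.0 / (1.0 + np.exp(-(x @ self.gate_w)))
        nonlinear = gelu(x @ self.w1) @ self.w2
        linear = x @ self.lin
        return g[:, None] * nonlinear + (1.0 - g)[:, None] * linear
```

The linear path costs a single `d x d` matmul per token, so tokens whose gate saturates at zero avoid the activation and the wide hidden projection entirely in a fused implementation; this sketch only shows the mixing, not the compute savings.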