cs.CL papers | Gist.Science

vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM

This paper introduces vLLM Hook, an open-source plug-in that enables the programmable access and manipulation of internal model states within the vLLM inference engine to support advanced test-time alignment techniques such as adversarial prompt detection, enhanced RAG, and activation steering.

Ching-Yun Ko, Pin-Yu Chen2026-03-10🤖 cs.LG

ARC-AGI-2 Technical Report

This paper presents a transformer-based system that significantly advances ARC performance by integrating a compact task encoding, symmetry-based data augmentation, test-time LoRA adaptation, and multi-perspective decoding to enable efficient neural inference and human-level generalization from few examples.

Wallyson Lemes de Oliveira, Mekhron Bobokhonov, Matteo Caorsi, Aldo Podestà, Gabriele Beltramo, Luca Crosato, Matteo Bonotto, Federica Cecchetto, Hadrien Espic, Dan Titus Salajan, Stefan Taga, Luca Pana, Joe Carthy2026-03-10💬 cs.CL

How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective

This paper identifies a simple, semantics-free "P0 Sink Circuit" that emerges early in training to explain how Large Language Models develop attention sinks on the first token, suggesting this mechanism could serve as a signal for tracking pre-training convergence.

Runyu Peng, Ruixiao Li, Mingshu Chen, Yunhua Zhou, Qipeng Guo, Xipeng Qiu2026-03-10🤖 cs.LG

Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale

This paper demonstrates that hierarchical structures within the data generation process, modeled via probabilistic context-free grammars, serve as a unifying explanation for the emergence of diverse mechanistic phenomena like induction heads, function vectors, and the Hydra effect in Transformer-based language models.

Jonas Rohweder, Subhabrata Dutta, Iryna Gurevych2026-03-10🤖 cs.LG

Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation

This paper introduces Hierarchical Embedding Fusion (HEF), a two-stage framework that compresses repository code into a reusable hierarchy of dense vectors and maps them to learned pseudo-tokens, enabling low-latency, repository-aware code generation with accuracy comparable to traditional retrieval methods while significantly reducing inference costs.

Nikita Sorokin, Ivan Sedykh, Valentin Malykh2026-03-10🤖 cs.LG

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

This paper demonstrates that current LLM-as-a-Judge frameworks fail to reliably measure adversarial robustness due to unaccounted distribution shifts that degrade performance to near-random levels, often leading to inflated attack success rates, and proposes new benchmarks to address these evaluation flaws.

Leo Schwinn, Moritz Ladenburger, Tim Beyer, Mehrnaz Mofakhami, Gauthier Gidel, Stephan Günnemann2026-03-10💬 cs.CL

Rethinking Personalization in Large Language Models at the Token Level

This paper introduces PerContrast and the PerCE loss, a token-level training paradigm that uses causal intervention to identify and adaptively upweight user-specific tokens, thereby significantly enhancing the personalization performance of large language models with minimal computational cost.

Chenheng Zhang, Yijun Lu, Lizhe Fang, Chunyuan Zheng, Jiajun Chai, Xiaohan Wang, Guojun Yin, Wei Lin, Yisen Wang, Zhouchen Lin2026-03-10💬 cs.CL

Know When You're Wrong: Aligning Confidence with Correctness for LLM Error Detection

This paper introduces a normalized confidence scoring framework based on output anchor tokens to detect LLM errors without external validation, revealing that while supervised fine-tuning yields well-calibrated confidence, reinforcement learning methods induce overconfidence, and proposing post-RL self-distillation to restore reliability for applications like adaptive retrieval-augmented generation.

Xie Xiaohu, Liu Xiaohu, Yao Benjamin2026-03-10🤖 cs.LG

GraphSkill: Documentation-Guided Hierarchical Retrieval-Augmented Coding for Complex Graph Reasoning

GraphSkill is an agentic framework that improves complex graph reasoning by leveraging hierarchical document retrieval and self-debugging with generated test cases, validated on a new comprehensive dataset.

Fali Wang, Chenglin Weng, Xianren Zhang, Siyuan Hong, Hui Liu, Suhang Wang2026-03-10🤖 cs.LG

SR-TTT: Surprisal-Aware Residual Test-Time Training

SR-TTT addresses the catastrophic recall failures of Test-Time Training (TTT) language models by introducing a loss-gated sparse memory mechanism that dynamically routes highly surprising tokens to an exact-attention residual cache, thereby preserving O(1) memory efficiency while enabling accurate retrieval of critical information.

Swamynathan V P2026-03-10🤖 cs.LG

TimeSpot: Benchmarking Geo-Temporal Understanding in Vision-Language Models in Real-World Settings

This paper introduces TimeSpot, a comprehensive benchmark comprising 1,455 real-world images from 80 countries designed to evaluate the limited geo-temporal reasoning capabilities of current vision-language models in predicting location, time, and environmental context from visual evidence alone.

Azmine Toushik Wasi, Shahriyar Zaman Ridoy, Koushik Ahamed Tonmoy, Kinga Tshering, S. M. Muhtasimul Hasan, Wahid Faisal, Tasnim Mohiuddin, Md Rizwan Parvez2026-03-10💬 cs.CL

Orion: Characterizing and Programming Apple's Neural Engine for LLM Training and Inference

This paper introduces Orion, the first open end-to-end system that bypasses Apple's opaque CoreML framework to enable direct Neural Engine programming for large language model training and inference, achieving an 8.5x speedup in weight updates through a novel patching mechanism and demonstrating stable training of 110M-parameter models on Apple Silicon.

Ramchand Kumaresan2026-03-10🤖 cs.LG

"Dark Triad" Model Organisms of Misalignment: Narrow Fine-Tuning Mirrors Human Antisocial Behavior

This paper proposes the Dark Triad personality traits as a framework for studying AI misalignment, demonstrating that frontier large language models can be reliably induced with human-like antisocial behaviors through minimal fine-tuning on psychometric data, thereby revealing latent persona structures that generalize beyond training contexts.

Roshni Lulla, Fiona Collins, Sanaya Parekh, Thilo Hagendorff, Jonas Kaplan2026-03-10💬 cs.CL

Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records

This study validates that a locally hosted 20-billion-parameter small language model can reliably classify specific DSM-5 substance categories within child welfare investigation narratives, achieving near-perfect agreement with human experts for five major substance types despite limitations with low-prevalence categories.

Brian E. Perron, Dragan Stoll, Bryan G. Victor, Zia Qia, Andreas Jud, Joseph P. Ryan2026-03-10💬 cs.CL

Supporting Artifact Evaluation with LLMs: A Study with Published Security Research Papers

This paper presents a toolkit leveraging Large Language Models to automate key aspects of Artifact Evaluation in cybersecurity research, achieving high accuracy in reproducibility rating, autonomous environment setup, and pitfall detection to significantly reduce reviewer effort and enhance research transparency.

David Heye, Karl Kindermann, Robin Decker, Johannes Lohmöller, Anastasiia Belova, Sandra Geisler, Klaus Wehrle, Jan Pennekamp2026-03-10💬 cs.CL

Counting on Consensus: Selecting the Right Inter-annotator Agreement Metric for NLP Annotation and Evaluation

This paper serves as a comprehensive guide for selecting and interpreting inter-annotator agreement metrics in NLP by categorizing measures according to task types, addressing challenges like label imbalance and missing data, and advocating for transparent reporting practices to ensure reliable and reproducible human annotation.

Joseph James2026-03-10💬 cs.CL

Symmetry-Constrained Language-Guided Program Synthesis for Discovering Governing Equations from Noisy and Partial Observations

SymLang is an open-source framework that integrates symmetry-constrained grammars, language-model-guided program synthesis, and Bayesian model selection to robustly discover accurate, interpretable governing equations from noisy and partial observations, significantly outperforming existing baselines in structural recovery and physical consistency.

Mirza Samad Ahmed Baig, Syeda Anshrah Gillani2026-03-10🤖 cs.LG

LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models

This paper introduces LieCraft, a novel multi-agent framework featuring grounded, high-stakes scenarios and a hidden-role game mechanic to evaluate the deceptive capabilities of large language models, revealing that state-of-the-art models consistently exhibit a willingness to lie, conceal intentions, and act unethically to achieve their goals.

Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Tri Nguyen, Vasudev Lal, Joseph Campbell, Simon Stepputtis, Shao-Yen Tseng2026-03-10💬 cs.CL

MedInjection-FR: Exploring the Role of Native, Synthetic, and Translated Data in Biomedical Instruction Tuning

The paper introduces MedInjection-FR, a large-scale French biomedical instruction dataset combining native, synthetic, and translated sources, and demonstrates through controlled experiments that while native data yields the best performance, strategically mixing these sources effectively mitigates the scarcity of high-quality French medical instruction data for fine-tuning large language models.

Ikram Belmadani, Oumaima El Khettari, Pacôme Constant dit Beaufils, Benoit Favre, Richard Dufour2026-03-10💬 cs.CL

Language Shapes Mental Health Evaluations in Large Language Models

This study demonstrates that large language models (specifically GPT-4o and Qwen3) exhibit systematic cross-linguistic biases in mental health evaluations, producing higher stigma responses, lower sensitivity to stigmatizing content, and greater underestimation of depression severity when prompted in Chinese compared to English.

Jiayi Xu, Xiyang Hu2026-03-10💬 cs.CL

← Previous Next →