Improving Search Agent with One Line of Code
This paper introduces Search Agent Policy Optimization (SAPO), a method that resolves catastrophic training instability in tool-based agentic reinforcement learning. By applying a conditional token-level KL constraint that prevents Importance Sampling Distribution Drift, SAPO achieves significant performance gains while modifying only a single line of code in standard GRPO.
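To make the idea concrete, here is a minimal sketch of what a conditional token-level KL term added to a GRPO-style per-token loss could look like. The specific condition used below (penalizing only tokens whose importance-sampling ratio has drifted beyond a threshold) and all names and parameters are illustrative assumptions, not the paper's actual criterion or code:

```python
import numpy as np

def grpo_token_loss(logp_new, logp_old, logp_ref, advantages,
                    kl_coef=0.1, drift_threshold=2.0, clip_eps=0.2):
    """GRPO-style per-token loss with an illustrative *conditional* KL penalty.

    The drift condition and constants here are assumptions for
    illustration; they are not SAPO's exact formulation.
    """
    ratio = np.exp(logp_new - logp_old)               # importance-sampling ratio
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    pg_loss = -np.minimum(ratio * advantages, clipped * advantages)

    # Simple token-level KL estimate against the reference policy.
    kl = logp_new - logp_ref

    # The "one line": apply the KL penalty only on tokens whose
    # importance ratio has drifted beyond the threshold.
    mask = (ratio > drift_threshold) | (ratio < 1.0 / drift_threshold)

    return pg_loss + kl_coef * mask * kl
```

Tokens with a stable importance ratio are left with the plain clipped policy-gradient loss, so the constraint only activates where distribution drift would otherwise destabilize training.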