Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models

PRISM-Δ is a novel prompt highlighting method that steers large language models by decomposing cross-covariance matrices to isolate discriminative signals and eliminate shared patterns, achieving superior performance across multiple benchmarks and long-context tasks while maintaining low computational overhead.

Yuyao Ge, Shenghua Liu, Yiwei Wang, Tianyu Liu, Baolong Bi, Lingrui Mei, Jiayu Yao, Jiafeng Guo, Xueqi Cheng · Thu, 12 Ma · cs.CL
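The mechanism the summary describes, removing patterns shared between two sets of activations via a cross-covariance decomposition and steering along the discriminative residual, can be sketched in a few lines. Everything here (the synthetic activations, the rank `k`, the steering coefficient `alpha`) is illustrative and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# Hypothetical hidden states collected without / with prompt highlighting.
H_plain = rng.normal(size=(200, d))
H_hl = H_plain + rng.normal(scale=0.3, size=(200, d))  # shared part + small difference

# Cross-covariance between the two (centered) activation sets.
C = (H_hl - H_hl.mean(0)).T @ (H_plain - H_plain.mean(0)) / (len(H_hl) - 1)

# Top singular directions of C capture patterns common to both prompts.
U, S, Vt = np.linalg.svd(C)
k = 4
shared = U[:, :k]  # shared subspace to be eliminated

# Projector onto the complement of the shared subspace: what remains is
# the discriminative (highlight-specific) signal.
P = np.eye(d) - shared @ shared.T

def steer(h, alpha=2.0):
    """Amplify the component of h lying outside the shared subspace."""
    return h + alpha * (P @ h)
```

The projector `P` is symmetric and idempotent, so repeated steering only rescales the same discriminative component rather than drifting into new directions.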

HeartAgent: An Autonomous Agent System for Explainable Differential Diagnosis in Cardiology

HeartAgent is an autonomous, cardiology-specific agent system that leverages specialized sub-agents and curated data to deliver explainable, high-accuracy differential diagnoses, significantly outperforming existing AI methods and enhancing clinical decision-making when assisting human experts.

Shuang Zhou, Kai Yu, Song Wang, Wenya Xie, Zaifu Zhan, Meng-Han Tsai, Yuen-Hei Chung, Shutong Hou, Huixue Zhou, Min Zeng, Bhavadharini Ramu, Lin Yee Chen, Feng Xie, Rui Zhang · Thu, 12 Ma · cs.CL

Interpretable Chinese Metaphor Identification via LLM-Assisted MIPVU Rule Script Generation: A Comparative Protocol Study

This paper introduces an interpretable, LLM-assisted pipeline that operationalizes four distinct metaphor identification protocols as executable rule scripts for Chinese, demonstrating through a comparative study that the choice of protocol is the primary source of variation in identification results while achieving competitive performance with full transparency and reproducibility.

Weihang Huang, Mengna Liu · Thu, 12 Ma · cs.CL

Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis

The paper introduces EvoKernel, a self-evolving agentic framework that leverages value-driven memory and reinforcement learning to overcome data scarcity in NPU kernel synthesis, significantly improving model correctness and achieving substantial speedups through automated drafting and iterative refinement.

Yujie Zheng, Zhuo Li, Shengtao Zhang, Hanjing Wang, Junjie Sheng, Jiaqian Wang, Junchi Yan, Weinan Zhang, Ying Wen, Bo Tang, Muning Wen · Thu, 12 Ma · cs.LG

V0.5: Generalist Value Model as a Prior for Sparse RL Rollouts

The paper proposes V0.5, a novel method that dynamically fuses a Generalist Value Model's prior with sparse RL rollouts via real-time statistical testing to minimize baseline estimation error, thereby achieving faster convergence and over 10% performance gains on mathematical reasoning benchmarks compared to GRPO and DAPO.

Yi-Kai Zhang, Yueqing Sun, Hongyan Hao, Qi Gu, Xunliang Cai, De-Chuan Zhan, Han-Jia Ye · Thu, 12 Ma · cs.LG
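The fusion rule the summary describes, lean on the value-model prior when sparse rollouts are statistically consistent with it and fall back to the empirical mean otherwise, might look like the following. The z-test, threshold, and shrinkage weight are assumptions chosen for illustration, not the paper's actual procedure:

```python
import math

def fused_baseline(prior, rewards, z_thresh=1.96):
    """Blend a value-model prior with sparse rollout rewards.

    If the rollout mean is statistically indistinguishable from the prior
    (a simple z-test), shrink toward the prior to reduce variance;
    otherwise trust the empirical mean.  All names and constants here
    are illustrative.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    if n < 2:
        return prior                     # too few rollouts: use the prior
    var = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    se = math.sqrt(var / n) or 1e-8      # guard against zero variance
    z = abs(mean - prior) / max(se, 1e-8)
    if z < z_thresh:                     # prior consistent with rollouts
        w = n / (n + 4)                  # shrink harder when n is small
        return w * mean + (1 - w) * prior
    return mean                          # prior rejected: empirical mean
```

With very few rollouts the baseline stays close to the prior, which is the point: sparse-rollout variance is what inflates baseline estimation error in the first place.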

An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

This paper introduces a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND) and a machine-actionable GND taxonomy to enable ontology-aware multi-label classification and agent-assisted cataloging, aiming to develop transparent, authority-anchored AI tools that enhance the efficiency and scalability of subject indexing in digital libraries.

Jennifer D'Souza, Sameer Sadruddin, Maximilian Kähler, Andrea Salfinger, Luca Zaccagna, Francesca Incitti, Lauro Snidaro, Osma Suominen · Thu, 12 Ma · cs.CL

From Images to Words: Efficient Cross-Modal Knowledge Distillation to Language Models from Black-box Teachers

The paper introduces ARMADA, an efficient cross-modal knowledge distillation framework that transfers knowledge from large, potentially black-box vision-language models to language-only models without requiring teacher pre-training or internal access, thereby significantly improving performance across diverse natural language tasks.

Ayan Sengupta, Shantanu Dixit, Md Shad Akhtar, Tanmoy Chakraborty · Thu, 12 Ma · cs.CL
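Because a black-box teacher exposes only its output distribution, the distillation objective reduces to matching soft labels. A generic sketch follows; the temperature-scaled cross-entropy form is a standard knowledge-distillation convention, not necessarily ARMADA's exact loss:

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.exp((x - x.max(-1, keepdims=True)) / T)
    return z / z.sum(-1, keepdims=True)

def black_box_kd_loss(student_logits, teacher_probs, T=2.0):
    """Soft-label distillation: the black-box teacher exposes only its
    output probabilities, so the student minimizes cross-entropy against
    them.  The T*T factor keeps gradient scale comparable across
    temperatures (a common KD convention)."""
    log_p = np.log(softmax(student_logits, T) + 1e-12)
    return -(teacher_probs * log_p).sum(-1).mean() * T * T
```

No gradients or internal states from the teacher are needed, which is exactly what makes the setup compatible with API-only vision-language models.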

GLM-OCR Technical Report

GLM-OCR is a compact 0.9B-parameter multimodal model that leverages a Multi-Token Prediction mechanism and a two-stage pipeline to achieve state-of-the-art efficiency and performance in real-world document understanding tasks, making it suitable for both edge and large-scale deployments.

Shuaiqi Duan, Yadong Xue, Weihan Wang, Zhe Su, Huan Liu, Sheng Yang, Guobing Gan, Guo Wang, Zihan Wang, Shengdong Yan, Dexin Jin, Yuxuan Zhang, Guohong Wen, Yanfeng Wang, Yutao Zhang, Xiaohan Zhang, Wenyi Hong, Yukuo Cen, Da Yin, Bin Chen, Wenmeng Yu, Xiaotao Gu, Jie Tang · Thu, 12 Ma · cs.CL
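Multi-Token Prediction, mentioned in the summary, lets one forward pass emit several tokens instead of one, cutting decode passes roughly k-fold. A toy sketch with independent per-offset heads; the head design (and any verification step the model may use) is hypothetical, not taken from the report:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, k = 32, 100, 4  # hidden size, vocab size, tokens per step (all toy values)

# k independent output heads: head i predicts the token i steps ahead.
heads = [rng.normal(scale=0.02, size=(d, vocab)) for _ in range(k)]

def mtp_step(hidden):
    """One multi-token-prediction step: emit k greedy tokens from a
    single hidden state instead of one token per forward pass."""
    return [int(np.argmax(hidden @ W)) for W in heads]

tokens = mtp_step(rng.normal(size=d))
```

For OCR-style output the target text is often highly predictable (layout markup, digits, repeated formatting), which is the regime where predicting several tokens per pass pays off most.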

LLM2Vec-Gen: Generative Embeddings from Large Language Models

LLM2Vec-Gen introduces a novel self-supervised framework that generates high-quality, interpretable text embeddings by training special tokens to represent an LLM's potential responses, thereby achieving state-of-the-art performance on MTEB while transferring safety and reasoning capabilities without requiring labeled data or a frozen backbone.

Parishad BehnamGhader, Vaibhav Adlakha, Fabian David Schmidt, Nicolas Chapados, Marius Mosbach, Siva Reddy · Thu, 12 Ma · cs.CL