Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

This paper reveals that the Key-Value (KV) cache used to accelerate Large Language Model inference is vulnerable to privacy attacks capable of reconstructing sensitive user inputs, and it proposes KV-Cloak, a lightweight and efficient obfuscation defense that prevents such leakage without compromising model accuracy or performance.

Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin (cs.CL)
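
KV-Cloak's actual construction is described in the paper; the snippet below is only a minimal sketch of the general idea of cache obfuscation, assuming a hypothetical secret orthogonal rotation (`secret_q`, `obfuscate`, and `deobfuscate` are illustrative names, not the authors' API): cached keys/values are stored in a rotated basis so a dumped cache is not directly interpretable, and the rotation is undone just before attention is computed.

```python
# Minimal sketch (not KV-Cloak itself): hide cached K/V behind a secret
# orthogonal rotation so a dumped cache is not directly interpretable,
# and undo the rotation right before attention is computed.
import numpy as np

d_head = 64
rng = np.random.default_rng(0)

# Secret per-session rotation; orthogonal, so it is exactly invertible.
secret_q, _ = np.linalg.qr(rng.normal(size=(d_head, d_head)))

def obfuscate(kv: np.ndarray) -> np.ndarray:
    """Rotate keys/values (seq_len, d_head) before writing them to the cache."""
    return kv @ secret_q

def deobfuscate(kv_obf: np.ndarray) -> np.ndarray:
    """Recover the original keys/values just before the attention computation."""
    return kv_obf @ secret_q.T

k = rng.normal(size=(8, d_head))
cached = obfuscate(k)                       # what an attacker reading the cache would see
assert np.allclose(deobfuscate(cached), k)  # attention still sees the true keys
```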

LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination

The paper introduces LaTeXTrans, a collaborative multi-agent system that overcomes the challenges of translating structured LaTeX documents by decomposing, translating, validating, and reconstructing content to ensure both semantic accuracy and the preservation of complex formatting and syntax.

Ziming Zhu, Chenglong Wang, Haosong Xv, Shunjie Xing, Yifu Huo, Fengning Tian, Quan Du, Di Yang, Chunliang Zhang, Tong Xiao, Jingbo Zhu (cs.CL)
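
As a rough illustration of the decompose, translate, validate, and reconstruct loop the summary describes (not the authors' agent implementation; `translate_segment` is a hypothetical stand-in for an LLM translator agent), a minimal pipeline might look like this:

```python
# Toy sketch of a decompose -> translate -> validate -> reconstruct pipeline,
# in the spirit of the paper but not its actual multi-agent system.
import re

def decompose(latex: str):
    """Split source into protected spans (math/commands) and translatable text."""
    pattern = r"(\$[^$]*\$|\\[a-zA-Z]+(?:\{[^}]*\})?)"
    return re.split(pattern, latex)

def translate_segment(text: str) -> str:
    """Hypothetical translator agent; a real system would call an LLM here."""
    return text.replace("Beispiel", "example")  # stand-in translation

def validate(original: str, rebuilt: str) -> bool:
    """Check that protected LaTeX structure survived translation."""
    keep = lambda s: re.findall(r"\\[a-zA-Z]+|\$", s)
    return keep(original) == keep(rebuilt)

def translate_latex(latex: str) -> str:
    parts = decompose(latex)
    rebuilt = "".join(
        p if p.startswith(("\\", "$")) else translate_segment(p) for p in parts
    )
    assert validate(latex, rebuilt), "formatting was damaged; retry or repair"
    return rebuilt

print(translate_latex(r"Ein Beispiel: $e^{i\pi}+1=0$ in \textbf{LaTeX}."))
```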

QCSE: A Pretrained Quantum Context-Sensitive Word Embedding for Natural Language Processing

This paper introduces QCSE, a pretrained quantum context-sensitive word embedding model that utilizes innovative context matrix computation methods to capture linguistic relationships and demonstrates its effectiveness on both English and low-resource Fulani corpora, highlighting the potential of Quantum Natural Language Processing to address data scarcity challenges.

Charles M. Varmantchaonala, Niclas Götting, Nils-Erik Schütte, Jean Louis E. K. Fendji, Christopher Gies (cs.CL)

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional

This paper presents a large-scale empirical study across 23 visual question-answering benchmarks that reveals significant variation in intra- and inter-modality dependencies, finding that many benchmarks inadvertently amplify image-only reliance while exhibiting limited true multi-modal interaction, and proposes a quantitative framework for principled dataset design and evaluation.

Divyam Madaan, Varshan Muhunthan, Kyunghyun Cho, Sumit Chopra (cs.CL)
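
One way to make the intra- versus inter-modality dependency idea concrete is to compare accuracy under single-modality and full-input evaluation. The toy sketch below assumes hypothetical per-example correctness arrays and is not the paper's exact metric:

```python
# Illustrative sketch (not the paper's metrics): estimate how much a VQA
# benchmark can be solved from one modality alone by comparing accuracy under
# image-only, question-only, and full (image + question) evaluation.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
correct_full = rng.random(n) < 0.72        # hypothetical per-example correctness
correct_image_only = rng.random(n) < 0.55
correct_text_only = rng.random(n) < 0.48

acc = lambda c: float(np.mean(c))
image_reliance = acc(correct_image_only) / acc(correct_full)
text_reliance = acc(correct_text_only) / acc(correct_full)
# Crude proxy for genuine cross-modal interaction: accuracy the full input adds
# beyond the stronger single-modality condition.
interaction_gain = acc(correct_full) - max(acc(correct_image_only), acc(correct_text_only))

print(f"image-only reliance: {image_reliance:.2f}, "
      f"text-only reliance: {text_reliance:.2f}, "
      f"interaction gain: {interaction_gain:.2f}")
```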

Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors

The paper proposes Semantic-Anchor Compression (SAC), an autoencoding-free context compression method that improves LLM inference by directly selecting anchor tokens from the context and enhancing them with learnable embeddings and bidirectional attention to aggregate contextual information, outperforming existing autoencoding-based approaches on question-answering and summarization tasks.

Xin Liu, Runsong Zhao, Pengcheng Huang, Xinyu Liu, Junyi Xiao, Chunyang Xiao, Tong Xiao, Shengxiang Gao, Zhengtao Yu, Jingbo Zhu (cs.CL)
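
A toy sketch of the anchor-token idea, assuming evenly spaced anchors and a single bidirectional attention pass (illustrative only, not the SAC implementation; the anchor positions and embeddings here are hypothetical):

```python
# Toy sketch: pick a few context positions as "anchors", add learnable anchor
# embeddings, and let the anchors attend bidirectionally over the full context
# so that the anchor vectors alone can stand in for the compressed context.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, n_anchors = 128, 32, 8

hidden = rng.normal(size=(seq_len, d))                 # contextual token states
anchor_pos = np.linspace(0, seq_len - 1, n_anchors).astype(int)
anchor_embed = rng.normal(scale=0.02, size=(n_anchors, d))  # learnable in training

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Anchors query the whole context (no causal mask), aggregating it into n_anchors vectors.
queries = hidden[anchor_pos] + anchor_embed
attn = softmax(queries @ hidden.T / np.sqrt(d))        # (n_anchors, seq_len)
compressed = attn @ hidden                             # (n_anchors, d)

print(compressed.shape)  # (8, 32): the context reduced to 8 anchor vectors
```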

CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning

This paper presents a CEFR-annotated WordNet created using large language models to align semantic definitions with language proficiency levels, demonstrating that the resulting dataset and classifiers effectively bridge natural language processing and language education by achieving performance comparable to gold-standard annotations.

Masato Kikuchi, Masatsugu Ono, Toshioki Soga, Tetsu Tanabe, Tadachika Ozono (cs.CL)

Evaluating Long-Horizon Memory for Multi-Party Collaborative Dialogues

This paper introduces EverMemBench, the first benchmark designed to evaluate long-horizon memory in multi-party collaborative dialogues, revealing that current LLM systems struggle with multi-hop reasoning, temporal versioning, and implicit relevance retrieval in realistic, complex interaction scenarios.

Chuanrui Hu, Tong Li, Xingze Gao, Hongda Chen, Yi Bai, Dannong Xu, Tianwei Lin, Xiaohong Li, Yunyun Han, Jian Pei, Yafeng Deng (cs.CL)

Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models

This paper introduces Cross-Family Speculative Prefill, a training-free method that leverages lightweight draft models from different families to compress long prompts for target LLMs, achieving substantial latency reductions while maintaining or slightly improving accuracy across diverse tasks.

Shubhangi Upasani, Ravi Shanker Raju, Bo Li, Mengmeng Ji, John Long, Chen Wu, Urmish Thakker, Guangtao Wang (cs.CL)
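
A minimal sketch of the general speculative-prefill recipe, with a placeholder importance scorer standing in for the draft model (the paper's actual scoring and cross-family handling are not shown; `draft_token_importance` is a hypothetical name):

```python
# Minimal sketch: a small draft model assigns an importance score to each
# prompt token, and only the top-k tokens are kept for the target model's
# expensive long-context prefill.
import numpy as np

def draft_token_importance(prompt_tokens):
    """Hypothetical stand-in: a real system would derive these scores from a
    lightweight draft model's own prefill pass (e.g., attention mass)."""
    rng = np.random.default_rng(42)
    return rng.random(len(prompt_tokens))

def compress_prompt(prompt_tokens, keep_ratio=0.5):
    scores = draft_token_importance(prompt_tokens)
    k = max(1, int(len(prompt_tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])          # keep top-k, preserve order
    return [prompt_tokens[i] for i in keep]

long_prompt = [f"tok{i}" for i in range(1000)]
short_prompt = compress_prompt(long_prompt, keep_ratio=0.3)
print(len(long_prompt), "->", len(short_prompt))     # 1000 -> 300
# The pruned prompt is then fed to the (different-family) target LLM,
# cutting prefill latency roughly in proportion to the tokens removed.
```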