Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

This paper reveals that the Key-Value (KV) cache used to accelerate Large Language Model inference is vulnerable to privacy attacks capable of reconstructing sensitive user inputs, and it proposes KV-Cloak, a lightweight and efficient obfuscation defense that prevents such leakage without compromising model accuracy or performance.

Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin (cs.CL)
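
KV-Cloak's actual construction is described in the paper; the snippet below is only a minimal sketch of the general idea of cache obfuscation, assuming a hypothetical secret orthogonal rotation (`secret_q`, `obfuscate`, and `deobfuscate` are illustrative names, not the authors' API): cached keys/values are stored in a rotated basis so a dumped cache is not directly interpretable, and the rotation is undone just before attention is computed.

```python
# Minimal sketch (not KV-Cloak itself): hide cached K/V behind a secret
# orthogonal rotation so a dumped cache is not directly interpretable,
# and undo the rotation right before attention is computed.
import numpy as np

d_head = 64
rng = np.random.default_rng(0)

# Secret per-session rotation; orthogonal, so it is exactly invertible.
secret_q, _ = np.linalg.qr(rng.normal(size=(d_head, d_head)))

def obfuscate(kv: np.ndarray) -> np.ndarray:
    """Rotate keys/values (seq_len, d_head) before writing them to the cache."""
    return kv @ secret_q

def deobfuscate(kv_obf: np.ndarray) -> np.ndarray:
    """Recover the original keys/values just before the attention computation."""
    return kv_obf @ secret_q.T

k = rng.normal(size=(8, d_head))
cached = obfuscate(k)                       # what an attacker reading the cache would see
assert np.allclose(deobfuscate(cached), k)  # attention still sees the true keys
```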

LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination

The paper introduces LaTeXTrans, a collaborative multi-agent system that overcomes the challenges of translating structured LaTeX documents by decomposing, translating, validating, and reconstructing content to ensure both semantic accuracy and the preservation of complex formatting and syntax.

Ziming Zhu, Chenglong Wang, Haosong Xv, Shunjie Xing, Yifu Huo, Fengning Tian, Quan Du, Di Yang, Chunliang Zhang, Tong Xiao, Jingbo Zhu (cs.CL)
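
As a rough illustration of the decompose, translate, validate, and reconstruct loop the summary describes (not the authors' agent implementation; `translate_segment` is a hypothetical stand-in for an LLM translator agent), a minimal pipeline might look like this:

```python
# Toy sketch of a decompose -> translate -> validate -> reconstruct pipeline,
# in the spirit of the paper but not its actual multi-agent system.
import re

def decompose(latex: str):
    """Split source into protected spans (math/commands) and translatable text."""
    pattern = r"(\$[^$]*\$|\\[a-zA-Z]+(?:\{[^}]*\})?)"
    return re.split(pattern, latex)

def translate_segment(text: str) -> str:
    """Hypothetical translator agent; a real system would call an LLM here."""
    return text.replace("Beispiel", "example")  # stand-in translation

def validate(original: str, rebuilt: str) -> bool:
    """Check that protected LaTeX structure survived translation."""
    keep = lambda s: re.findall(r"\\[a-zA-Z]+|\$", s)
    return keep(original) == keep(rebuilt)

def translate_latex(latex: str) -> str:
    parts = decompose(latex)
    rebuilt = "".join(
        p if p.startswith(("\\", "$")) else translate_segment(p) for p in parts
    )
    assert validate(latex, rebuilt), "formatting was damaged; retry or repair"
    return rebuilt

print(translate_latex(r"Ein Beispiel: $e^{i\pi}+1=0$ in \textbf{LaTeX}."))
```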

QCSE: A Pretrained Quantum Context-Sensitive Word Embedding for Natural Language Processing

This paper introduces QCSE, a pretrained quantum context-sensitive word embedding model that utilizes innovative context matrix computation methods to capture linguistic relationships and demonstrates its effectiveness on both English and low-resource Fulani corpora, highlighting the potential of Quantum Natural Language Processing to address data scarcity challenges.

Charles M. Varmantchaonala, Niclas Götting, Nils-Erik Schütte, Jean Louis E. K. Fendji, Christopher Gies (cs.CL)

Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional

This paper presents a large-scale empirical study across 23 visual question-answering benchmarks that reveals significant variation in intra- and inter-modality dependencies, finding that many benchmarks inadvertently amplify image-only reliance while exhibiting limited true multi-modal interaction, and proposes a quantitative framework for principled dataset design and evaluation.

Divyam Madaan, Varshan Muhunthan, Kyunghyun Cho, Sumit Chopra (cs.CL)
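
One way to make the intra- versus inter-modality dependency idea concrete is to compare accuracy under single-modality and full-input evaluation. The toy sketch below assumes hypothetical per-example correctness arrays and is not the paper's exact metric:

```python
# Illustrative sketch (not the paper's metrics): estimate how much a VQA
# benchmark can be solved from one modality alone by comparing accuracy under
# image-only, question-only, and full (image + question) evaluation.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
correct_full = rng.random(n) < 0.72        # hypothetical per-example correctness
correct_image_only = rng.random(n) < 0.55
correct_text_only = rng.random(n) < 0.48

acc = lambda c: float(np.mean(c))
image_reliance = acc(correct_image_only) / acc(correct_full)
text_reliance = acc(correct_text_only) / acc(correct_full)
# Crude proxy for genuine cross-modal interaction: accuracy the full input adds
# beyond the stronger single-modality condition.
interaction_gain = acc(correct_full) - max(acc(correct_image_only), acc(correct_text_only))

print(f"image-only reliance: {image_reliance:.2f}, "
      f"text-only reliance: {text_reliance:.2f}, "
      f"interaction gain: {interaction_gain:.2f}")
```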

Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors

The paper proposes Semantic-Anchor Compression (SAC), an autoencoding-free context compression method that improves LLM inference by directly selecting anchor tokens from the context and enhancing them with learnable embeddings and bidirectional attention to aggregate contextual information, outperforming existing autoencoding-based approaches on question-answering and summarization tasks.

Xin Liu, Runsong Zhao, Pengcheng Huang, Xinyu Liu, Junyi Xiao, Chunyang Xiao, Tong Xiao, Shengxiang Gao, Zhengtao Yu, Jingbo Zhu (cs.CL)
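
A toy sketch of the anchor-token idea, assuming evenly spaced anchors and a single bidirectional attention pass (illustrative only, not the SAC implementation; the anchor positions and embeddings here are hypothetical):

```python
# Toy sketch: pick a few context positions as "anchors", add learnable anchor
# embeddings, and let the anchors attend bidirectionally over the full context
# so that the anchor vectors alone can stand in for the compressed context.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, n_anchors = 128, 32, 8

hidden = rng.normal(size=(seq_len, d))                 # contextual token states
anchor_pos = np.linspace(0, seq_len - 1, n_anchors).astype(int)
anchor_embed = rng.normal(scale=0.02, size=(n_anchors, d))  # learnable in training

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Anchors query the whole context (no causal mask), aggregating it into n_anchors vectors.
queries = hidden[anchor_pos] + anchor_embed
attn = softmax(queries @ hidden.T / np.sqrt(d))        # (n_anchors, seq_len)
compressed = attn @ hidden                             # (n_anchors, d)

print(compressed.shape)  # (8, 32): the context reduced to 8 anchor vectors
```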

CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning

This paper presents a CEFR-annotated WordNet created using large language models to align semantic definitions with language proficiency levels, demonstrating that the resulting dataset and classifiers effectively bridge natural language processing and language education by achieving performance comparable to gold-standard annotations.

Masato Kikuchi, Masatsugu Ono, Toshioki Soga, Tetsu Tanabe, Tadachika Ozono (cs.CL)

Evaluating Long-Horizon Memory for Multi-Party Collaborative Dialogues

This paper introduces EverMemBench, the first benchmark designed to evaluate long-horizon memory in multi-party collaborative dialogues, revealing that current LLM systems struggle with multi-hop reasoning, temporal versioning, and implicit relevance retrieval in realistic, complex interaction scenarios.

Chuanrui Hu, Tong Li, Xingze Gao, Hongda Chen, Yi Bai, Dannong Xu, Tianwei Lin, Xiaohong Li, Yunyun Han, Jian Pei, Yafeng Deng (cs.CL)

Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models

This paper introduces Cross-Family Speculative Prefill, a training-free method that leverages lightweight draft models from different families to compress long prompts for target LLMs, achieving substantial latency reductions while maintaining or slightly improving accuracy across diverse tasks.

Shubhangi Upasani, Ravi Shanker Raju, Bo Li, Mengmeng Ji, John Long, Chen Wu, Urmish Thakker, Guangtao Wang (cs.CL)
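
A minimal sketch of the general speculative-prefill recipe, with a placeholder importance scorer standing in for the draft model (the paper's actual scoring and cross-family handling are not shown; `draft_token_importance` is a hypothetical name):

```python
# Minimal sketch: a small draft model assigns an importance score to each
# prompt token, and only the top-k tokens are kept for the target model's
# expensive long-context prefill.
import numpy as np

def draft_token_importance(prompt_tokens):
    """Hypothetical stand-in: a real system would derive these scores from a
    lightweight draft model's own prefill pass (e.g., attention mass)."""
    rng = np.random.default_rng(42)
    return rng.random(len(prompt_tokens))

def compress_prompt(prompt_tokens, keep_ratio=0.5):
    scores = draft_token_importance(prompt_tokens)
    k = max(1, int(len(prompt_tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])          # keep top-k, preserve order
    return [prompt_tokens[i] for i in keep]

long_prompt = [f"tok{i}" for i in range(1000)]
short_prompt = compress_prompt(long_prompt, keep_ratio=0.3)
print(len(long_prompt), "->", len(short_prompt))     # 1000 -> 300
# The pruned prompt is then fed to the (different-family) target LLM,
# cutting prefill latency roughly in proportion to the tokens removed.
```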