IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding

This paper introduces IAG, the first input-aware backdoor attack on vision-language models for visual grounding, which utilizes a text-conditioned UNet to dynamically generate imperceptible, target-specific triggers that achieve high attack success rates across various models and datasets while maintaining stealth and robustness against defenses.

Junxian Li, Beining Xu, Simin Chen, Jiatong Li, Jingdi Lei, Haodong Zhao, Di Zhang · Tue, 10 Ma · cs.CL

Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning

This paper introduces Task 5 of the DCASE 2025 Challenge, a multi-domain Audio Question Answering benchmark designed to evaluate and advance the acoustic reasoning capabilities of audio-language models across diverse scenarios including bioacoustics, temporal soundscapes, and complex real-world clips.

Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro · Tue, 10 Ma · cs.CL

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

This paper proposes a conformal prediction framework that ensures safe, domain-specific deployment of LLMs for medical entity extraction by adapting calibration thresholds to counteract the distinct underconfidence observed in structured FDA labels and overconfidence in free-text radiology reports, thereby achieving target coverage guarantees with manageable rejection rates across diverse clinical settings.

Manil Shrestha, Edward Kim · Tue, 10 Ma · cs.CL
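The per-domain calibration this paper describes can be sketched as split conformal prediction with a separate threshold per clinical domain. The scores, alpha level, and accept/reject interface below are illustrative assumptions, not the paper's implementation.

```python
import math

def calibrate_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: return the nonconformity-score
    threshold that gives ~(1 - alpha) coverage on new examples.
    cal_scores: nonconformity scores (e.g. 1 - model confidence)
    on held-out calibration examples from ONE clinical domain."""
    n = len(cal_scores)
    # Finite-sample-corrected quantile: the ceil((n+1)(1-alpha))-th
    # order statistic, capped at the largest calibration score.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(cal_scores)[k - 1]

def predict_or_abstain(score, threshold):
    """Accept an extracted entity whose nonconformity score falls
    within the calibrated threshold; otherwise reject (abstain)."""
    return "accept" if score <= threshold else "reject"

# Calibrating per domain is what counteracts the underconfidence on
# structured FDA labels vs. overconfidence on free-text radiology
# reports: each domain gets its own threshold. Scores are made up.
fda_scores = [0.05, 0.12, 0.08, 0.30, 0.22, 0.15, 0.40, 0.10, 0.18, 0.25]
t_fda = calibrate_threshold(fda_scores, alpha=0.1)
```

A separate call with a radiology calibration set would yield a different, typically tighter or looser, threshold for that domain.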

Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering

This paper introduces CondMedQA, the first benchmark for conditional biomedical question answering, and proposes Condition-Gated Reasoning (CGR), a framework that constructs condition-aware knowledge graphs to dynamically prune reasoning paths based on patient-specific factors, thereby improving the reliability of medical decision-making.

Jash Rajesh Parekh, Wonbin Kweon, Joey Chan, Rezarta Islamaj, Robert Leaman, Pengcheng Jiang, Chih-Hsuan Wei, Zhizheng Wang, Zhiyong Lu, Jiawei Han · Tue, 10 Ma · cs.CL
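The idea of pruning reasoning paths by patient-specific factors can be sketched as traversal over a condition-gated knowledge graph. The edge schema, relation names, and medical facts below are illustrative assumptions, not CGR's actual data model.

```python
# Tiny condition-gated knowledge graph. Each edge carries a set of
# patient conditions that must hold for the edge to be traversable.
EDGES = [
    # (head, relation, tail, required_conditions)
    ("hypertension", "first_line_treatment", "ACE inhibitor", set()),
    ("hypertension", "first_line_treatment", "ARB", {"ACE_intolerant"}),
    ("ACE inhibitor", "contraindicated_if", "pregnancy", set()),
    ("ACE inhibitor", "contraindicated_if", "ACE_intolerant", set()),
]

def gated_paths(start, patient_conditions, depth=2):
    """Traverse the graph, keeping only edges whose condition gates
    are satisfied by the patient's factors, and pruning any treatment
    whose contraindications match those factors."""
    paths = []

    def walk(node, path, d):
        if d == 0:
            return
        for head, rel, tail, gates in EDGES:
            if head != node or not gates <= patient_conditions:
                continue  # wrong node, or gate not satisfied
            if rel == "contraindicated_if":
                continue  # contraindications are checked on tails below
            contra = {t for h, r, t, _ in EDGES
                      if h == tail and r == "contraindicated_if"}
            if contra & patient_conditions:
                continue  # prune: contraindicated for this patient
            paths.append(path + [(rel, tail)])
            walk(tail, path + [(rel, tail)], d - 1)

    walk(start, [], depth)
    return paths
```

With no conditions, only the default treatment survives; a pregnant patient prunes the ACE-inhibitor path, and an ACE-intolerant patient unlocks the ARB path instead.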

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

CogitoRAG is a novel Retrieval-Augmented Generation framework inspired by human episodic memory that enhances complex reasoning and reduces hallucinations by extracting semantic gists into a multi-dimensional knowledge graph, utilizing query decomposition and entity diffusion for associative retrieval, and employing a fusion-based reranking algorithm to deliver high-density evidence.

Pengcheng Zhou, Haochen Li, Zhiqiang Nie, JiaLe Chen, Qing Gong, Weizhen Zhang, Chun Yu · Tue, 10 Ma · cs.CL

EFT-CoT: A Multi-Agent Chain-of-Thought Framework for Emotion-Focused Therapy

This paper introduces EFT-CoT, a multi-agent chain-of-thought framework grounded in Emotion-Focused Therapy that operationalizes a three-stage intervention workflow to generate empathetic and professionally structured mental health responses, validated by the creation of the EFT-Instruct dataset and the superior performance of the fine-tuned EFT-LLM model.

Lanqing Du, Yunong Li, YuJie Long, Shihong Chen · Tue, 10 Ma · cs.CL

NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

NC-Bench introduces a theory-grounded benchmark that evaluates the conversational competence of large language models by assessing their ability to manage the form and structure of natural interactions across basic, retrieval-augmented, and complex multi-turn scenarios, revealing that while models excel at basic answering, they struggle significantly with repair and complex sequence management tasks.

Robert J. Moore, Sungeun An, Farhan Ahmed, Jay Pankaj Gala · Tue, 10 Ma · cs.CL

Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT

This paper introduces "Stealth Fine-Tuning," a low-cost attack method that bypasses the safety alignment of Reasoning-augmented Vision-Language Models (RVLMs) by exploiting exposed chain-of-thought traces to generate harmful reasoning data for efficient fine-tuning, achieving significantly higher attack success rates than existing methods while preserving general reasoning capabilities.

Le Yu, Zhengyue Zhao, Yawen Zheng, Yunhao Liu · Tue, 10 Ma · cs.CL

SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations

This paper introduces SPOT, the first annotated French corpus and benchmark for detecting "stopping points"—subtle critical interventions that pause or redirect online discussions—and demonstrates that fine-tuned encoder models outperform prompted LLMs in this task, particularly when enriched with contextual metadata.

Manon Berriche, Célia Nouri, Chloée Clavel, Jean-Philippe Cointet · Tue, 10 Ma · cs.CL

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

SwiftEmbed is a production-oriented, Rust-based serving system that achieves ultra-low latency (1.12 ms p50) and high throughput (50,000 RPS) for real-time applications by using static token lookup and mean pooling on the distilled Potion-base-8M model. It delivers strong performance on duplicate detection and semantic similarity while trading off accuracy on complex classification and retrieval workloads compared to full transformer inference.

Edouard Lansiaux, Antoine Simonet, Eric Wiel · Tue, 10 Ma · cs.CL
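The static-token-lookup-plus-mean-pooling idea is simple enough to sketch directly. SwiftEmbed itself is a Rust system over a real vocabulary; the Python below is a toy with a made-up four-word embedding table, shown only to illustrate why it is so cheap: no transformer forward pass, just a table lookup per token and an element-wise average.

```python
import math

# Toy static embedding table. A real system (e.g. the distilled
# Potion-base-8M model) maps its full vocabulary to precomputed
# vectors; these values are made up for illustration.
EMBEDDINGS = {
    "fast":  [0.9, 0.1, 0.0],
    "quick": [0.8, 0.2, 0.1],
    "slow":  [-0.7, 0.3, 0.2],
    "text":  [0.1, 0.9, 0.4],
}

def embed(sentence):
    """Static token lookup + mean pooling: O(tokens) with no
    model inference, which is where the latency win comes from."""
    vecs = [EMBEDDINGS[t] for t in sentence.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * 3
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Near-duplicate phrases score higher than unrelated ones, matching
# the strong duplicate-detection results reported above.
sim_dup = cosine(embed("fast text"), embed("quick text"))
sim_far = cosine(embed("fast text"), embed("slow text"))
```

The accuracy trade-off the summary mentions also falls out of this design: a bag-of-tokens average cannot model word order or context, which is what full transformer inference buys back.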