IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding

This paper introduces IAG, the first input-aware backdoor attack on vision-language models for visual grounding, which utilizes a text-conditioned UNet to dynamically generate imperceptible, target-specific triggers that achieve high attack success rates across various models and datasets while maintaining stealth and robustness against defenses.

Junxian Li, Beining Xu, Simin Chen, Jiatong Li, Jingdi Lei, Haodong Zhao, Di Zhang · Tue, 10 Ma · cs.CL

Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning

This paper introduces Task 5 of the DCASE 2025 Challenge, a multi-domain Audio Question Answering benchmark designed to evaluate and advance the acoustic reasoning capabilities of audio-language models across diverse scenarios including bioacoustics, temporal soundscapes, and complex real-world clips.

Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro · Tue, 10 Ma · cs.CL

Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

This paper proposes a conformal prediction framework that ensures safe, domain-specific deployment of LLMs for medical entity extraction by adapting calibration thresholds to counteract the distinct underconfidence observed in structured FDA labels and overconfidence in free-text radiology reports, thereby achieving target coverage guarantees with manageable rejection rates across diverse clinical settings.

Manil Shrestha, Edward Kim · Tue, 10 Ma · cs.CL
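The per-domain calibration this paper describes can be sketched as split conformal prediction with a separate threshold per clinical domain. The scores, alpha level, and accept/reject interface below are illustrative assumptions, not the paper's implementation.

```python
import math

def calibrate_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: return the nonconformity-score
    threshold that gives ~(1 - alpha) coverage on new examples.
    cal_scores: nonconformity scores (e.g. 1 - model confidence)
    on held-out calibration examples from ONE clinical domain."""
    n = len(cal_scores)
    # Finite-sample-corrected quantile: the ceil((n+1)(1-alpha))-th
    # order statistic, capped at the largest calibration score.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(cal_scores)[k - 1]

def predict_or_abstain(score, threshold):
    """Accept an extracted entity whose nonconformity score falls
    within the calibrated threshold; otherwise reject (abstain)."""
    return "accept" if score <= threshold else "reject"

# Calibrating per domain is what counteracts the underconfidence on
# structured FDA labels vs. overconfidence on free-text radiology
# reports: each domain gets its own threshold. Scores are made up.
fda_scores = [0.05, 0.12, 0.08, 0.30, 0.22, 0.15, 0.40, 0.10, 0.18, 0.25]
t_fda = calibrate_threshold(fda_scores, alpha=0.1)
```

A separate call with a radiology calibration set would yield a different, typically tighter or looser, threshold for that domain.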

Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering

This paper introduces CondMedQA, the first benchmark for conditional biomedical question answering, and proposes Condition-Gated Reasoning (CGR), a framework that constructs condition-aware knowledge graphs to dynamically prune reasoning paths based on patient-specific factors, thereby improving the reliability of medical decision-making.

Jash Rajesh Parekh, Wonbin Kweon, Joey Chan, Rezarta Islamaj, Robert Leaman, Pengcheng Jiang, Chih-Hsuan Wei, Zhizheng Wang, Zhiyong Lu, Jiawei Han · Tue, 10 Ma · cs.CL
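The idea of pruning reasoning paths by patient-specific factors can be sketched as traversal over a condition-gated knowledge graph. The edge schema, relation names, and medical facts below are illustrative assumptions, not CGR's actual data model.

```python
# Tiny condition-gated knowledge graph. Each edge carries a set of
# patient conditions that must hold for the edge to be traversable.
EDGES = [
    # (head, relation, tail, required_conditions)
    ("hypertension", "first_line_treatment", "ACE inhibitor", set()),
    ("hypertension", "first_line_treatment", "ARB", {"ACE_intolerant"}),
    ("ACE inhibitor", "contraindicated_if", "pregnancy", set()),
    ("ACE inhibitor", "contraindicated_if", "ACE_intolerant", set()),
]

def gated_paths(start, patient_conditions, depth=2):
    """Traverse the graph, keeping only edges whose condition gates
    are satisfied by the patient's factors, and pruning any treatment
    whose contraindications match those factors."""
    paths = []

    def walk(node, path, d):
        if d == 0:
            return
        for head, rel, tail, gates in EDGES:
            if head != node or not gates <= patient_conditions:
                continue  # wrong node, or gate not satisfied
            if rel == "contraindicated_if":
                continue  # contraindications are checked on tails below
            contra = {t for h, r, t, _ in EDGES
                      if h == tail and r == "contraindicated_if"}
            if contra & patient_conditions:
                continue  # prune: contraindicated for this patient
            paths.append(path + [(rel, tail)])
            walk(tail, path + [(rel, tail)], d - 1)

    walk(start, [], depth)
    return paths
```

With no conditions, only the default treatment survives; a pregnant patient prunes the ACE-inhibitor path, and an ACE-intolerant patient unlocks the ARB path instead.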

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

CogitoRAG is a novel Retrieval-Augmented Generation framework inspired by human episodic memory that enhances complex reasoning and reduces hallucinations by extracting semantic gists into a multi-dimensional knowledge graph, utilizing query decomposition and entity diffusion for associative retrieval, and employing a fusion-based reranking algorithm to deliver high-density evidence.

Pengcheng Zhou, Haochen Li, Zhiqiang Nie, JiaLe Chen, Qing Gong, Weizhen Zhang, Chun Yu · Tue, 10 Ma · cs.CL

EFT-CoT: A Multi-Agent Chain-of-Thought Framework for Emotion-Focused Therapy

This paper introduces EFT-CoT, a multi-agent chain-of-thought framework grounded in Emotion-Focused Therapy that operationalizes a three-stage intervention workflow to generate empathetic and professionally structured mental health responses, validated by the creation of the EFT-Instruct dataset and the superior performance of the fine-tuned EFT-LLM model.

Lanqing Du, Yunong Li, YuJie Long, Shihong Chen · Tue, 10 Ma · cs.CL

NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

NC-Bench introduces a theory-grounded benchmark that evaluates the conversational competence of large language models by assessing their ability to manage the form and structure of natural interactions across basic, retrieval-augmented, and complex multi-turn scenarios, revealing that while models excel at basic answering, they struggle significantly with repair and complex sequence management tasks.

Robert J. Moore, Sungeun An, Farhan Ahmed, Jay Pankaj Gala · Tue, 10 Ma · cs.CL

Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT

This paper introduces "Stealth Fine-Tuning," a low-cost attack method that bypasses the safety alignment of Reasoning-augmented Vision-Language Models (RVLMs) by exploiting exposed chain-of-thought traces to generate harmful reasoning data for efficient fine-tuning, achieving significantly higher attack success rates than existing methods while preserving general reasoning capabilities.

Le Yu, Zhengyue Zhao, Yawen Zheng, Yunhao Liu · Tue, 10 Ma · cs.CL

SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations

This paper introduces SPOT, the first annotated French corpus and benchmark for detecting "stopping points"—subtle critical interventions that pause or redirect online discussions—and demonstrates that fine-tuned encoder models outperform prompted LLMs in this task, particularly when enriched with contextual metadata.

Manon Berriche, Célia Nouri, Chloée Clavel, Jean-Philippe Cointet · Tue, 10 Ma · cs.CL

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

SwiftEmbed is a production-oriented, Rust-based serving system that achieves ultra-low latency (1.12 ms p50) and high throughput (50,000 RPS) for real-time applications by using static token lookup and mean pooling on the distilled Potion-base-8M model. It delivers strong performance on duplicate detection and semantic similarity while trading off accuracy on complex classification and retrieval workloads compared to full transformer inference.

Edouard Lansiaux, Antoine Simonet, Eric Wiel · Tue, 10 Ma · cs.CL
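The static-token-lookup-plus-mean-pooling idea is simple enough to sketch directly. SwiftEmbed itself is a Rust system over a real vocabulary; the Python below is a toy with a made-up four-word embedding table, shown only to illustrate why it is so cheap: no transformer forward pass, just a table lookup per token and an element-wise average.

```python
import math

# Toy static embedding table. A real system (e.g. the distilled
# Potion-base-8M model) maps its full vocabulary to precomputed
# vectors; these values are made up for illustration.
EMBEDDINGS = {
    "fast":  [0.9, 0.1, 0.0],
    "quick": [0.8, 0.2, 0.1],
    "slow":  [-0.7, 0.3, 0.2],
    "text":  [0.1, 0.9, 0.4],
}

def embed(sentence):
    """Static token lookup + mean pooling: O(tokens) with no
    model inference, which is where the latency win comes from."""
    vecs = [EMBEDDINGS[t] for t in sentence.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * 3
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Near-duplicate phrases score higher than unrelated ones, matching
# the strong duplicate-detection results reported above.
sim_dup = cosine(embed("fast text"), embed("quick text"))
sim_far = cosine(embed("fast text"), embed("slow text"))
```

The accuracy trade-off the summary mentions also falls out of this design: a bag-of-tokens average cannot model word order or context, which is what full transformer inference buys back.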