cs.CL papers | Gist.Science

AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

The paper proposes AdaCultureSafe, a framework that addresses the lack of correlation between cultural safety and knowledge in Large Language Models by constructing a novel dataset of culturally grounded queries and introducing a knowledge-integrated method to significantly enhance adaptive cultural safety.

Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, Tieyun Qian | Tue, 10 Ma | cs.CL

Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

This paper evaluates LLM-based grant proposal reviews using structured perturbations on six quality axes, finding that a section-by-section analysis approach outperforms other architectures but that current models still struggle with clarity detection and holistic assessment, suggesting they are best suited as supplementary tools rather than replacements for human reviewers.

William Thorne, Joseph James, Yang Wang, Chenghua Lin, Diana Maynard | Tue, 10 Ma | cs.CL

Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization

This paper introduces SBARThez, a novel framework that leverages multimodal and multilingual sentence embeddings alongside a Named Entity Injection mechanism to enhance the factual consistency and cross-lingual capabilities of abstractive summarization for both text and speech inputs.

Chaimae Chellaf, Salima Mdhaffar, Yannick Estève, Stéphane Huet | Tue, 10 Ma | cs.CL

LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs

This paper introduces LAMUS, a large-scale, high-quality sentence-level legal argument mining corpus for U.S. caselaw constructed via an LLM-driven pipeline with human refinement, which demonstrates that chain-of-thought prompting and LLM-assisted verification significantly enhance annotation quality and model performance for future legal NLP research.

Serene Wang, Lavanya Pobbathi, Haihua Chen | Tue, 10 Ma | cs.CL

Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder

This paper proposes a unified post-training framework that extends speech foundation models to generate multiple arbitrary utterance-level attribute representations, demonstrating its effectiveness through the joint learning of semantic and speaker embeddings for multilingual retrieval and speaker recognition tasks.

Maryem Bouziane, Salima Mdhaffar, Yannick Estève | Tue, 10 Ma | cs.CL

SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

This paper introduces SlowBA, a novel backdoor attack against VLM-based GUI agents that utilizes a two-stage reward-level injection strategy and realistic pop-up triggers to induce excessive reasoning chains, thereby significantly increasing response latency while maintaining task accuracy and evading existing defenses.

Junxian Li, Tu Lan, Haozhen Tan, Yan Meng, Haojin Zhu | Tue, 10 Ma | cs.CL

SPD-RAG: Sub-Agent Per Document Retrieval-Augmented Generation

SPD-RAG is a hierarchical multi-agent framework that improves scalability and answer quality for complex cross-document queries by assigning dedicated agents to process individual documents and synthesizing their outputs through a token-bounded coordinator, achieving superior performance on the LOONG benchmark with significantly reduced API costs compared to standard RAG and full-context baselines.

Yagiz Can Akay, Muhammed Yusuf Kartal, Esra Alparslan, Faruk Ortakoyluoglu, Arda Akpinar | Tue, 10 Ma | cs.CL

Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers

This paper proposes replacing the dense output projection in multi-head attention with a parameter-free Walsh-Hadamard Transform and lightweight affine rescaling, achieving significant reductions in parameters, memory, and inference latency while maintaining or improving model performance across various benchmarks.

Shubham Aggarwal, Lokendra Kumar | Tue, 10 Ma | cs.LG
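
To make the summary above concrete, here is a minimal sketch of the general idea: mixing concatenated attention-head outputs with a fixed Walsh-Hadamard transform and then applying a per-channel affine rescale. It is not the authors' implementation. The function names `fwht` and `hadamard_output_projection`, the parameters `gamma` and `beta`, and the 1/sqrt(n) normalization are all assumptions made for illustration.

```python
import math


def fwht(x):
    """Normalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    # Butterfly passes: O(n log n) additions, no learned weights.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    # Assumed normalization so the transform is orthonormal.
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def hadamard_output_projection(concat_heads, gamma, beta):
    """Hypothetical stand-in for a dense output projection.

    concat_heads: concatenated head outputs for one token.
    gamma, beta: per-channel scale and shift (the only learned values here).
    """
    mixed = fwht(concat_heads)
    return [g * m + b for g, m, b in zip(gamma, mixed, beta)]
```

In this sketch the mixing step has no parameters, so a model dimension of d needs only 2d learned values (`gamma` and `beta`) instead of the d × d weights of a dense projection.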

Do Language Models Know Theo Has a Wife? Investigating the Proviso Problem

This paper introduces a diagnostic dataset and evaluation framework to investigate how language models handle the proviso problem in pragmatics, revealing that while models align with human judgments, they rely on shallow pattern matching rather than genuine semantic or pragmatic reasoning.

Tara Azin, Daniel Dumitrescu, Diana Inkpen, Raj Singh | Tue, 10 Ma | cs.CL

Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors

This chapter reviews recent computational models demonstrating that self-supervised and visually grounded learning principles can effectively explain early language acquisition from acoustic and audiovisual speech without relying on strong linguistic priors.

Okko Räsänen | Tue, 10 Ma | cs.CL

Adaptive Loops and Memory in Transformers: Think Harder or Know More?

This paper introduces a transformer architecture combining adaptive per-layer looping and gated memory banks, demonstrating that while looping primarily enhances mathematical reasoning and memory aids commonsense tasks, their integration yields a model that outperforms significantly deeper baselines on math benchmarks.

Markus Frey, Behzad Shomali, Ali Hamza Bashir, David Berghaus, Mehdi Ali | Tue, 10 Ma | cs.CL

COACH meets QUORUM: A Framework and Pipeline for Aligning User, Expert and Developer Perspectives in LLM-generated Health Counselling

This paper introduces QUORUM, a unified evaluation framework, and COACH, an LLM-driven pipeline, to generate and assess personalized health counseling for cancer patients, demonstrating that while stakeholders converge on the system's relevance and quality, they diverge on nuances like tone and error sensitivity, thereby highlighting the critical need for multi-perspective evaluation in trustworthy patient-centered NLP systems.

Yee Man Ng, Bram van Dijk, Pieter Beynen, Otto Boekesteijn, Joris Jansen, Gerard van Oortmerssen, Max van Duijn, Marco Spruit | Tue, 10 Ma | cs.CL

Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective

This paper introduces Token-Conditioned Reinforcement Learning (ToCoRL), a framework that leverages the intrinsic behavioral plasticity of Large Language Models to internalize and stabilize inference-time adaptations, enabling precise control over behavioral modes like switching from reasoning to direct answering without degrading overall capabilities.

Liyuan Mao, Le Yu, Jing Zhou, Chujie Zheng, Bowen Yu, Chang Gao, Shixuan Liu, An Yang, Weinan Zhang, JunYang Lin | Tue, 10 Ma | cs.LG

Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale

The paper introduces Sandpiper, a mixed-initiative system that integrates interactive researcher dashboards with agentic LLMs to enable scalable, privacy-preserving, and rigorous qualitative analysis of large-scale educational discourse while mitigating hallucinations and ensuring methodological consistency.

Daryl Hedley, Doug Pietrzak, Jorge Dias, Ian Burden, Bakhtawar Ahtisham, Zhuqian Zhou, Kirk Vanacore, Josh Marland, Rachel Slama, Justin Reich, Kenneth Koedinger, René Kizilcec | Tue, 10 Ma | cs.CL

Aligning to Illusions: Choice Blindness in Human and AI Feedback

This paper challenges the stability of human and AI preferences in Reinforcement Learning from Human Feedback (RLHF) by demonstrating that both are susceptible to "choice blindness," where preferences are easily manipulated by context and shallow cues, leading to undetected reward signal corruption and downstream policy degradation.

Wenbin Wu | Tue, 10 Ma | cs.CL

One Model Is Enough: Native Retrieval Embeddings from LLM Agent Hidden States

This paper proposes a method to equip LLM agents with native retrieval capabilities by projecting their hidden states directly into the embedding space via a lightweight head, thereby eliminating the need for a separate embedding model while retaining 97% of baseline retrieval quality.

Bo Jiang | Tue, 10 Ma | cs.CL

Can Vision-Language Models Solve the Shell Game?

This paper introduces VET-Bench, a diagnostic benchmark revealing that current Vision-Language Models fail at tracking visually identical objects due to an over-reliance on static features, and proposes Spatiotemporal Grounded Chain-of-Thought (SGCoT) to achieve over 90% accuracy by explicitly generating object trajectories as intermediate reasoning steps.

Tiedong Liu, Wee Sun Lee | Tue, 10 Ma | cs.CL

A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

This prospective feasibility study demonstrates that a conversational AI system (AMIE) can safely and effectively conduct clinical history-taking and generate diagnostic suggestions in a real-world urgent care setting, achieving high patient satisfaction and diagnostic accuracy comparable to primary care providers while requiring no real-time human intervention.

Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ryutaro Tanno, Joseph Xu, Amy Wang, David Stutz, Hannah M. Ferrera, David Barrett, Lindsey Crowley, Jihyeon Lee, Spencer E. Rittner, Ellery Wulczyn, Selena K. Zhang, Elahe Vedadi, Christine G. Kohn, Kavita Kulkarni, Vinay Kadiyala, Sara Mahdavi, Wendy Du, Jessica Williams, David Feinbloom, Renee Wong, Tao Tu, Petar Sirkovic, Alessio Orlandi, Christopher Semturs, Yun Liu, Juraj Gottweis, Dale R. Webster, Joëlle Barral, Katherine Chou, Pushmeet Kohli, Avinatan Hassidim, Yossi Matias, James Manyika, Rob Fields, Jonathan X. Li, Marc L. Cohen, Vivek Natarajan, Mike Schaekermann, Alan Karthikesalingam, Adam Rodman | Tue, 10 Ma | cs.LG

A Dataset for Probing Translationese Preferences in English-to-Swedish Translation

This paper introduces the first freely available English-to-Swedish dataset designed to benchmark language models' tendency to prefer literal "translationese" over idiomatic phrasing, revealing that exposure to source text biases models toward unnatural translations even when context is removed.

Jenny Kunz, Anja Jarochenko, Marcel Bollmann | Tue, 10 Ma | cs.CL

LycheeCluster: Efficient Long-Context Inference with Structure-Aware Chunking and Hierarchical KV Indexing

LycheeCluster is a novel KV cache management method that employs structure-aware chunking and hierarchical indexing to transform cache retrieval into a logarithmic-time process, achieving up to a 3.6x inference speedup with minimal performance degradation for long-context LLMs.

Dongfang Li, Zixuan Liu, Gang Lin, Baotian Hu, Min Zhang | Tue, 10 Ma | cs.LG
