cs.CL papers | Gist.Science

Speaker effects in language comprehension: An integrative model of language and speaker processing

This review proposes an integrative model of language comprehension that explains how speaker effects arise from the dynamic interplay between bottom-up acoustic perception and top-down social expectations, distinguishing between individual familiarity and demographic biases while highlighting the model's relevance for understanding language development and human-AI interaction.

Hanlin Wu, Zhenguang G. Cai | Tue, 10 Ma | cs.CL

Llama-Mob: Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

This paper introduces Llama-Mob, an instruction-tuned Llama-3-8B model that outperforms state-of-the-art methods in long-term, city-scale human mobility prediction and demonstrates strong zero-shot generalization across different urban environments.

Peizhi Tang, Chuang Yang, Tong Xing, Xiaohang Xu, Jiayi Xu, Renhe Jiang, Kaoru Sezaki | Tue, 10 Ma | cs.CL

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

The paper introduces OfficeQA Pro, a challenging enterprise benchmark built on a large corpus of U.S. Treasury Bulletins. It shows that current frontier AI agents struggle significantly with grounded, multi-document reasoning: they achieve low accuracy even with direct document access, and they benefit notably from structured document representations.

Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen, Michael Bendersky, Matei Zaharia, Xing Chen | Tue, 10 Ma | cs.CL

Can Vision-Language Models Solve the Shell Game?

This paper introduces VET-Bench, a diagnostic benchmark revealing that current Vision-Language Models fail at tracking visually identical objects due to an over-reliance on static features, and proposes Spatiotemporal Grounded Chain-of-Thought (SGCoT) to achieve over 90% accuracy by explicitly generating object trajectories as intermediate reasoning steps.

Tiedong Liu, Wee Sun Lee | Tue, 10 Ma | cs.CL

Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale

The paper introduces Sandpiper, a mixed-initiative system that integrates interactive researcher dashboards with agentic LLMs to enable scalable, privacy-preserving, and rigorous qualitative analysis of large-scale educational discourse while mitigating hallucinations and ensuring methodological consistency.

Daryl Hedley, Doug Pietrzak, Jorge Dias, Ian Burden, Bakhtawar Ahtisham, Zhuqian Zhou, Kirk Vanacore, Josh Marland, Rachel Slama, Justin Reich, Kenneth Koedinger, René Kizilcec | Tue, 10 Ma | cs.CL

SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

This paper introduces SlowBA, a novel backdoor attack against VLM-based GUI agents that utilizes a two-stage reward-level injection strategy and realistic pop-up triggers to induce excessive reasoning chains, thereby significantly increasing response latency while maintaining task accuracy and evading existing defenses.

Junxian Li, Tu Lan, Haozhen Tan, Yan Meng, Haojin Zhu | Tue, 10 Ma | cs.CL

Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data

This paper proposes a zero-AV-resource framework for audiovisual speech recognition that generates synthetic talking-head videos by lip-syncing static facial images with real audio, successfully enabling high-performance model training for under-resourced languages like Catalan without the need for labeled video corpora.

Pol Buitrago, Pol Gàlvez, Oriol Pareras, Javier Hernando | Tue, 10 Ma | cs.CL

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks

This paper introduces the Cross-Lingual Transfer Matrix (CLTM) to systematically quantify language-dependent performance variations in paralinguistic tasks like gender identification and speaker verification, revealing that despite their acoustic nature, these tasks exhibit distinct cross-lingual transfer patterns when using multilingual HuBERT-based encoders.

Pol Buitrago, Oriol Pareras, Federico Costa, Javier Hernando | Tue, 10 Ma | cs.CL

DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining

DualTurn is a dual-channel generative speech model that learns natural turn-taking dynamics through unsupervised pretraining on conversational audio and fine-tuning to predict agent actions, outperforming existing methods in both action prediction accuracy and turn-boundary anticipation while enabling tool-calling capabilities.

Shangeth Rajaa | Tue, 10 Ma | cs.CL

SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

The paper introduces SynPlanResearch-R1, a framework that synthesizes tool-use trajectories to encourage deeper exploration during supervised fine-tuning, thereby overcoming the limitations of reinforcement learning with verifiable rewards and significantly improving research agent performance across multiple benchmarks.

Hansi Zeng, Zoey Li, Yifan Gao, Chenwei Zhang, Xiaoman Pan, Tao Yang, Fengran Mo, Jiacheng Lin, Xian Li, Jingbo Shang | Tue, 10 Ma | cs.CL

DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation

The paper introduces DistillGuard, a framework that systematically evaluates output-level defenses against LLM knowledge distillation and finds that most current approaches are largely ineffective, with performance degradation being highly task-dependent and insufficient to broadly prevent knowledge theft.

Bo Jiang | Tue, 10 Ma | cs.CL

ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs

ArcLight is a lightweight LLM inference architecture designed specifically for many-core CPUs that overcomes cross-NUMA memory access bottlenecks through efficient memory management, thread scheduling, and controlled tensor parallelism, achieving up to 46% higher throughput than mainstream frameworks while maintaining broad device compatibility.

Yuzhuang Xu, Xu Han, Yuxuan Li, Wanxiang Che | Tue, 10 Ma | cs.CL

3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models

To address the "spatial intelligence gap" where Vision-Language Models struggle with elementary 3D tasks despite strong logical reasoning, the paper introduces 3ViewSense, a framework that leverages an engineering-inspired "Simulate-and-Reason" mechanism to ground spatial understanding in orthographic views, significantly improving performance on occlusion-heavy counting and view-consistent reasoning benchmarks.

Shaoxiong Zhan, Yanlin Lai, Zheng Liu, Hai Lin, Shen Li, Xiaodong Cai, Zijian Lin, Wen Huang, Hai-Tao Zheng | Tue, 10 Ma | cs.CL

Large Language Model for Discrete Optimization Problems: Evaluation and Step-by-step Reasoning

This paper evaluates the capabilities of various large language models, including Llama-3 and ChatGPT, in solving diverse discrete optimization problems using natural language datasets, revealing that while stronger models generally perform better, Chain-of-Thought reasoning is not universally effective and data augmentation can improve performance on simpler tasks despite introducing instability.

Tianhao Qian, Guilin Qi, Z. Y. Wu, Ran Gu, Xuanyi Liu, Canchen Lyu | Tue, 10 Ma | cs.CL

KCoEvo: A Knowledge Graph Augmented Framework for Evolutionary Code Generation

KCoEvo is a knowledge graph-augmented framework that addresses the challenges of API-driven code evolution by decomposing migration into path retrieval and informed generation stages, significantly improving accuracy and execution success over standard LLM baselines through structured reasoning and synthetic supervision.

Jiazhen Kang, Yuchen Lu, Chen Jiang, Jinrui Liu, Tianhao Zhang, Bo Jiang, Ningyuan Sun, Tongtong Wu, Guilin Qi | Tue, 10 Ma | cs.CL

Image Generation Models: A Technical History

This paper provides a comprehensive technical survey of the history and evolution of image generation models, detailing the objectives, architectures, and limitations of various approaches from VAEs to diffusion methods, while also addressing recent advancements in video generation and the critical challenges of robustness and responsible deployment.

Rouzbeh Shirvani | Tue, 10 Ma | cs.CL

AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions

This paper introduces AQuA, a fine-grained dataset that categorizes ambiguous visual questions into four levels, each paired with an optimal response strategy. Fine-tuning Vision-Language Models on this dataset enables them to recognize ambiguity and adaptively generate context-appropriate responses, such as seeking clarification or listing alternatives, outperforming existing baselines.

Jihyoung Jang, Hyounghun Kim | Tue, 10 Ma | cs.CL

SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions

This Systematization of Knowledge (SoK) paper establishes the first unified framework for Agentic Retrieval-Augmented Generation (RAG) by formalizing autonomous loops as decision-making processes, proposing a comprehensive taxonomy and architectural decomposition, critiquing current evaluation limitations and systemic risks, and outlining critical research directions for building reliable and scalable agentic systems.

Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire | Tue, 10 Ma | cs.CL

The Third Ambition: Artificial Intelligence and the Science of Human Behavior

This paper proposes a "third ambition" for artificial intelligence research, advocating for the use of large language models as scientific instruments to study human behavior, culture, and moral reasoning by treating them as computationally accessible condensates of collective discourse while addressing their methodological and epistemic limitations.

W. Russell Neuman, Chad Coleman | Tue, 10 Ma | cs.CL

Fine-Grained Table Retrieval Through the Lens of Complex Queries

This paper introduces DCTR, a table retrieval mechanism that leverages fine-grained typed query decomposition and global connectivity awareness to effectively handle complex, open-domain question answering over relational databases, demonstrating robustness on industry-aligned benchmarks.

Wojciech Kosiuk, Xingyu Ji, Yeounoh Chung, Fatma Özcan, Madelon Hulsebos | Tue, 10 Ma | cs.CL