Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning

This paper introduces VASR, a multimodal reasoning framework for Context-Aware Visual Speech Recognition (CAVSR) that uses an Audio-Visual Chain-of-Thought (AV-CoT) to explicitly ground acoustic signals in rich visual context such as scenes and on-screen text, thereby overcoming single-modality dominance and achieving state-of-the-art performance.

Wenjie Tian, Mingchen Shao, Bingshen Mu, Xuelong Geng, Chengyou Wang, Yujie Liao, Zhixian Zhao, Ziyu Zhang, Jingbin Hu, Mengqi Wei, Lei Xie · 2026-03-10 · 💻 cs

LLM-FK: Multi-Agent LLM Reasoning for Foreign Key Detection in Large-Scale Complex Databases

LLM-FK is a novel multi-agent framework that overcomes the limitations of conventional heuristic and naive LLM methods in detecting foreign keys within large-scale complex databases by coordinating specialized agents to prune the search space, enhance reasoning with domain knowledge, and ensure global schema consistency, thereby achieving superior accuracy and scalability.

Zijian Tang, Ying Zhang, Sibo Cai, Ruoxuan Wang · 2026-03-10 · 💻 cs

Do Deployment Constraints Make LLMs Hallucinate Citations? An Empirical Study across Four Models and Five Prompting Regimes

This empirical study demonstrates that deployment-motivated prompting constraints significantly exacerbate citation hallucinations across four large language models, with no model achieving a citation existence rate above 47.5% and a substantial portion of unverifiable outputs being fabricated, thereby underscoring the critical need for post-hoc verification in academic and software engineering contexts.

Chen Zhao, Yuan Tang, Yitian Qian · 2026-03-10 · 💻 cs

TopRank-Based Delivery Rate Optimization for Coded Caching under Non-Uniform Demands

This paper proposes a TopRank-based coded caching strategy that optimizes delivery rates under non-uniform, unknown file demands by ranking files based on request count differences rather than estimating exact popularities, thereby achieving superior performance and sublinear regret in scenarios with limited users, small cache capacities, or noisy observation data.

Mohammadsaber Bahadori, Seyed Pooya Shariatpanahi, Behnam Bahrak · 2026-03-10 · 💻 cs

MAviS: A Multimodal Conversational Assistant For Avian Species

This paper introduces MAviS, a domain-adaptive multimodal conversational assistant for avian species that leverages the newly created MAviS-Dataset and is evaluated on the MAviS-Bench to achieve state-of-the-art performance in fine-grained bird species understanding and multimodal question answering.

Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Anwer, Salman Khan, Hisham Cholakkal · 2026-03-10 · 💻 cs

Seeing the Reasoning: How LLM Rationales Influence User Trust and Decision-Making in Factual Verification Tasks

This study reveals that in factual verification tasks, users' trust and decision-making are primarily driven by the correctness and certainty framing of LLM rationales rather than their presentation format, highlighting the dual potential of well-designed rationales to either support decision-making or miscalibrate trust.

Xin Sun, Shu Wei, Jos A Bosch, Isao Echizen, Saku Sugawara, Abdallah El Ali · 2026-03-10 · 💻 cs