How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms

This study utilizes a massive 172-billion-token evaluation across diverse models, context lengths, and hardware to reveal that while model selection is the primary determinant of accuracy, hallucination rates in document Q&A rise significantly with context length and vary non-linearly with temperature, highlighting that grounding ability and fabrication resistance are distinct capabilities.

JV Roig | 2026-03-10 | 💬 cs.CL

AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

The paper proposes AdaCultureSafe, a framework that addresses the lack of correlation between cultural safety and knowledge in Large Language Models by constructing a novel dataset of culturally grounded queries and introducing a knowledge-integrated method to significantly enhance adaptive cultural safety.

Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, Tieyun Qian | 2026-03-10 | 💬 cs.CL

Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

This paper evaluates LLM-based grant proposal reviews using structured perturbations on six quality axes, finding that a section-by-section analysis approach outperforms other architectures but that current models still struggle with clarity detection and holistic assessment, suggesting they are best suited as supplementary tools rather than replacements for human reviewers.

William Thorne, Joseph James, Yang Wang, Chenghua Lin, Diana Maynard | 2026-03-10 | 💬 cs.CL

Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

This paper reveals that Sharpness-Aware Minimization (SAM) exhibits depth-dependent implicit biases in linear diagonal networks, where $\ell_\infty$-SAM's convergence becomes initialization-sensitive and unstable at depth $L=2$, while $\ell_2$-SAM displays "sequential feature amplification" that prioritizes minor features early in training, demonstrating that infinite-time implicit bias analyses fail to capture SAM's critical finite-time dynamics.

Chaewon Moon, Dongkuk Si, Chulhee Yun | 2026-03-10 | 🤖 cs.LG

Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reasoning Paradigm

This paper systematically reviews recent advancements in Multimodal Mathematical Reasoning by proposing a unified Perception-Alignment-Reasoning paradigm, categorizing existing approaches around four fundamental questions regarding information extraction, representation, reasoning, and evaluation, while outlining future research challenges.

Tianyu Yang, Sihong Wu, Yilun Zhao, Zhenwen Liang, Lisen Dai, Chen Zhao, Minhao Cheng, Arman Cohan, Xiangliang Zhang | 2026-03-10 | 💻 cs

Retrieval-Augmented Anatomical Guidance for Text-to-CT Generation

This paper proposes a retrieval-augmented framework for text-to-CT generation that leverages a 3D vision-language encoder to retrieve semantically related clinical cases and their anatomical annotations as structural proxies, thereby enhancing image fidelity and spatial controllability in a realistic inference setting without requiring ground-truth annotations.

Daniele Molino, Camillo Maria Caruso, Paolo Soda, Valerio Guarrasi | 2026-03-10 | 💻 cs

Human-AI Divergence in Ego-centric Action Recognition under Spatial and Spatiotemporal Manipulations

This paper presents a large-scale comparative study using the Epic ReduAct dataset and over 3,000 human participants to demonstrate that while humans rely on sparse, semantically critical cues for egocentric action recognition, state-of-the-art AI models degrade more gradually by depending on contextual and low-level features, revealing fundamental divergences in how humans and machines process spatial and spatiotemporal information.

Sadegh Rahmaniboldaji, Filip Rybansky, Quoc C. Vuong, Anya C. Hurlbert, Frank Guerin, Andrew Gilbert | 2026-03-10 | 💻 cs

CORE-Acu: Structured Reasoning Traces and Knowledge Graph Safety Verification for Acupuncture Clinical Decision Support

CORE-Acu is a neuro-symbolic framework for acupuncture clinical decision support that integrates structured reasoning traces, a knowledge graph-based safety verification system, and a specialized loss function to ensure interpretable, hallucination-free, and strictly safe treatment recommendations, outperforming standard LLMs with zero observed safety violations.

Liuyi Xu, Yun Guo, Ming Chen, Zihan Dun, Yining Qian, An-Yang Lu, Shuang Li, Lijun Liu | 2026-03-10 | 💻 cs

Agentic Neurosymbolic Collaboration for Mathematical Discovery: A Case Study in Combinatorial Design

This paper presents a neurosymbolic collaboration between an LLM-powered agent, symbolic computation tools, and human researchers that successfully discovered and formally verified a new tight lower bound on the imbalance of Latin squares for the case $n \equiv 1 \pmod{3}$, demonstrating the potential of AI-human partnerships in pure mathematical discovery.

Hai Xia, Carla P. Gomes, Bart Selman, Stefan Szeider | 2026-03-10 | 🔢 math

SPD-RAG: Sub-Agent Per Document Retrieval-Augmented Generation

SPD-RAG is a hierarchical multi-agent framework that improves scalability and answer quality for complex cross-document queries by assigning dedicated agents to process individual documents and synthesizing their outputs through a token-bounded coordinator, achieving superior performance on the LOONG benchmark with significantly reduced API costs compared to standard RAG and full-context baselines.

Yagiz Can Akay, Muhammed Yusuf Kartal, Esra Alparslan, Faruk Ortakoyluoglu, Arda Akpinar | 2026-03-10 | 💬 cs.CL

Electrocardiogram Classification with Transformers Using Koopman and Wavelet Features

This paper demonstrates that while wavelet features excel in binary ECG classification, a transformer-based model utilizing Koopman operator features derived from an optimized Extended Dynamic Mode Decomposition (EDMD) with a radial basis function dictionary achieves superior performance in multi-class ECG classification, outperforming both wavelet-only and hybrid approaches.

Sucheta Ghosh, Zahra Monfared | 2026-03-10 | 🤖 cs.LG