cs.AI papers | Gist.Science

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

This paper introduces a systematic framework for Large Audio-Language Models that reformulates ambiguous emotion recognition as a distributional reasoning problem, utilizing an ambiguity-aware objective and structured chain-of-thought supervision to significantly improve performance on standard benchmarks.

Xiaofeng Yu, Jiaheng Dong, Jean Honorio, Abhirup Ghosh, Hong Jia, Ting Dang | 2026-03-10 | cs

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

This paper investigates the continuation-triggered jailbreak phenomenon in large language models, revealing through mechanistic interpretability analysis that its root cause lies in the inherent competition between the model's intrinsic continuation drive and its safety alignment defenses, while also identifying distinct behavioral patterns in safety-critical attention heads across different architectures.

Yonghong Deng, Zhen Yang, Ping Jian, Xinyue Zhang, Zhongbin Guo, Chengzhi Li | 2026-03-10 | cs.LG

Exploring Deep Learning and Ultra-Widefield Imaging for Diabetic Retinopathy and Macular Edema

This study uses the MICCAI 2024 UWF4DR dataset to benchmark state-of-the-art deep learning models, including CNNs, Vision Transformers, and foundation models, in both the spatial and frequency domains on three ultra-widefield imaging tasks: image quality assessment, referable diabetic retinopathy detection, and diabetic macular edema identification. It finds that feature-level fusion and frequency-domain representations yield robust and explainable results.

Pablo Jimenez-Lizcano, Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Guillermo González de Rivera, Ruben Vera-Rodriguez, Julian Fierrez | 2026-03-10 | cs

Fibration Policy Optimization

This paper introduces Fibration Policy Optimization (FiberPO), a unified framework that bridges trust-region theory and compositional algebraic structures to enable principled, multi-scale stability control in large language model training through the novel Aggregational Policy Censoring Objective and Fiber Bundle Gating mechanism.

Chang Li, Tshihao Tsu, Yaren Zhang, Chao Xue, Xiaodong He | 2026-03-10 | cs.LG

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

The paper introduces FinToolBench, the first real-world, runnable benchmark that evaluates LLM agents on 760 executable financial tools using a novel framework assessing timeliness, intent, and regulatory compliance, alongside a proposed finance-aware baseline named FATR to advance trustworthy agentic AI in finance.

Jiaxuan Lu, Kong Wang, Yemin Wang, Qingmei Tang, Hongwei Zeng, Xiang Chen, Jiahao Pi, Shujian Deng, Lingzhi Chen, Yi Fu, Kehua Yang, Xiao Sun | 2026-03-10 | cs

Towards a more efficient bias detection in financial language models

This paper proposes a cost-effective approach to detecting bias in financial language models by leveraging cross-model patterns to identify bias-revealing inputs early, demonstrating that up to 73% of a model's biased behaviors can be uncovered using only 20% of the input pairs when guided by another model's outputs.

Firas Hadj Kacem, Ahmed Khanfir, Mike Papadakis | 2026-03-10 | cs.LG

SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM

SAIL is a test-time scaling framework that enhances one-shot robot imitation learning by reframing trajectory generation as an iterative refinement process guided by Monte Carlo Tree Search, an automated retrieval archive, and a vision-language model-based scoring mechanism, thereby significantly improving success rates across diverse manipulation tasks.

Makoto Sato, Yusuke Iwasawa, Yujin Tang, So Kuroki | 2026-03-10 | cs

SCL-GNN: Towards Generalizable Graph Neural Networks via Spurious Correlation Learning

The paper proposes SCL-GNN, a novel framework that enhances the generalization of Graph Neural Networks on both IID and OOD graphs by utilizing the Hilbert-Schmidt Independence Criterion to identify and mitigate spurious correlations through an efficient bi-level optimization strategy.

Yuxiang Zhang, Enyan Dai | 2026-03-10 | cs.LG

How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms

This study utilizes a massive 172-billion-token evaluation across diverse models, context lengths, and hardware to reveal that while model selection is the primary determinant of accuracy, hallucination rates in document Q&A rise significantly with context length and vary non-linearly with temperature, highlighting that grounding ability and fabrication resistance are distinct capabilities.

JV Roig | 2026-03-10 | cs.CL

AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

The paper proposes AdaCultureSafe, a framework that addresses the lack of correlation between cultural safety and knowledge in Large Language Models by constructing a novel dataset of culturally grounded queries and introducing a knowledge-integrated method to significantly enhance adaptive cultural safety.

Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, Tieyun Qian | 2026-03-10 | cs.CL

TA-RNN-Medical-Hybrid: A Time-Aware and Interpretable Framework for Mortality Risk Prediction

The paper proposes TA-RNN-Medical-Hybrid, a time-aware and interpretable deep learning framework that integrates continuous-time encoding, SNOMED-based disease representations, and a hierarchical dual-level attention mechanism to accurately predict ICU mortality risk while providing clinically meaningful explanations.

Zahra Jafari, Azadeh Zamanifar, Amirfarhad Farhadi | 2026-03-10 | cs.LG

Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

This paper evaluates LLM-based grant proposal reviews using structured perturbations on six quality axes, finding that a section-by-section analysis approach outperforms other architectures but that current models still struggle with clarity detection and holistic assessment, suggesting they are best suited as supplementary tools rather than replacements for human reviewers.

William Thorne, Joseph James, Yang Wang, Chenghua Lin, Diana Maynard | 2026-03-10 | cs.CL

A Blockchain-based Traceability System for AI-Driven Engine Blade Inspection

This paper presents BladeChain, a blockchain-based system that integrates multi-stakeholder endorsement, automated scheduling, and AI model provenance to provide immutable, auditable traceability for aircraft engine blade inspections across the entire component life cycle.

Mahmoud Hafez, Eman Ouda, Mohammed A. Mohammed Eltoum, Khaled Salah, Yusra Abdulrahman | 2026-03-10 | cs

Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

This paper reveals that Sharpness-Aware Minimization (SAM) exhibits depth-dependent implicit biases in linear diagonal networks, where $\ell_\infty$-SAM's convergence becomes initialization-sensitive and unstable at depth $L=2$, while $\ell_2$-SAM displays "sequential feature amplification" that prioritizes minor features early in training, demonstrating that infinite-time implicit bias analyses fail to capture SAM's critical finite-time dynamics.

Chaewon Moon, Dongkuk Si, Chulhee Yun | 2026-03-10 | cs.LG

Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reasoning Paradigm

This paper systematically reviews recent advancements in Multimodal Mathematical Reasoning by proposing a unified Perception-Alignment-Reasoning paradigm, categorizing existing approaches around four fundamental questions regarding information extraction, representation, reasoning, and evaluation, while outlining future research challenges.

Tianyu Yang, Sihong Wu, Yilun Zhao, Zhenwen Liang, Lisen Dai, Chen Zhao, Minhao Cheng, Arman Cohan, Xiangliang Zhang | 2026-03-10 | cs

Graph-Instructed Neural Networks for parametric problems with varying boundary conditions

This paper proposes Graph-Instructed Neural Networks (GINNs) as a robust and scalable alternative to classical reduced order methods for efficiently simulating parametric partial differential equations with varying boundary conditions by learning the direct mapping between domain descriptions and PDE solutions.

Francesco Della Santa, Sandra Pieraccini, Maria Strazzullo | 2026-03-10 | cs.LG

Retrieval-Augmented Anatomical Guidance for Text-to-CT Generation

This paper proposes a retrieval-augmented framework for text-to-CT generation that leverages a 3D vision-language encoder to retrieve semantically related clinical cases and their anatomical annotations as structural proxies, thereby enhancing image fidelity and spatial controllability in a realistic inference setting without requiring ground-truth annotations.

Daniele Molino, Camillo Maria Caruso, Paolo Soda, Valerio Guarrasi | 2026-03-10 | cs

Concept-Guided Fine-Tuning: Steering ViTs away from Spurious Correlations to Improve Robustness

This paper introduces a concept-guided fine-tuning framework that enhances Vision Transformer robustness against distribution shifts by automatically generating and aligning model attention with fine-grained semantic concepts rather than spurious background correlations.

Yehonatan Elisha, Oren Barkan, Noam Koenigstein | 2026-03-10 | cs.LG

Human-AI Divergence in Ego-centric Action Recognition under Spatial and Spatiotemporal Manipulations

This paper presents a large-scale comparative study using the Epic ReduAct dataset and over 3,000 human participants to demonstrate that while humans rely on sparse, semantically critical cues for egocentric action recognition, state-of-the-art AI models degrade more gradually by depending on contextual and low-level features, revealing fundamental divergences in how humans and machines process spatial and spatiotemporal information.

Sadegh Rahmaniboldaji, Filip Rybansky, Quoc C. Vuong, Anya C. Hurlbert, Frank Guerin, Andrew Gilbert | 2026-03-10 | cs

CORE-Acu: Structured Reasoning Traces and Knowledge Graph Safety Verification for Acupuncture Clinical Decision Support

CORE-Acu is a neuro-symbolic framework for acupuncture clinical decision support that integrates structured reasoning traces, a knowledge graph-based safety verification system, and a specialized loss function to ensure interpretable, hallucination-free, and strictly safe treatment recommendations, outperforming standard LLMs with zero observed safety violations.

Liuyi Xu, Yun Guo, Ming Chen, Zihan Dun, Yining Qian, An-Yang Lu, Shuang Li, Lijun Liu | 2026-03-10 | cs
