Towards more efficient bias detection in financial language models
This paper proposes a cost-effective approach to detecting bias in financial language models. By leveraging cross-model patterns to identify bias-revealing inputs early, the approach uncovers up to 73% of a model's biased behaviors while testing only 20% of the input pairs, when input selection is guided by another model's outputs.
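The core idea can be sketched as a prioritization scheme: rank candidate input pairs by how strongly a cheaper reference model reacts to them, then spend the testing budget on the top-ranked pairs. The sketch below is a minimal, hypothetical simulation of this idea; the function names, the synthetic data, and the assumed correlation between models' bias are illustrative assumptions, not details from the paper.

```python
import random

random.seed(0)

def simulate_bias_scores(n_pairs):
    """Simulate (reference_score, target_is_biased) tuples for n_pairs inputs.
    A shared latent 'bias propensity' makes the reference model's score
    correlate with the target model's biased behavior (cross-model pattern)."""
    data = []
    for _ in range(n_pairs):
        latent = random.random()               # shared bias propensity of the pair
        ref_score = latent + 0.2 * random.random()  # noisy reference-model signal
        target_biased = latent > 0.7           # target is biased on high-latent pairs
        data.append((ref_score, target_biased))
    return data

def recall_at_budget(data, budget=0.2):
    """Fraction of the target model's biased behaviors found when testing
    only the top `budget` fraction of pairs, ranked by reference score."""
    ranked = sorted(data, key=lambda pair: -pair[0])
    k = int(len(ranked) * budget)
    found = sum(biased for _, biased in ranked[:k])
    total = sum(biased for _, biased in data)
    return found / total if total else 0.0

data = simulate_bias_scores(1000)
print(recall_at_budget(data, budget=0.2))
```

Under these assumptions, ranking by the reference model's scores recovers far more biased behaviors at a 20% budget than random sampling would (which in expectation finds only 20% of them); the paper's 73%-at-20% figure is the analogous measurement on real financial language models.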