cs.AI papers | Gist.Science

TikArt: Stabilizing Aperture-Guided Fine-Grained Visual Reasoning with Reinforcement Learning

TikArt is a reinforcement learning-trained multimodal agent that stabilizes fine-grained visual reasoning by employing a Think-Aperture-Observe loop to sequentially acquire and linguistically encode evidence through zooming and segmentation, thereby overcoming the limitations of single-pass global image encoding.

Hao Ding, Zhichuan Yang, Weijie Ge, Ziqin Gao, Chaoyi Lu, Lei Zhao | 2026-03-12 | 🤖 cs.AI

GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

The paper proposes GOT-JEPA, a model-predictive pretraining framework that learns to predict robust tracking models from corrupted observations to improve generalization, and introduces OccuSolver to enhance occlusion handling through iterative, object-aware visibility estimation.

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin | 2026-03-12 | 🤖 cs.AI

Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse

This paper demonstrates that fully autonomous AI analysts can cheaply replicate the analytic diversity and conflicting conclusions observed in human many-analyst studies, revealing that empirical results are highly sensitive to analytic choices and prompting a new transparency norm requiring multiverse-style reporting and full prompt disclosure for AI-generated science.

Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu | 2026-03-12 | 🤖 cs.AI

No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection

This paper introduces LAVIDA, a zero-shot video anomaly detection framework that leverages a Multimodal Large Language Model and an Anomaly Exposure Sampler to train exclusively on pseudo-anomalies, achieving state-of-the-art performance across multiple benchmarks without requiring real anomaly data.

Zunkai Dai, Ke Li, Jiajia Liu, Jie Yang, Yuanyuan Qiao | 2026-03-12 | 🤖 cs.AI

PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for Low-dose CT imaging

PatchDenoiser is a lightweight, parameter-efficient multi-scale patch-based framework that effectively denoises low-dose CT images by balancing noise suppression with anatomical detail preservation, outperforming state-of-the-art CNN and GAN methods while significantly reducing computational costs and energy consumption.

Jitindra Fartiyal, Pedro Freire, Sergei K. Turitsyn, Sergei G. Solovski | 2026-03-12 | 🤖 cs.AI

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

This paper introduces Hubscan, an open-source security scanner that utilizes a multi-detector architecture to identify and mitigate hubness poisoning attacks in Retrieval-Augmented Generation (RAG) systems, achieving high recall rates in detecting adversarial hubs across various vector databases and real-world benchmarks.

Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade | 2026-03-12 | 🤖 cs.AI

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

This paper proposes Alignment-Aware Masked Learning (AML), a training strategy that improves Referring Image Segmentation by quantifying pixel-level vision-language alignment to mask unreliable regions during optimization, thereby achieving state-of-the-art performance without architectural changes or inference overhead.

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang | 2026-03-12 | 🤖 cs.AI

A Minimal Agent for Automated Theorem Proving

The paper introduces a minimal, open-source agentic baseline for automated theorem proving that achieves competitive performance against state-of-the-art systems through iterative refinement and library search, demonstrating superior sample efficiency and cost-effectiveness while serving as a reference for future research.

Borja Requena, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, Leopoldo Sarra | 2026-03-12 | 🤖 cs.AI

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

This paper identifies and quantifies "Defensive Refusal Bias," a safety alignment failure in large language models where legitimate cybersecurity defenders are disproportionately denied assistance for critical tasks due to the presence of security-sensitive keywords, a problem exacerbated by explicit authorization attempts and current reliance on semantic similarity rather than intent reasoning.

David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Herring, Nick Price, Dan Borges, Alex Levinson, Christina Q Knight | 2026-03-12 | 🤖 cs.AI

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

The paper introduces CARE, an evidence-grounded agentic framework that enhances clinical accountability and reasoning accuracy in multi-modal medical AI by decomposing tasks into specialized modules for entity proposal, pixel-level localization, and evidence-based reasoning, thereby outperforming state-of-the-art models on medical VQA benchmarks.

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu | 2026-03-12 | 🤖 cs.AI

SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing

The paper proposes SEED-SET, a Bayesian experimental design framework that combines objective evaluations and subjective stakeholder preferences via hierarchical Gaussian Processes to efficiently and interpretably benchmark the ethical alignment of autonomous systems in high-stakes domains.

Anjali Parashar, Yingke Li, Eric Yang Yu, Fei Chen, James Neidhoefer, Devesh Upadhyay, Chuchu Fan | 2026-03-12 | 📊 stat

BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

The paper introduces BrandFusion, a novel multi-agent framework that enables seamless brand integration in text-to-video generation by combining an offline brand knowledge base with an online iterative refinement process to balance semantic fidelity, brand recognizability, and contextual naturalness.

Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu | 2026-03-12 | 🤖 cs.AI

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

This paper presents the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multitask code analysis, demonstrating that a single shared PEFT module can match or surpass full fine-tuning performance while significantly reducing computational and storage costs, provided that tasks are strategically grouped based on factors like complementarity and stability.

Amal Akli, Maxime Cordy, Mike Papadakis, Yves Le Traon | 2026-03-12 | 💻 cs

Explainable LLM Unlearning Through Reasoning

This paper proposes Targeted Reasoning Unlearning (TRU), a novel framework that utilizes a reasoning-based unlearning target to guide models in precisely removing specific undesirable knowledge while preserving general capabilities and enhancing robustness against attacks.

Junfeng Liao, Qizhou Wang, Shanshan Ye, Xin Yu, Ling Chen, Zhen Fang | 2026-03-12 | 🤖 cs.LG

AraModernBERT: Transtokenized Initialization and Long-Context Encoder Modeling for Arabic

This paper introduces AraModernBERT, an Arabic adaptation of the ModernBERT encoder that leverages transtokenized initialization and native long-context modeling up to 8,192 tokens to achieve significant improvements in both language modeling and downstream discriminative tasks.

Omar Elshehy, Omer Nacar, Abdelbasset Djamai, Muhammed Ragab, Khloud Al Jallad, Mona Abdelazim | 2026-03-12 | 💬 cs.CL

MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios

MoE-SpAc is an efficient inference framework for Mixture-of-Experts models on heterogeneous edge devices that repurposes speculative decoding as a predictive sensor for memory management, achieving significant throughput improvements through dynamic workload balancing and asynchronous execution.

Shuhuai Li, Jianghao Lin, Dongdong Ge, Yinyu Ye | 2026-03-12 | 🤖 cs.LG

The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration

This empirical study demonstrates that large language models exhibit a Dunning-Kruger-like cognitive bias, where poorly performing models display significantly higher overconfidence and worse calibration than their more accurate counterparts.

Sudipta Ghosh, Mrityunjoy Panday | 2026-03-12 | 💬 cs.CL

Quantifying Hallucinations in Language Models on Medical Textbooks

This paper quantifies hallucinations in medical textbook-based QA, showing that LLaMA-70B-Instruct hallucinated in nearly 20% of answers despite high plausibility, and finds that lower hallucination rates generally correlate with higher clinician-rated usefulness across models.

Brandon C. Colelough, Davis Bartels, Dina Demner-Fushman | 2026-03-12 | 💬 cs.CL

Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation

This paper proposes a closed-loop framework that optimizes Large Language Model-driven Feature Transformation by evolving and selecting diverse, task-verified transformation trajectories via chain-of-thought reasoning, thereby outperforming existing methods in generating effective feature operators for downstream predictive tasks.

Xinyuan Wang, Kunpeng Liu, Arun Vignesh Malarkkan, Yanjie Fu | 2026-03-12 | 💬 cs.CL

Causally Grounded Mechanistic Interpretability for LLMs with Faithful Natural-Language Explanations

This paper presents a pipeline that bridges mechanistic interpretability and natural language explanations by identifying causally important attention heads in GPT-2 Small, generating high-quality explanations via LLMs, and evaluating their faithfulness to reveal that while explanations can be sufficient, they often lack comprehensiveness due to distributed backup mechanisms.

Ajay Pravin Mahale | 2026-03-12 | 💬 cs.CL
