Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

This paper introduces Hubscan, an open-source security scanner that uses a multi-detector architecture to identify and mitigate hubness poisoning attacks in Retrieval-Augmented Generation (RAG) systems, achieving high recall in detecting adversarial hubs across a range of vector databases and real-world benchmarks.

Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade · 2026-03-12 · cs.AI
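An adversarial hub, in the sense used above, is a poisoned vector that lands in the top-k neighbor lists of an unusually large fraction of queries. A minimal sketch of that hubness statistic (this is not Hubscan's actual detector suite; the planted-hub construction and the cosine-similarity retrieval here are illustrative assumptions):

```python
import numpy as np

def hubness_scores(db, queries, k=5):
    # Cosine-similarity top-k retrieval, then a count of how often each
    # database vector appears across all queries' neighbor lists.
    db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    topk = np.argsort(-(q_n @ db_n.T), axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=db.shape[0])

rng = np.random.default_rng(0)
dim = 32
db = rng.normal(size=(100, dim))
bias = np.zeros(dim)
bias[0] = 4.0                        # direction shared by all queries
queries = rng.normal(size=(200, dim)) + bias
db[0] = bias                         # planted adversarial hub
scores = hubness_scores(db, queries)
# the planted hub is retrieved for far more queries than any honest vector
```

A detector can then flag vectors whose count is an extreme outlier relative to the empirical hubness distribution.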

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

This paper proposes Alignment-Aware Masked Learning (AML), a training strategy that improves Referring Image Segmentation by quantifying pixel-level vision-language alignment to mask unreliable regions during optimization, thereby achieving state-of-the-art performance without architectural changes or inference overhead.

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang · 2026-03-12 · cs.AI
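The core idea of masking unreliable regions during optimization can be sketched as a per-pixel loss that drops pixels whose alignment score falls below a threshold. This is a hypothetical simplification: in the paper the alignment scores come from the model's own vision-language features, whereas here they are given as inputs, and the threshold `tau` is an assumed hyperparameter:

```python
import numpy as np

def masked_bce_loss(logits, targets, align, tau=0.5):
    # Per-pixel binary cross-entropy; pixels whose vision-language
    # alignment score falls below tau are dropped from the average.
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    bce = -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))
    keep = (align >= tau).astype(float)   # 1 = trusted pixel, 0 = masked out
    return float((bce * keep).sum() / max(keep.sum(), 1.0))

# Two well-aligned pixels predicted correctly, two poorly aligned pixels
# whose labels disagree with the prediction.
logits  = np.array([5.0, 5.0, -5.0, -5.0])
targets = np.array([1.0, 1.0, 1.0, 1.0])
align   = np.array([0.9, 0.9, 0.1, 0.1])
masked = masked_bce_loss(logits, targets, align)
full = masked_bce_loss(logits, targets, np.ones(4))
# masking the low-alignment pixels removes their (noisy) loss contribution
```

Because the change lives entirely in the training loss, it adds no architectural changes and no inference overhead, matching the summary above.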

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

This paper identifies and quantifies "Defensive Refusal Bias," a safety alignment failure in which large language models disproportionately deny legitimate cybersecurity defenders assistance with critical tasks because security-sensitive keywords are present. The problem is exacerbated by explicit authorization attempts and by safety systems' current reliance on semantic similarity rather than intent reasoning.

David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Herring, Nick Price, Dan Borges, Alex Levinson, Christina Q Knight · 2026-03-12 · cs.AI

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

The paper introduces CARE, an evidence-grounded agentic framework that enhances clinical accountability and reasoning accuracy in multi-modal medical AI by decomposing tasks into specialized modules for entity proposal, pixel-level localization, and evidence-based reasoning, thereby outperforming state-of-the-art models on medical VQA benchmarks.

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu · 2026-03-12 · cs.AI

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

This paper presents the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multitask code analysis, demonstrating that a single shared PEFT module can match or surpass full fine-tuning performance while significantly reducing computational and storage costs, provided that tasks are strategically grouped based on factors like complementarity and stability.

Amal Akli, Maxime Cordy, Mike Papadakis, Yves Le Traon · 2026-03-12 · cs
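The "single shared PEFT module" idea can be illustrated with a LoRA-style adapter: a frozen base weight plus one low-rank trainable update that serves every task in a group, instead of one adapter per task. This is a generic LoRA sketch under assumed hyperparameters (`r`, `alpha`), not the paper's specific configuration:

```python
import numpy as np

class LoRALinear:
    # A frozen base weight W plus a low-rank trainable update B @ A.
    # In a shared-module setup, one (A, B) pair is reused across all
    # tasks in the group rather than trained per task.
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                        # frozen
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable
        self.B = np.zeros((d_out, r))                     # trainable, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
layer = LoRALinear(W)
x = rng.normal(size=(2, 64))
# zero-initialised B means the adapted layer starts out identical to the base
```

With rank 4 on a 64x64 layer, the trainable update is 512 parameters against 4096 frozen ones, which is where the storage and compute savings reported above come from.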

Causally Grounded Mechanistic Interpretability for LLMs with Faithful Natural-Language Explanations

This paper presents a pipeline that bridges mechanistic interpretability and natural language explanations by identifying causally important attention heads in GPT-2 Small, generating high-quality explanations for them via LLMs, and evaluating the explanations' faithfulness. The evaluation reveals that while explanations can be sufficient, they often lack comprehensiveness due to distributed backup mechanisms.

Ajay Pravin Mahale · 2026-03-12 · cs.CL
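The causal-importance step above can be illustrated on a toy model: ablate one "head" at a time, replacing its contribution with its mean, and measure how much the loss rises. The two-head additive model here is an illustrative assumption, not the paper's GPT-2 Small setup:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y = x @ true_w

# Toy "heads": head 0 computes the useful feature, head 1 contributes noise.
contribs = {0: x @ true_w, 1: 0.01 * rng.normal(size=200)}

def predict(ablate=None):
    out = np.zeros(200)
    for h, c in contribs.items():
        # mean-ablation: replace the ablated head's output with its average
        out += c.mean() if h == ablate else c
    return out

base = np.mean((predict() - y) ** 2)
importance = {h: np.mean((predict(ablate=h) - y) ** 2) - base
              for h in contribs}
# ablating head 0 destroys the prediction; ablating head 1 barely matters
```

Distributed backup mechanisms, as mentioned in the summary, are exactly what this kind of single-head ablation can miss: if another head partially recovers the signal, the measured importance understates the head's role.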

The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models

The paper introduces the System Hallucination Scale (SHS), a lightweight, human-centered psychometric instrument validated by 210 participants to rapidly evaluate Large Language Models' hallucination-related behaviors from a user perspective, distinct from automatic detection metrics.

Heimo Müller, Dominik Steiger, Markus Plass, Andreas Holzinger · 2026-03-12 · cs.CL

PoultryLeX-Net: Domain-Adaptive Dual-Stream Transformer Architecture for Large-Scale Poultry Stakeholder Modeling

This paper introduces PoultryLeX-Net, a domain-adaptive dual-stream transformer framework enhanced with poultry-specific lexicons and topic modeling, which significantly outperforms existing baseline models in accurately analyzing stakeholder sentiment within the global poultry industry.

Stephen Afrifa, Biswash Khatiwada, Kapalik Khanal, Sanjay Shah, Lingjuan Wang-Li, Ramesh Bahadur Bist · 2026-03-12 · cs.CL

TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment

This paper introduces TAMUSA-Chat, a research-oriented framework that enables academic institutions to build responsible, domain-adapted conversational AI systems through supervised fine-tuning, retrieval-augmented generation, and systematic evaluation, while providing a publicly available codebase to support reproducible experimentation and ethical deployment.

Izzat Alsmadi, Anas Alsobeh · 2026-03-12 · cs.CL

CEI: A Benchmark for Evaluating Pragmatic Reasoning in Language Models

This paper introduces the Contextual Emotional Inference (CEI) Benchmark, a dataset of 300 human-validated scenarios designed to evaluate large language models' ability to infer intended meaning beyond literal semantics by interpreting ambiguous utterances across diverse power dynamics and pragmatic subtypes.

Jon Chun, Hannah Sussman, Adrian Mangine, Murathan Kocaman, Kirill Sidorko, Abhigya Koirala, Andre McCloud, Gwen Eisenbeis, Wisdom Akanwe, Moustapha Gassama, Eliezer Gonzalez Chirinos, Anne-Duncan Enright, Peter Dunson, Tiffanie Ng, Anna von Rosenstiel, Godwin Idowu · 2026-03-12 · cs.CL