cs.AI papers | Gist.Science

Evaluating Long-Horizon Memory for Multi-Party Collaborative Dialogues

This paper introduces EverMemBench, the first benchmark designed to evaluate long-horizon memory in multi-party collaborative dialogues, revealing that current LLM systems struggle with multi-hop reasoning, temporal versioning, and implicit relevance retrieval in realistic, complex interaction scenarios.

Chuanrui Hu, Tong Li, Xingze Gao, Hongda Chen, Yi Bai, Dannong Xu, Tianwei Lin, Xiaohong Li, Yunyun Han, Jian Pei, Yafeng Deng | 2026-03-12 | 💬 cs.CL

Moving On, Even When You're Broken: Fail-Active Trajectory Generation via Diffusion Policies Conditioned on Embodiment and Task

This paper introduces DEFT, a diffusion-based trajectory generator that enables robots to achieve fail-active operation by successfully completing tasks under arbitrary actuation failures, outperforming classical methods in both simulation and real-world scenarios while demonstrating robust zero-shot generalization.

Gilberto G. Briscoe-Martinez, Yaashia Gautam, Rahul Shetty, Anuj Pasricha, Marco M. Nicotra, Alessandro Roncone | 2026-03-12 | 🤖 cs.AI

DMS2F-HAD: A Dual-branch Mamba-based Spatial-Spectral Fusion Network for Hyperspectral Anomaly Detection

The paper proposes DMS2F-HAD, a dual-branch Mamba-based network that efficiently fuses spatial and spectral features to achieve state-of-the-art accuracy and significantly faster inference speeds for hyperspectral anomaly detection across multiple benchmark datasets.

Aayushma Pant, Lakpa Tamang, Tsz-Kwan Lee + 1 more | 2026-03-12 | 🤖 cs.AI

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

This paper introduces Fine-grained Group Policy Optimization (FGO), a reinforcement learning algorithm that effectively compresses verbose Chain-of-Thought reasoning in Large Language Models while simultaneously addressing the data inefficiency and entropy collapse limitations of Group Relative Policy Optimization (GRPO).

Xinchen Han, Hossam Afifi, Michel Marot, Xilu Wang, Lu Yin | 2026-03-12 | 🤖 cs.LG

UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

UniWeTok is a unified binary tokenizer featuring a massive $2^{128}$ codebook, a convolution-attention hybrid architecture with SigLu activation, and a novel three-stage training framework that achieves state-of-the-art performance in image generation and multimodal understanding with significantly lower computational costs than existing models.

Shaobin Zhuang, Yuang Ai, Jiaming Han, Weijia Mao, Xiaohui Li, Fangyikang Wang, Xiao Wang, Yan Li, Shanchuan Lin, Kun Xu, Zhenheng Yang, Huaibo Huang, Xiangyu Yue, Hao Chen, Yali Wang | 2026-03-12 | 🤖 cs.AI

TikArt: Stabilizing Aperture-Guided Fine-Grained Visual Reasoning with Reinforcement Learning

TikArt is a reinforcement learning-trained multimodal agent that stabilizes fine-grained visual reasoning by employing a Think-Aperture-Observe loop to sequentially acquire and linguistically encode evidence through zooming and segmentation, thereby overcoming the limitations of single-pass global image encoding.

Hao Ding, Zhichuan Yang, Weijie Ge, Ziqin Gao, Chaoyi Lu, Lei Zhao | 2026-03-12 | 🤖 cs.AI

GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

The paper proposes GOT-JEPA, a model-predictive pretraining framework that learns to predict robust tracking models from corrupted observations to improve generalization, and introduces OccuSolver to enhance occlusion handling through iterative, object-aware visibility estimation.

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin | 2026-03-12 | 🤖 cs.AI

Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse

This paper demonstrates that fully autonomous AI analysts can cheaply replicate the analytic diversity and conflicting conclusions observed in human many-analyst studies, revealing that empirical results are highly sensitive to analytic choices and prompting a new transparency norm requiring multiverse-style reporting and full prompt disclosure for AI-generated science.

Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu | 2026-03-12 | 🤖 cs.AI

No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection

This paper introduces LAVIDA, a zero-shot video anomaly detection framework that leverages a Multimodal Large Language Model and an Anomaly Exposure Sampler to train exclusively on pseudo-anomalies, achieving state-of-the-art performance across multiple benchmarks without requiring real anomaly data.

Zunkai Dai, Ke Li, Jiajia Liu, Jie Yang, Yuanyuan Qiao | 2026-03-12 | 🤖 cs.AI

PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for Low-dose CT imaging

PatchDenoiser is a lightweight, parameter-efficient multi-scale patch-based framework that effectively denoises low-dose CT images by balancing noise suppression with anatomical detail preservation, outperforming state-of-the-art CNN and GAN methods while significantly reducing computational costs and energy consumption.

Jitindra Fartiyal, Pedro Freire, Sergei K. Turitsyn, Sergei G. Solovski | 2026-03-12 | 🤖 cs.AI

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

This paper introduces Hubscan, an open-source security scanner that utilizes a multi-detector architecture to identify and mitigate hubness poisoning attacks in Retrieval-Augmented Generation (RAG) systems, achieving high recall rates in detecting adversarial hubs across various vector databases and real-world benchmarks.

Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade | 2026-03-12 | 🤖 cs.AI

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

This paper proposes Alignment-Aware Masked Learning (AML), a training strategy that improves Referring Image Segmentation by quantifying pixel-level vision-language alignment to mask unreliable regions during optimization, thereby achieving state-of-the-art performance without architectural changes or inference overhead.

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang | 2026-03-12 | 🤖 cs.AI

A Minimal Agent for Automated Theorem Proving

The paper introduces a minimal, open-source agentic baseline for automated theorem proving that achieves competitive performance against state-of-the-art systems through iterative refinement and library search, demonstrating superior sample efficiency and cost-effectiveness while serving as a reference for future research.

Borja Requena, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, Leopoldo Sarra | 2026-03-12 | 🤖 cs.AI

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

This paper identifies and quantifies "Defensive Refusal Bias," a safety alignment failure in large language models where legitimate cybersecurity defenders are disproportionately denied assistance for critical tasks due to the presence of security-sensitive keywords, a problem exacerbated by explicit authorization attempts and current reliance on semantic similarity rather than intent reasoning.

David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Herring, Nick Price, Dan Borges, Alex Levinson, Christina Q Knight | 2026-03-12 | 🤖 cs.AI

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

The paper introduces CARE, an evidence-grounded agentic framework that enhances clinical accountability and reasoning accuracy in multi-modal medical AI by decomposing tasks into specialized modules for entity proposal, pixel-level localization, and evidence-based reasoning, thereby outperforming state-of-the-art models on medical VQA benchmarks.

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu | 2026-03-12 | 🤖 cs.AI

SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing

The paper proposes SEED-SET, a Bayesian experimental design framework that combines objective evaluations and subjective stakeholder preferences via hierarchical Gaussian Processes to efficiently and interpretably benchmark the ethical alignment of autonomous systems in high-stakes domains.

Anjali Parashar, Yingke Li, Eric Yang Yu, Fei Chen, James Neidhoefer, Devesh Upadhyay, Chuchu Fan | 2026-03-12 | 📊 stat

BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

The paper introduces BrandFusion, a novel multi-agent framework that enables seamless brand integration in text-to-video generation by combining an offline brand knowledge base with an online iterative refinement process to balance semantic fidelity, brand recognizability, and contextual naturalness.

Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu | 2026-03-12 | 🤖 cs.AI

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

This paper presents the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multitask code analysis, demonstrating that a single shared PEFT module can match or surpass full fine-tuning performance while significantly reducing computational and storage costs, provided that tasks are strategically grouped based on factors like complementarity and stability.

Amal Akli, Maxime Cordy, Mike Papadakis, Yves Le Traon | 2026-03-12 | 💻 cs

Explainable LLM Unlearning Through Reasoning

This paper proposes Targeted Reasoning Unlearning (TRU), a novel framework that utilizes a reasoning-based unlearning target to guide models in precisely removing specific undesirable knowledge while preserving general capabilities and enhancing robustness against attacks.

Junfeng Liao, Qizhou Wang, Shanshan Ye, Xin Yu, Ling Chen, Zhen Fang | 2026-03-12 | 🤖 cs.LG

AraModernBERT: Transtokenized Initialization and Long-Context Encoder Modeling for Arabic

This paper introduces AraModernBERT, an Arabic adaptation of the ModernBERT encoder that leverages transtokenized initialization and native long-context modeling up to 8,192 tokens to achieve significant improvements in both language modeling and downstream discriminative tasks.

Omar Elshehy, Omer Nacar, Abdelbasset Djamai, Muhammed Ragab, Khloud Al Jallad, Mona Abdelazim | 2026-03-12 | 💬 cs.CL

