cs.AI papers | Gist.Science

Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis

This paper analyzes gender bias in audio deepfake detection using the ASVspoof 5 dataset and a ResNet-18 classifier, demonstrating that while aggregate metrics like Equal Error Rate may suggest low disparity, fairness-aware evaluation reveals significant gender-specific error distributions that necessitate more equitable and robust detection systems.

Aishwarya Fursule, Shruti Kshirsagar, Anderson R. Avila | 2026-03-11 | cs.AI

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

This paper introduces CMA-ES-IG, an algorithm that enhances robot preference learning by generating perceptually distinct and informative queries, thereby improving scalability, robustness, and user experience compared to existing state-of-the-art methods.

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Mataric | 2026-03-11 | cs.AI

Meissa: Multi-modal Medical Agentic Intelligence

Meissa is a lightweight, 4B-parameter offline medical multi-modal agent that achieves state-of-the-art performance across diverse clinical benchmarks by employing novel trajectory modeling and stratified supervision to distill frontier model capabilities, thereby offering a cost-effective, low-latency, and privacy-preserving alternative to API-dependent systems.

Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou, Alan Yuille | 2026-03-11 | cs.AI

AI Phenomenology for Understanding Human-AI Experiences Across Eras

This paper proposes "AI phenomenology" as a research framework that prioritizes users' first-person lived experiences over traditional performance metrics to better understand and guide the bidirectional alignment between humans and AI systems, offering a set of methodological tools, design concepts, and a research agenda derived from three empirical studies.

Bhada Yun, Evgenia Taranova, Dana Feng, Renn Su, April Yi Wang | 2026-03-11 | cs.AI

MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games

The paper introduces MEMO, a memory-augmented self-play framework that optimizes inference-time context through structured memory retention and uncertainty-aware prompt exploration, significantly improving the win rates and run-to-run stability of multi-agent LLMs in long-horizon, imperfect-information games.

Yunfei Xie, Kevin Wang, Bobby Cheng, Jianzhu Yao, Zhizhou Sha, Alexander Duffy, Yihan Xi, Hongyuan Mei, Cheston Tan, Chen Wei, Pramod Viswanath, Zhangyang Wang | 2026-03-11 | cs.AI

The Missing Memory Hierarchy: Demand Paging for LLM Context Windows

This paper introduces Pichay, a demand paging system that treats LLM context windows as a memory hierarchy rather than a static cache, successfully reducing context consumption by up to 93% in production by evicting stale content and dynamically reloading it only when needed.

Tony Mason | 2026-03-11 | cs.AI

Automating Detection and Root-Cause Analysis of Flaky Tests in Quantum Software

This paper presents an automated pipeline leveraging Large Language Models to detect and diagnose flaky tests in quantum software, successfully expanding an existing dataset by 54% and demonstrating that models like Google Gemini can achieve high accuracy (F1-scores up to 0.9643) in classifying flakiness and identifying root causes.

Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang | 2026-03-11 | cs.AI

PlayWorld: Learning Robot World Models from Autonomous Play

PlayWorld introduces a fully autonomous pipeline that trains high-fidelity, physically consistent video world models from unsupervised robot self-play; the resulting models outperform those trained on human-collected data at predicting complex interactions and significantly boost real-world reinforcement learning success rates.

Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar | 2026-03-11 | cs.AI

WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion

This paper introduces WS-Net, a deep unmixing framework that combines state-space modeling, wavelet-fused encoding, and a specialized weak signal attention mechanism to effectively recover weak spectral signals and significantly improve abundance estimation accuracy in hyperspectral images under low signal-to-noise conditions.

Zekun Long, Ali Zia, Guanyiman Fu, Vivien Rolland, Jun Zhou | 2026-03-11 | cs.AI

Time, Identity and Consciousness in Language Model Agents

This paper proposes a conservative toolkit for evaluating language model agent identity by applying Stack Theory's temporal gap to separate mere behavioral consistency from genuine structural organization, yielding persistence scores that distinguish between agents that merely talk like a stable self and those actually organized as one.

Elija Perrier, Michael Timothy Bennett | 2026-03-11 | cs.AI

EPOCH: An Agentic Protocol for Multi-Round System Optimization

The paper introduces EPOCH, a unified engineering protocol that structures multi-round autonomous system optimization into baseline construction and iterative self-improvement phases with role-constrained stages to ensure stability, reproducibility, and traceability across heterogeneous environments.

Zhanlin Liu, Yitao Li, Munirathnam Srikanth | 2026-03-11 | cs.AI

From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring

The paper introduces Sentinel, an autonomous AI agent that achieves reliable, scalable clinical triage for remote patient monitoring by outperforming individual clinicians in sensitivity and consistency while maintaining a clinically defensible overtriage profile at a negligible cost.

Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation

The paper proposes Sim2Act, a robust simulation-to-decision framework that enhances policy reliability in mission-critical domains by combining an adversarial calibration mechanism to align simulation fidelity with decision impact and a group-relative perturbation strategy to stabilize learning without overly conservative constraints.

Hongyu Cao, Jinghan Zhang, Kunpeng Liu, Dongjie Wang, Feng Xia, Haifeng Chen, Xiaohua Hu, Yanjie Fu | 2026-03-11 | cs.AI

A Text-Native Interface for Generative Video Authoring

This paper introduces Doki, a text-native interface that enables users of varying expertise to author generative videos by defining assets, scenes, and edits directly within a freeform text document, thereby shifting video creation from specialized tools to a natural writing process.

Xingyu Bruce Liu, Mira Dontcheva, Dingzeyu Li | 2026-03-11 | cs.AI

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

GST-VLA introduces a novel framework that enhances Vision-Language-Action models by converting visual observations into anisotropic 3D Gaussian spatial tokens and employing 3D Depth-Aware Chain-of-Thought reasoning to achieve state-of-the-art performance on precision-demanding robotic manipulation tasks.

Md Selim Sarowar, Omer Tariq, Sungho Kim | 2026-03-11 | cs.AI

Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting

This study shows that adding sentiment scores, derived by a finetuned Qwen3 model from English and Chinese news, improves aluminum price forecasting accuracy and economic utility over traditional tabular data models, particularly during periods of high market volatility.

Alvaro Paredes Amorin, Andre Python, Christoph Weisser | 2026-03-11 | cs.AI

Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

This paper proposes a unified taxonomy and evaluation framework for latent world models in automated driving, organizing design choices by latent representations and structural priors while identifying key internal mechanics and research directions to enhance robustness, generalization, and deployability.

Rongxiang Zeng, Yongqi Dong | 2026-03-11 | cs.AI

Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations

This paper proposes a transformer-based framework for skin cancer case retrieval that effectively combines reference images and textual descriptors by learning hierarchical representations and performing joint global-local alignment, thereby achieving state-of-the-art performance on the Derm7pt dataset to support clinical decision-making.

Yuheng Wang, Yuji Lin, Dongrun Zhu, Jiayue Cai, Sunil Kalia, Harvey Lui, Chunqi Chang, Z. Jane Wang, Tim K. Lee | 2026-03-11 | cs.AI

VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs

VIVID-Med introduces a novel framework that leverages a frozen large language model as a structured semantic teacher to pretrain lightweight, deployable medical Vision Transformers via a Unified Medical Schema and Structured Prediction Decomposition, achieving state-of-the-art performance across diverse medical imaging tasks with significantly reduced data requirements compared to existing vision-language models.

Xiyao Wang, Xiaoyu Tan, Yang Dai, Yuxuan Fu, Shuo Li, Xihe Qiu | 2026-03-11 | cs.AI

PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings

The paper introduces PM-Nav, a novel framework that leverages priori-semantic maps and hierarchical chain-of-thought prompting to overcome the challenges of language-driven navigation in functional buildings with highly similar features, achieving substantial performance improvements over existing methods in both simulation and real-world environments.

Jiang Gao, Xiangyu Dong, Haozhou Li, Haoran Zhao, Yaoming Zhou, Xiaoguang Ma | 2026-03-11 | cs.AI
