cs.AI papers | Gist.Science

Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance

The paper presents "Guardian," an interpretable, three-layer decision-support system that combines Markov chains, reinforcement learning, and LLM-based validation to generate dynamic, probabilistic search plans for missing-child investigations within the critical first 72 hours.

Joshua Castillo, Ravi Mukkamala | 2026-03-11 | 🤖 cs.AI

PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration

PathoScribe is a unified retrieval-augmented large language model framework that transforms static pathology archives into an active, reasoning-enabled clinical intelligence platform, enabling natural language case retrieval, automated cohort construction, and real-time diagnostic support with high accuracy and efficiency.

Abdul Rehman Akbar, Samuel Wales-McGrath, Alejadro Levya, Lina Gokhale, Rajendra Singh, Wei Chen, Anil Parwani, Muhammad Khalid Khan Niazi | 2026-03-11 | 🤖 cs.AI

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

The paper introduces VoxEmo, a comprehensive benchmark and toolkit for evaluating Speech Large Language Models on speech emotion recognition across 35 corpora and 15 languages, featuring a distribution-aware soft-label protocol that reveals how these models uniquely align with human subjective emotion distributions despite trailing supervised baselines in hard-label accuracy.

Hezhao Zhang, Huang-Cheng Chou, Shrikanth Narayanan, Thomas Hain | 2026-03-11 | 🤖 cs.AI

AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem

This paper proposes "AgentOS," a new paradigm that replaces traditional GUI-based operating systems with a natural language-driven ecosystem centered on an Agent Kernel, framing the realization of such a system as a Knowledge Discovery and Data Mining (KDD) challenge involving intent mining, workflow automation, and dynamic personal knowledge graphs.

Rui Liu, Tao Zhe, Dongjie Wang, Zijun Yao, Kunpeng Liu, Yanjie Fu, Huan Liu, Jian Pei | 2026-03-11 | 🤖 cs.AI

BiCLIP: Domain Canonicalization via Structured Geometric Transformation

The paper introduces BiCLIP, a simple and parameter-efficient framework that achieves state-of-the-art few-shot domain adaptation for vision-language models by applying a structured geometric transformation to align multimodal features across disparate domains using a small set of anchor samples.

Pranav Mantini, Shishir K. Shah | 2026-03-11 | 🤖 cs.AI

A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations

This paper introduces Guardian, a consensus-driven, multi-LLM pipeline enhanced by QLoRA fine-tuning that coordinates specialized models and a consensus engine to perform auditable, structured information extraction for critical missing-person investigations while avoiding unconstrained decision-making.

Joshua Castillo, Ravi Mukkamala | 2026-03-11 | 🤖 cs.AI

Automated Tensor-Relational Decomposition for Large-Scale Sparse Tensor Computation

This paper introduces EinSum, a tensor-relational extension of Einstein Summation Notation that automatically rewrites computations to leverage efficient numerical kernels for dense operations while utilizing relational systems to manage large-scale sparsity.

Yuxin Tang, Zhiyuan Xin, Zhimin Ding, Xinyu Yao, Daniel Bourgeois, Tirthak Patel, Chris Jermaine | 2026-03-11 | 🤖 cs.AI

The FABRIC Strategy for Verifying Neural Feedback Systems

This paper introduces the FaBRIC strategy, which integrates new scalable backward reachability algorithms with existing forward analysis techniques to significantly improve the verification of reach-avoid specifications in nonlinear neural feedback systems.

I. Samuel Akinwande, Sydney M. Katz, Mykel J. Kochenderfer, Clark Barrett | 2026-03-11 | 🤖 cs.AI

Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds

This paper introduces Semantic Level of Detail (SLoD), a framework that utilizes heat kernel diffusion on hyperbolic manifolds to enable continuous, principled control over knowledge abstraction levels in AI memory systems, automatically detecting emergent semantic boundaries in both synthetic and real-world knowledge graphs without manual supervision.

Edward Izgorodin | 2026-03-11 | 🤖 cs.AI

Arbiter: Detecting Interference in LLM Agent System Prompts

This paper introduces Arbiter, a framework that combines formal rules with multi-model LLM analysis to detect interference patterns in coding agent system prompts, revealing that prompt architecture influences failure types and that multi-model evaluation uncovers distinct vulnerabilities missed by single-model approaches.

Tony Mason | 2026-03-11 | 🤖 cs.AI

Security Considerations for Multi-agent Systems

This study systematically characterizes the unique threat landscape of multi-agent AI systems and empirically evaluates 16 security frameworks, revealing that none achieve majority coverage of the identified risks, with Non-Determinism and Data Leakage being the most under-addressed domains.

Tam Nguyen, Moses Ndebugre, Dheeraj Arremsetty | 2026-03-11 | 🤖 cs.AI

Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis

This paper analyzes gender bias in audio deepfake detection using the ASVspoof 5 dataset and a ResNet-18 classifier, demonstrating that while aggregate metrics like Equal Error Rate may suggest low disparity, fairness-aware evaluation reveals significant gender-specific error distributions that necessitate more equitable and robust detection systems.

Aishwarya Fursule, Shruti Kshirsagar, Anderson R. Avila | 2026-03-11 | 🤖 cs.AI

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

This paper introduces CMA-ES-IG, an algorithm that enhances robot preference learning by generating perceptually distinct and informative queries, thereby improving scalability, robustness, and user experience compared to existing state-of-the-art methods.

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Mataric | 2026-03-11 | 🤖 cs.AI

Meissa: Multi-modal Medical Agentic Intelligence

Meissa is a lightweight, 4B-parameter offline medical multi-modal agent that achieves state-of-the-art performance across diverse clinical benchmarks by employing novel trajectory modeling and stratified supervision to distill frontier model capabilities, thereby offering a cost-effective, low-latency, and privacy-preserving alternative to API-dependent systems.

Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou, Alan Yuille | 2026-03-11 | 🤖 cs.AI

AI Phenomenology for Understanding Human-AI Experiences Across Eras

This paper proposes "AI phenomenology" as a research framework that prioritizes users' first-person lived experiences over traditional performance metrics to better understand and guide the bidirectional alignment between humans and AI systems, offering a set of methodological tools, design concepts, and a research agenda derived from three empirical studies.

Bhada Yun, Evgenia Taranova, Dana Feng, Renn Su, April Yi Wang | 2026-03-11 | 🤖 cs.AI

MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games

The paper introduces MEMO, a memory-augmented self-play framework that optimizes inference-time context through structured memory retention and uncertainty-aware prompt exploration, significantly improving the win rates and run-to-run stability of multi-agent LLMs in long-horizon, imperfect-information games.

Yunfei Xie, Kevin Wang, Bobby Cheng, Jianzhu Yao, Zhizhou Sha, Alexander Duffy, Yihan Xi, Hongyuan Mei, Cheston Tan, Chen Wei, Pramod Viswanath, Zhangyang Wang | 2026-03-11 | 🤖 cs.AI

The Missing Memory Hierarchy: Demand Paging for LLM Context Windows

This paper introduces Pichay, a demand paging system that treats LLM context windows as a memory hierarchy rather than a static cache, successfully reducing context consumption by up to 93% in production by evicting stale content and dynamically reloading it only when needed.

Tony Mason | 2026-03-11 | 🤖 cs.AI

Automating Detection and Root-Cause Analysis of Flaky Tests in Quantum Software

This paper presents an automated pipeline leveraging Large Language Models to detect and diagnose flaky tests in quantum software, successfully expanding an existing dataset by 54% and demonstrating that models like Google Gemini can achieve high accuracy (F1-scores up to 0.9643) in classifying flakiness and identifying root causes.

Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang | 2026-03-11 | 🤖 cs.AI

PlayWorld: Learning Robot World Models from Autonomous Play

PlayWorld introduces a fully autonomous pipeline that trains high-fidelity, physically consistent video world models from unsupervised robot self-play, outperforming human-collected data in predicting complex interactions and significantly boosting real-world reinforcement learning success rates.

Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar | 2026-03-11 | 🤖 cs.AI

WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion

This paper introduces WS-Net, a deep unmixing framework that combines state-space modeling, wavelet-fused encoding, and a specialized weak signal attention mechanism to effectively recover weak spectral signals and significantly improve abundance estimation accuracy in hyperspectral images under low signal-to-noise conditions.

Zekun Long, Ali Zia, Guanyiman Fu, Vivien Rolland, Jun Zhou | 2026-03-11 | 🤖 cs.AI
