cs.AI papers | Gist.Science

Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases

This paper critiques the current limitations of alignment-focused safety cases for frontier AI by drawing on established methodologies from safety-critical industries to propose a more robust, defensible framework, illustrated through a case study on deceptive alignment and CBRN capabilities.

Shaun Feakins, Ibrahim Habli, Phillip Morgan · 2026-03-11 · 🤖 cs.AI

Multi-level meta-reinforcement learning with skill-based curriculum

This paper proposes a multi-level meta-reinforcement learning framework that systematically compresses Markov decision processes into hierarchical structures with skill-based curriculum learning to decouple sub-tasks, reduce stochasticity, and enable efficient transfer of skills across different problems and levels.

Sichen Yang (Johns Hopkins University), Mauro Maggioni (Johns Hopkins University) · 2026-03-11 · 🤖 cs.AI

Large Language Model-Assisted Superconducting Qubit Experiments

This paper introduces a large language model (LLM) framework that automates the control and measurement of superconducting qubits by dynamically generating and invoking tools based on a knowledge base, thereby enabling rapid deployment of standard protocols and the flexible implementation of novel experimental procedures.

Shiheng Li, Jacob M. Miller, Phoebe J. Lee, Gustav Andersson, Christopher R. Conner, Yash J. Joshi, Bayan Karimi, Amber M. King, Howard L. Malc, Harsh Mishra, Hong Qiao, Minseok Ryu, Xuntao Wu, Siyuan Xing, Haoxiong Yan, Jian Shi, Andrew N. Cleland · 2026-03-11 · ⚛️ quant-ph

Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications

This paper introduces Test-Driven AI Agent Definition (TDAD), a methodology that compiles tool-using LLM agents from behavioral specifications by iteratively refining prompts against executable tests, thereby ensuring measurable behavioral compliance and robustness against silent regressions through mechanisms like hidden test splits and semantic mutation testing.

Tzafrir Rehan · 2026-03-11 · 🤖 cs.AI

Scale-Plan: Scalable Language-Enabled Task Planning for Heterogeneous Multi-Robot Teams

Scale-Plan is a scalable framework that leverages large language models to filter irrelevant perceptual information and construct compact, task-relevant representations from natural language instructions, thereby enabling efficient and reliable long-horizon planning for heterogeneous multi-robot teams while outperforming existing baselines on the new MAT2-THOR benchmark.

Piyush Gupta, Sangjae Bae, Jiachen Li, David Isele · 2026-03-11 · 🤖 cs.AI

Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage

This paper empirically demonstrates that coverage-based retrieval metrics serve as reliable early indicators of information coverage in RAG-generated responses, particularly when retrieval objectives align with generation goals, across diverse text and multimodal benchmarks.

Saron Samuel, Alexander Martin, Eugene Yang, Andrew Yates, Dawn Lawrie, Ian Soboroff, Laura Dietz, Benjamin Van Durme · 2026-03-11 · 🤖 cs.AI
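
As a rough illustration of what a coverage-based metric measures, the sketch below scores the share of reference information units ("nuggets") that appear in a set of texts. The nugget list, the `coverage` helper, and the case-insensitive substring matching are assumptions made for illustration; the paper's metrics and matching procedure may differ.

```python
def coverage(nuggets, texts):
    """Fraction of reference nuggets found in at least one text.

    Matching is a simple case-insensitive substring test (an illustrative
    assumption; real evaluations typically use human or LLM judgments).
    """
    if not nuggets:
        return 0.0
    lowered = [t.lower() for t in texts]
    hits = sum(1 for n in nuggets if any(n.lower() in t for t in lowered))
    return hits / len(nuggets)


nuggets = ["paris", "eiffel tower", "1889"]
retrieved = ["Paris hosts the Eiffel Tower."]
print(coverage(nuggets, retrieved))  # 2 of 3 nuggets are covered
```

Applying the same function once to the retrieved passages and once to the generated answer yields the two quantities whose relationship the study examines.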

Fish Audio S2 Technical Report

This paper introduces Fish Audio S2, an open-source text-to-speech system that leverages a multi-stage training pipeline to enable multi-speaker, multi-turn generation with natural-language instruction following, while providing production-ready weights and an efficient SGLang-based inference engine.

Shijia Liao, Yuxuan Wang, Songting Liu, Yifan Cheng, Ruoyi Zhang, Tianyu Li, Shidong Li, Yisheng Zheng, Xingwei Liu, Qingzheng Wang, Zhizhuo Zhou, Jiahua Liu, Xin Chen, Dawei Han · 2026-03-11 · 🤖 cs.AI

Are Expressive Encoders Necessary for Discrete Graph Generation?

This paper introduces GenGNN, a modular message-passing framework, and shows that expressive neural backbones such as transformers are not strictly necessary for discrete graph generation: diffusion models built on GenGNN achieve competitive validity and faster inference on various datasets.

Jay Revolinsky, Harry Shomer, Jiliang Tang · 2026-03-11 · 🤖 cs.AI
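
For readers unfamiliar with the message-passing backbones the paper contrasts with transformers, below is a minimal, framework-free sketch of one generic sum-aggregation layer. It is not GenGNN: the function name, the scalar weights (stand-ins for learned weight matrices), and the ReLU update are illustrative assumptions. Each node only aggregates from its immediate neighbors, so a layer's cost scales with the number of edges rather than with the number of node pairs, as full attention does.

```python
def message_passing_step(features, edges, self_weight=1.0, neighbor_weight=1.0):
    """One sum-aggregation message-passing update on an undirected graph.

    features: list of per-node feature vectors (lists of floats)
    edges: list of (u, v) node-index pairs
    self_weight, neighbor_weight: scalar stand-ins for learned weights
    """
    n, d = len(features), len(features[0])
    agg = [[0.0] * d for _ in range(n)]
    for u, v in edges:
        # each edge sends a message in both directions
        for k in range(d):
            agg[u][k] += features[v][k]
            agg[v][k] += features[u][k]
    # combine each node's own state with its summed messages, then apply ReLU
    return [
        [max(0.0, self_weight * features[i][k] + neighbor_weight * agg[i][k]) for k in range(d)]
        for i in range(n)
    ]


# path graph 0 - 1 - 2 with one scalar feature per node
print(message_passing_step([[1.0], [2.0], [3.0]], [(0, 1), (1, 2)]))
```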

MASEval: Extending Multi-Agent Evaluation from Models to Systems

MASEval is a framework-agnostic library that shifts multi-agent evaluation from a model-centric to a system-centric approach; extensive experiments show that implementation decisions about topology and orchestration affect performance as much as model selection does.

Cornelius Emde, Alexander Rubinstein, Anmol Goel, Ahmed Heakl, Sangdoo Yun, Seong Joon Oh, Martin Gubri · 2026-03-11 · 🤖 cs.AI

A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology

The paper presents MuCTaL, a lightweight multi-cancer tumor localization framework trained on four cancer types; it achieves high performance on the cancer types it was trained on and generalizes to unseen tumor types, enabling scalable deployment in digital pathology applications.

Brian Isett, Rebekah Dadey, Aofei Li, Ryan C. Augustin, Kate Smith, Aatur D. Singhi, Qiangqiang Gu, Riyue Bao · 2026-03-11 · 🤖 cs.AI

LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems

This paper introduces the LLM Delegate Protocol (LDP), an AI-native communication framework that enhances multi-agent system efficiency and governance by exposing model identity and reasoning profiles as first-class primitives, demonstrating significant reductions in latency and token usage alongside improved security and recovery capabilities in experimental evaluations.

Sunil Prakash · 2026-03-11 · 🤖 cs.AI

Unpacking Interpretability: Human-Centered Criteria for Optimal Combinatorial Solutions

This paper establishes that human preference for equally optimal combinatorial packing solutions is reliably driven by three quantifiable structural properties—alignment with greedy heuristics, simple within-bin composition, and ordered visual representation—thereby providing a concrete framework for designing interpretable algorithmic support systems.

Dominik Pegler, Frank Jäkel, David Steyrl, Frank Scharnowski, Filip Melinscak · 2026-03-11 · 🤖 cs.AI
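
To make "alignment with greedy heuristics" concrete, the sketch below implements first-fit decreasing, a classic greedy bin-packing heuristic. This summary does not say which greedy heuristic the authors used as a reference, so treating first-fit decreasing as that reference is an assumption. One could then compare an optimal solution's bins with the greedy packing to gauge how "greedy-like" it looks.

```python
def first_fit_decreasing(items, capacity):
    """Pack item sizes into bins of a fixed capacity using first-fit decreasing."""
    bins = []  # each bin is a list of item sizes
    for size in sorted(items, reverse=True):  # largest items first
        if size > capacity:
            raise ValueError(f"item {size} exceeds bin capacity {capacity}")
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)  # place in the first bin that still fits
                break
        else:
            bins.append([size])  # no existing bin fits: open a new one
    return bins


print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```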

Quantifying the Accuracy and Cost Impact of Design Decisions in Budget-Constrained Agentic LLM Search

This paper presents a controlled study using the Budget-Constrained Agentic Search (BCAS) framework to quantify how search depth, retrieval strategies, and completion budgets impact the accuracy and cost of Agentic RAG systems across six LLMs and three benchmarks, offering practical configuration guidelines for budget-constrained deployments.

Kyle McCleary, James Ghawaly · 2026-03-11 · 🤖 cs.AI

A New Modeling to Feature Selection Based on the Fuzzy Rough Set Theory in Normal and Optimistic States on Hybrid Information Systems

This paper introduces FSbuHD, a novel feature selection model for hybrid information systems that addresses the computational and noise limitations of traditional fuzzy rough set theory by reformulating the problem as an optimization task based on combined object distances, demonstrating superior efficiency and effectiveness in both normal and optimistic states across UCI datasets.

Mohammad Hossein Safarpour, Seyed Mohammad Alavi, Mohammad Izadikhah, Hossein Dibachi · 2026-03-11 · 🤖 cs.AI

NetDiffuser: Deceiving DNN-Based Network Attack Detection Systems with Diffusion-Generated Adversarial Traffic

This paper introduces NetDiffuser, a novel framework that leverages a feature categorization algorithm and diffusion models to generate natural adversarial examples that effectively deceive deep learning-based network intrusion detection systems while preserving traffic validity.

Pratyay Kumar, Abu Saleh Md Tayeen, Satyajayant Misra, Huiping Cao, Jiefei Liu, Qixu Gong, Jayashree Harikumar · 2026-03-11 · 🤖 cs.AI

Cross-Domain Uncertainty Quantification for Selective Prediction: A Comprehensive Bound Ablation with Transfer-Informed Betting

This paper introduces Transfer-Informed Betting (TIB), a novel method that combines betting-based confidence sequences with cross-domain transfer learning to achieve tighter, data-efficient risk guarantees for selective prediction, demonstrating significant coverage improvements over existing bounds across multiple benchmarks and applications.

Abhinaba Basu · 2026-03-11 · 🤖 cs.AI
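
The sketch below shows the basic betting idea behind confidence sequences of this kind. For each candidate mean m, a bettor wagers against m, and m is ruled out once the bettor's wealth reaches 1/α; Ville's inequality bounds the chance of wrongly ruling out the true mean uniformly over time. This is a plain, single-domain betting confidence set with a fixed betting fraction, not TIB: the transfer-informed component is omitted, and the function name, grid resolution, and default parameters are assumptions. Observations must lie in [0, 1], for example per-example error indicators on accepted predictions.

```python
import math


def betting_cs(xs, alpha=0.05, lam=0.5, c=0.5, theta=0.5, grid=1000):
    """Betting confidence set for the mean of observations in [0, 1].

    Returns (lo, hi), the hull of candidate means not rejected, or None.
    lam: fixed betting fraction (a simplification of adaptive betting)
    c: truncation constant in (0, 1) that keeps wealth strictly positive
    theta: weight in (0, 1) splitting wealth between betting up and down
    """
    kept = []
    for i in range(1, grid):
        m = i / grid
        lam_up = min(lam, c / m)        # ensures 1 + lam_up * (x - m) > 0
        lam_dn = min(lam, c / (1 - m))  # ensures 1 - lam_dn * (x - m) > 0
        log_up = sum(math.log1p(lam_up * (x - m)) for x in xs)
        log_dn = sum(math.log1p(-lam_dn * (x - m)) for x in xs)
        # hedged wealth theta*K_up + (1 - theta)*K_dn, combined in log space
        a = math.log(theta) + log_up
        b = math.log(1 - theta) + log_dn
        log_wealth = max(a, b) + math.log1p(math.exp(-abs(a - b)))
        if log_wealth < math.log(1 / alpha):  # m survives: not enough evidence against it
            kept.append(m)
    return (min(kept), max(kept)) if kept else None


print(betting_cs([0.2] * 200))
```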

FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data

FedLECC is a lightweight client selection strategy for federated learning under non-IID data that groups clients by label-distribution similarity and prioritizes those with higher local loss, thereby significantly improving test accuracy while reducing communication rounds and overhead.

Daniel M. Jimenez-Gutierrez, Giovanni Giunta, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti · 2026-03-11 · 🤖 cs.AI
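
A simplified sketch of the two ingredients the summary names: grouping clients by label-distribution similarity and prioritizing clients with higher local loss. The leader-style clustering with an L1 distance threshold, the round-robin pick across clusters, and all names and defaults are hypothetical choices for illustration; FedLECC's actual clustering method and selection rule may differ.

```python
def l1(p, q):
    """L1 distance between two label histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


def select_clients(label_dists, losses, k, threshold=0.5):
    """Cluster clients by label distribution, then pick high-loss clients.

    label_dists: per-client label histograms normalized to sum to 1
    losses: per-client local loss values
    k: number of clients to select this round
    threshold: max L1 distance to a cluster's first member (hypothetical)
    """
    clusters = []  # lists of client indices; clusters[j][0] is the leader
    for i, dist in enumerate(label_dists):
        for cl in clusters:
            if l1(dist, label_dists[cl[0]]) <= threshold:
                cl.append(i)
                break
        else:
            clusters.append([i])
    # within each cluster, highest local loss first
    queues = [sorted(cl, key=lambda i: losses[i], reverse=True) for cl in clusters]
    # visit clusters whose top client has the highest loss first
    queues.sort(key=lambda q: losses[q[0]], reverse=True)
    selected = []
    while len(selected) < k and any(queues):
        for q in queues:  # round-robin keeps label groups represented
            if q and len(selected) < k:
                selected.append(q.pop(0))
    return selected


dists = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(select_clients(dists, losses=[0.2, 0.8, 0.5, 0.1], k=2))
```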

Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates

This paper introduces a fully differentiable approach to discovering Strong Lottery Tickets by employing continuously relaxed Bernoulli gates to optimize sparsity via gradient descent on frozen weights, achieving significantly higher sparsity with minimal accuracy loss compared to existing non-differentiable methods like edge-popup.

Itamar Tsayag, Ofir Lindenbaum · 2026-03-11 · 🤖 cs.AI
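
A forward-pass sketch of a continuously relaxed Bernoulli (binary Concrete) gate applied to frozen weights. Logistic noise is added to a per-weight logit and squashed through a temperature-scaled sigmoid, so in an autodiff framework the gate stays differentiable with respect to the logit while the weights themselves are never updated. The function names, the temperature default, and the plain-Python setting are assumptions; this is not the authors' code.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relaxed_bernoulli_gate(logit, temperature=0.5, rng=random):
    """Sample a gate in (0, 1) from a relaxed Bernoulli distribution."""
    u = min(max(rng.random(), 1e-6), 1 - 1e-6)  # clip to avoid log(0)
    noise = math.log(u) - math.log(1 - u)       # standard logistic noise
    return sigmoid((logit + noise) / temperature)


def gated_weights(frozen_weights, logits, temperature=0.5, rng=random):
    """Mask frozen weights with sampled gates; only the logits would be trained."""
    return [w * relaxed_bernoulli_gate(a, temperature, rng) for w, a in zip(frozen_weights, logits)]


def expected_density(logits):
    """Expected fraction of gates above 0.5, since P(gate > 0.5) = sigmoid(logit)."""
    return sum(sigmoid(a) for a in logits) / len(logits)


print(gated_weights([2.0, 3.0], [50.0, -50.0], rng=random.Random(0)))
```

Sparsity is then one minus the expected density; pushing logits negative through a sparsity penalty is what drives weights out of the ticket.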

Quantifying Uncertainty in AI Visibility: A Statistical Framework for Generative Search Measurement

This paper argues that citation visibility in generative search should be treated as a stochastic distribution requiring uncertainty estimates rather than a fixed value, demonstrating through empirical analysis of multiple AI platforms that single-run measurements are misleadingly precise and that robust statistical sampling is essential for accurate domain performance assessment.

Ronald Sielinski · 2026-03-11 · 🤖 cs.AI
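
One simple way to report citation visibility as a rate with uncertainty rather than a single number is to repeat the same query several times, count the runs in which a domain is cited, and attach a binomial confidence interval. The Wilson score interval below is a standard choice for proportions. Using it here, and the `wilson_interval` name, are assumptions rather than the paper's stated procedure, and the interval assumes independent runs, which repeated calls to the same platform may only approximate.

```python
import math


def wilson_interval(hits, runs, z=1.96):
    """Wilson score interval for a visibility rate: `hits` citations in `runs` repeated queries."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half, center + half


print(wilson_interval(5, 10))
print(wilson_interval(50, 100))
```

Both calls share a point estimate of 0.5, but the 10-run interval is about 0.53 wide versus about 0.19 for 100 runs, which is why a single run overstates precision.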

Using Vision Language Foundation Models to Generate Plant Simulation Configurations via In-Context Learning

This paper introduces a novel framework utilizing vision-language foundation models (Gemma 3 and Qwen3-VL) to automatically generate JSON simulation configurations for digital twin agriculture by interpreting drone imagery, demonstrating their potential to scale functional-structural plant modeling while highlighting current limitations in visual reasoning and reliance on contextual priors.

Heesup Yun, Isaac Kazuo Uyehara, Earl Ranario, Lars Lundqvist, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles · 2026-03-11 · 🤖 cs.AI
