cs.AI papers | Gist.Science

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow

AlphaFlowTSE is a one-step conditional generative model for target speaker extraction that utilizes a JVP-free AlphaFlow objective and interval-consistency training to achieve high-fidelity speech recovery with low latency and improved generalization for downstream ASR tasks.

Duojia Li, Shuhan Zhang, Zihan Qian, Wenxuan Wu, Shuai Wang, Qingyang Hong, Lin Li, Haizhou Li | 2026-03-12 | 🤖 cs.AI

Probabilistic Verification of Voice Anti-Spoofing Models

This paper introduces PV-VASM, a model-agnostic probabilistic framework that provides formal robustness guarantees and estimates misclassification probabilities for voice anti-spoofing models against various speech synthesis attacks and unseen perturbations.

Evgeny Kushnir, Alexandr Kozodaev, Dmitrii Korzh, Mikhail Pautov, Oleg Kiriukhin, Oleg Y. Rogov | 2026-03-12 | 🤖 cs.AI

UAV traffic scene understanding: A cross-spectral guided approach and a unified benchmark

This paper proposes CTCNet, a novel cross-spectral guided network featuring a Prototype-Guided Knowledge Embedding module and a Quality-Aware Spectral Compensation module to enhance UAV traffic scene understanding under adverse conditions, accompanied by the introduction of Traffic-VQA, the first large-scale optical-thermal benchmark for cognitive traffic analysis.

Yu Zhang, Zhicheng Zhao, Ze Luo, Chenglong Li, Jin Tang | 2026-03-12 | 🤖 cs.AI

Towards Robust Speech Deepfake Detection via Human-Inspired Reasoning

This paper introduces HIR-SDD, a novel speech deepfake detection framework that leverages Large Audio Language Models and a human-annotated dataset to achieve robust generalization across audio domains while providing interpretable, human-like reasoning for its predictions.

Artem Dvirniak, Evgeny Kushnir, Dmitrii Tarasov, Artem Iudin, Oleg Kiriukhin, Mikhail Pautov, Dmitrii Korzh, Oleg Y. Rogov | 2026-03-12 | 🤖 cs.AI

CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model

CUPID is a novel, plug-in framework that enables joint estimation of aleatoric and epistemic uncertainty in pretrained deep learning models without requiring retraining or architectural modifications, thereby enhancing interpretability and trust in high-stakes AI applications.

Xinran Xu, Xiuyi Fan | 2026-03-12 | 🤖 cs.LG

Deep Randomized Distributed Function Computation (DeepRDFC): Neural Distributed Channel Simulation

This paper proposes a deep learning-based autoencoder architecture for the Randomized Distributed Function Computation (RDFC) framework that minimizes the total variation distance to an unknown target distribution using only data samples, demonstrating superior communication efficiency compared to traditional data compression methods, particularly under limited common randomness.

Didrik Bergström, Onur Günlü | 2026-03-12 | 🔢 math

Taking Shortcuts for Categorical VQA Using Super Neurons

This paper introduces "Super Neurons," a training-free method that leverages scalar activations from the first generated token to create highly accurate classifiers for categorical VQA, enabling extreme early exiting from the first layer and achieving up to a 5.10x speedup while improving performance over the original network.

Pierre Musacchio, Jaeyi Jeong, Dahun Kim, Jaesik Park | 2026-03-12 | 🤖 cs.AI

AI-Enhanced Spatial Cellular Traffic Demand Prediction with Contextual Clustering and Error Correction for 5G/6G Planning

This paper proposes an AI-driven framework that improves 5G/6G traffic demand prediction accuracy and spatial generalization by employing a context-aware two-stage splitting strategy and residual error correction to mitigate neighborhood leakage, as validated by experiments across five Canadian cities.

Mohamad Alkadamani, Colin Brown, Halim Yanikomeroglu | 2026-03-12 | ⚡ eess

Towards Intelligent Spectrum Management: Spectrum Demand Estimation Using Graph Neural Networks

This paper proposes a hierarchical Graph Attention Network (HR-GAT) model that leverages public deployment records to accurately estimate fine-grained spectrum demand across multiple cities, significantly outperforming existing baselines and providing regulators with actionable insights for efficient spectrum sharing and allocation.

Mohamad Alkadamani, Amir Ghasemi, Halim Yanikomeroglu | 2026-03-12 | ⚡ eess

Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services

This paper introduces a risk-aware evaluation framework for Large Language Models in financial services, featuring a domain-specific taxonomy, an automated multi-round red-teaming pipeline, and a Risk-Adjusted Harm Score (RAHS) metric to better capture and quantify severe, operationally actionable security failures that traditional domain-agnostic benchmarks miss.

Fabrizio Dimino, Bhaskarjit Sarmah, Stefano Pasquali | 2026-03-12 | 💰 q-fin

Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization

This paper proposes Nurture-First Development (NFD), a paradigm that shifts AI agent creation from static engineering to a continuous, conversational co-evolution process with domain experts, utilizing a Knowledge Crystallization Cycle to progressively transform tacit operational dialogue into structured, reusable expertise.

Linghao Zhang | 2026-03-12 | 🤖 cs.AI

Protein Counterfactuals via Diffusion-Guided Latent Optimization

This paper introduces MCCOP, a framework that leverages a pretrained diffusion model within a joint sequence-structure latent space to generate minimal, biologically plausible protein mutations that flip predictive model outcomes to desired states, thereby bridging the gap between deep learning predictions and actionable protein engineering.

Weronika Kłos, Sidney Bender, Lukas Kades | 2026-03-12 | 🤖 cs.LG

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

This paper evaluates the weak speaker verification capabilities of existing speech-aware large language models and proposes a lightweight augmentation method using frozen ECAPA-TDNN embeddings and LoRA adapters to significantly enhance speaker discrimination while preserving natural language interfaces.

Thomas Thebaud, Yuzhe Wang, Laureano Moro-Velazquez, Jesus Villalba-Lopez, Najim Dehak | 2026-03-12 | 🤖 cs.AI

BALD-SAM: Disagreement-based Active Prompting in Interactive Segmentation

This paper introduces BALD-SAM, a principled framework that adapts Bayesian Active Learning by Disagreement to spatial prompt selection in interactive segmentation, enabling a lightweight uncertainty estimation head on frozen foundation models to significantly outperform human and oracle prompting across diverse domains.

Prithwijit Chowdhury, Mohit Prabhushankar, Ghassan AlRegib | 2026-03-12 | 🤖 cs.AI

On the Reliability of Cue Conflict and Beyond

This paper critiques the instability and ambiguity of current cue-conflict benchmarks for measuring neural network shape-texture bias and introduces REFINED-BIAS, a new framework with balanced cue pairs and ranking-based metrics to enable reliable, interpretable, and fair cross-model comparisons.

Pum Jun Kim, Seung-Ah Lee, Seongho Park, Dongyoon Han, Jaejun Yoo | 2026-03-12 | 🤖 cs.AI

Human Presence Detection via Wi-Fi Range-Filtered Doppler Spectrum on Commodity Laptops

This paper introduces a novel, low-complexity Human Presence Detection system for commodity laptops that utilizes the built-in Wi-Fi hardware and a new Range-Filtered Doppler Spectrum technique to achieve privacy-preserving, calibration-free occupancy sensing without requiring external sensors or infrastructure.

Jessica Sanson, Rahul C. Shah, Valerio Frascolla | 2026-03-12 | ⚡ eess

Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis

The paper introduces EvoKernel, a self-evolving agentic framework that leverages value-driven memory and reinforcement learning to overcome data scarcity in NPU kernel synthesis, significantly improving model correctness and achieving substantial speedups through automated drafting and iterative refinement.

Yujie Zheng, Zhuo Li, Shengtao Zhang, Hanjing Wang, Junjie Sheng, Jiaqian Wang, Junchi Yan, Weinan Zhang, Ying Wen, Bo Tang, Muning Wen | 2026-03-12 | 🤖 cs.LG

Semantic Landmark Particle Filter for Robot Localisation in Vineyards

This paper introduces a Semantic Landmark Particle Filter (SLPF) that enhances robot localisation in vineyards by integrating trunk and pole detections with LiDAR and GNSS to overcome perceptual aliasing caused by parallel crop rows, achieving significantly lower pose errors and improved row correctness compared to existing geometry-only, vision-based, and GNSS-only baselines.

Rajitha de Silva, Jonathan Cox, James R. Heselden, Marija Popovic, Cesar Cadena, Riccardo Polvara | 2026-03-12 | 🤖 cs.AI

$V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts

The paper proposes $V_{0.5}$, a novel method that dynamically fuses a Generalist Value Model's prior with sparse RL rollouts via real-time statistical testing to minimize baseline estimation error, thereby achieving faster convergence and over 10% performance gains on mathematical reasoning benchmarks compared to GRPO and DAPO.

Yi-Kai Zhang, Yueqing Sun, Hongyan Hao, Qi Gu, Xunliang Cai, De-Chuan Zhan, Han-Jia Ye | 2026-03-12 | 🤖 cs.LG

GRACE: A Unified 2D Multi-Robot Path Planning Simulator & Benchmark for Grid, Roadmap, And Continuous Environments

This paper introduces GRACE, a unified 2D simulator and benchmark that enables transparent, reproducible comparisons of multi-robot path planning algorithms across grid, roadmap, and continuous environments by standardizing task instantiation, execution, and evaluation protocols.

Chuanlong Zang, Anna Mannucci, Isabelle Barz, Philipp Schillinger, Florian Lier, Wolfgang Hönig | 2026-03-12 | 🤖 cs.AI

