cs.AI papers | Gist.Science

Dual Randomized Smoothing: Beyond Global Noise Variance

This paper proposes Dual Randomized Smoothing, a novel framework that overcomes the limitations of global noise variance by introducing input-dependent noise variances via a locally constant variance estimator, thereby achieving superior certified robustness across both small and large perturbation radii on CIFAR-10 and ImageNet.

Chenhao Sun, Yuhao Mao, Martin Vechev | 2026-03-10 | cs.LG

Process-Centric Analysis of Agentic Software Systems

This paper introduces Graphectory, a graph-based framework for analyzing the stochastic execution trajectories of agentic software systems, which reveals that richer prompts and stronger models yield more complex reasoning patterns while enabling real-time monitoring and intervention that significantly improves problem resolution rates and efficiency.

Shuyang Liu, Yang Chen, Rahul Krishna, Saurabh Sinha, Jatin Ganhotra, Reyhan Jabbarvand | 2026-03-10 | cs.CL

Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability

This paper introduces Sparse Isotonic Shapley Regression (SISR), a unified framework that simultaneously learns a monotonic transformation to restore additivity and enforces sparsity constraints to provide robust, efficient, and theoretically grounded feature attributions for nonlinear, high-dimensional Explainable AI.

Jialai She | 2026-03-10 | cs.LG

Parallel Decoder Transformer: Planner-Seeded Latent Coordination for Synchronized Parallel Decoding

The Parallel Decoder Transformer (PDT) introduces a frozen-trunk architecture that enables synchronized parallel decoding by integrating a planner-seeded latent workspace and a dynamic notes bus, allowing multiple output streams to internally coordinate, resolve ownership, and synchronize generation without external orchestration.

Logan Robbins | 2026-03-10 | cs.CL

Beyond Endpoints: Path-Centric Reasoning for Vectorized Off-Road Network Extraction

This paper addresses the challenges of road network extraction in off-road environments by introducing the WildRoad dataset and MaGRoad, a novel path-centric framework that overcomes the limitations of existing node-centric models to achieve state-of-the-art performance and faster inference in wild terrains.

Wenfei Guan, Jilin Mei, Tong Shen, Xumin Wu, Shuo Wang, Chen Min, Yu Hu | 2026-03-10 | cs

SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks

The paper introduces SALVE, a unified framework that combines sparse autoencoders and feature-level saliency mapping to discover, validate, and precisely edit neural network weights, enabling interpretable and robust control over both convolutional and transformer-based models.

Vegard Flovik | 2026-03-10 | cs.LG

Adaptation of Agentic AI: A Survey of Post-Training, Memory, and Skills

This survey proposes a unified four-paradigm framework to categorize and analyze the fragmented landscape of agentic AI adaptation, distinguishing between agent-side improvements (A1/A2) and tool-side enhancements (T1/T2) to systematically review post-training methods, memory architectures, and skill systems while evaluating their trade-offs and outlining future challenges.

Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Heng Ji, Yejin Choi, Dawn Song, Jimeng Sun, Jiawei Han | 2026-03-10 | cs.CL

Meta-RL Induces Exploration in Language Agents

The paper introduces LaMer, a Meta-RL framework that enhances language agents' ability to actively explore and adapt to novel environments at test time through cross-episode training and in-context policy reflection, significantly outperforming standard RL baselines across diverse tasks.

Yulun Jiang, Liangze Jiang, Damien Teney, Michael Moor, Maria Brbic | 2026-03-10 | cs.LG

Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting

Re-Depth Anything is a test-time self-supervised framework that enhances monocular depth estimation by fusing foundation models with large-scale 2D diffusion priors to perform label-free refinement via generative re-lighting and Score Distillation Sampling, achieving state-of-the-art results without direct depth tensor optimization.

Ananta R. Bhattarai, Helge Rhodin | 2026-03-10 | cs.LG

Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL

This paper demonstrates that reasoning Large Language Models significantly reduce cloud query execution costs and data consumption compared to non-reasoning models in Text-to-SQL tasks, while revealing that execution time is a poor proxy for cost efficiency and highlighting the substantial financial risks posed by non-reasoning models' tendency to generate inefficient queries.

Saurabh Deochake, Debajyoti Mukhopadhyay | 2026-03-10 | cs

Physics-Informed Neural Networks for Device and Circuit Modeling: A Case Study of NeuroSPICE

This paper introduces NeuroSPICE, a physics-informed neural network framework that solves circuit differential-algebraic equations via backpropagation to provide a flexible alternative to conventional SPICE for simulating emerging nonlinear devices and addressing inverse problems, despite not surpassing traditional solvers in raw speed or accuracy.

Chien-Ting Tung, Chenming Hu | 2026-03-10 | physics.app-ph

Toward a Physical Theory of Intelligence

This paper introduces the Conservation-Congruent Encoding (CCE) framework, a unified physical theory that defines intelligence as an irreversible process of extracting work while minimizing dissipation, thereby deriving universal computational bounds and linking thermodynamic measurement, quantum decoherence, and spacetime geometry to establish substrate-neutral constraints for both natural and artificial intelligence.

Peter David Fagan | 2026-03-10 | cs

Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems

This paper introduces an operator-legible evaluation framework centered on under-prediction risk and uses it to show that standard accuracy metrics fail to capture safety-critical grid forecasting needs. It finds that explicit weather integration improves reliability, whereas unconstrained probabilistic models often induce "fake safety" through excessive inflation, a problem the paper addresses with new Bias/OPR-constrained objectives.

Sunki Hong, Jisoo Lee | 2026-03-10 | eess

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

This paper introduces DrivingGen, the first comprehensive benchmark for generative driving world models that addresses the lack of rigorous evaluation by combining a diverse dataset with a novel suite of metrics to assess visual realism, trajectory plausibility, temporal coherence, and controllability, thereby revealing critical trade-offs in current state-of-the-art models.

Yang Zhou, Hao Shao, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander | 2026-03-10 | cs

Batch-of-Thought: Cross-Instance Learning for Enhanced LLM Reasoning

This paper introduces Batch-of-Thought (BoT), a training-free method that enhances Large Language Model reasoning by jointly processing related queries to leverage cross-instance signals, thereby improving accuracy, calibration, and computational efficiency through a multi-agent reflection architecture.

Xuan Yang, Furong Jia, Roy Xie, Xiong Xi, Hengwei Bian, Jian Li, Monica Agrawal | 2026-03-10 | cs

NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

NC-Bench introduces a theory-grounded benchmark that evaluates the conversational competence of large language models by assessing their ability to manage the form and structure of natural interactions across basic, retrieval-augmented, and complex multi-turn scenarios, revealing that while models excel at basic answering, they struggle significantly with repair and complex sequence management tasks.

Robert J. Moore, Sungeun An, Farhan Ahmed, Jay Pankaj Gala | 2026-03-10 | cs.CL

The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

This paper audits the LAION-Aesthetics Predictor to reveal how its algorithmic gaze reinforces Western, male, and imperial biases by disproportionately filtering content and prioritizing specific cultural aesthetics, ultimately urging a shift toward pluralistic evaluation methods in AI development.

Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, Haiyi Zhu | 2026-03-10 | cs

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

This paper introduces "Single-Shot Planning," a secure architecture for Computer Use Agents that generates a complete, trusted execution graph before observing untrusted UI states to effectively mitigate prompt injection and branch steering attacks while maintaining competitive task performance.

Hanna Foerster, Tom Blanchard, Kristina Nikolic, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, Yiren Zhao | 2026-03-10 | cs

BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics

This paper introduces BoxMind, a closed-loop AI system that transforms unstructured boxing footage into hierarchical tactical indicators and predictive gradients to generate expert-level strategic recommendations, which were validated during the 2024 Paris Olympics by contributing to the Chinese National Team's historic medal success.

Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu | 2026-03-10 | cs

Multifaceted Scenario-Aware Hypergraph Learning for Next POI Recommendation

This paper proposes the Multifaceted Scenario-Aware Hypergraph Learning (MSAHG) framework, which addresses the limitations of existing methods in handling mobility variations across distinct contexts by constructing scenario-specific disentangled sub-hypergraphs and employing a parameter-splitting mechanism to resolve inter-scenario conflicts, thereby significantly improving next POI recommendation performance.

Yuxi Lin, Yongkang Li, Jie Xing, Zipei Fan | 2026-03-10 | cs
