UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers

The paper proposes UAT-LITE, an inference-time framework that injects Monte Carlo dropout into the self-attention mechanisms of pretrained transformers to estimate token-level epistemic uncertainty and modulate attention, thereby significantly improving model calibration and selective prediction performance without requiring additional training or weight modifications.

Elias Hossain, Shubhashis Roy Dipta, Subash Neupane, Rajib Rana, Ravid Shwartz-Ziv, Ivan Garibay, Niloofar Yousefi · 2026-03-11 · 🤖 cs.AI
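The core mechanism in this entry, Monte Carlo dropout applied inside attention at inference time, can be illustrated with a generic NumPy sketch. This is not the paper's implementation: the function name, sample count, and the variance-over-samples uncertainty readout are illustrative assumptions; the idea shown is only the standard recipe of sampling dropout masks at inference and using disagreement across samples as an uncertainty signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mc_dropout_attention(q, k, v, p=0.1, n_samples=16, rng=rng):
    """Run scaled dot-product attention n_samples times, each with a
    fresh dropout mask on the attention weights, and return the mean
    output plus a per-token uncertainty (variance across samples)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # (T, T) attention logits
    outs = []
    for _ in range(n_samples):
        w = softmax(scores)
        mask = rng.random(w.shape) >= p      # drop each weight with prob p
        w = w * mask / (1.0 - p)             # inverted-dropout rescaling
        outs.append(w @ v)
    outs = np.stack(outs)                    # (n_samples, T, d_v)
    mean = outs.mean(axis=0)
    uncertainty = outs.var(axis=0).mean(axis=-1)  # one scalar per token
    return mean, uncertainty

T, d = 4, 8
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))
out, unc = mc_dropout_attention(q, k, v)
```

High-variance tokens are the natural candidates to down-weight or abstain on, which is how an uncertainty estimate like `unc` can drive selective prediction without retraining.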

WebAccessVL: Violation-Aware VLM for Web Accessibility

The paper introduces WebAccessVL, a violation-aware vision-language model that automatically edits website HTML to fix WCAG 2 accessibility violations while preserving visual design, achieving a 96% reduction in violations and outperforming GPT-5 through a supervised image-conditioned program synthesis approach enhanced by a checker-in-the-loop refinement strategy.

Amber Yijia Zheng, Jae Joong Lee, Bedrich Benes, Raymond A. Yeh · 2026-03-11 · 🤖 cs.AI

Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision

This paper proposes an energy-aware spike budgeting framework that integrates experience replay, learnable neuron parameters, and an adaptive scheduler to effectively mitigate catastrophic forgetting while optimizing both accuracy and energy efficiency in Spiking Neural Networks across diverse frame-based and event-based neuromorphic vision benchmarks.

Anika Tabassum Meem, Muntasir Hossain Nadid, Md Zesun Ahmed Mia · 2026-03-11 · 🤖 cs.AI

Contextuality from Single-State Ontological Models: An Information-Theoretic No-Go Theorem

This paper establishes an information-theoretic no-go theorem proving that classical ontological models constrained to reuse a single ontic state space across multiple interventions inevitably incur an irreducible contextual information cost, thereby identifying contextuality as a fundamental limitation of such classical representations that quantum theory circumvents by relaxing the single-variable assumption.

Song-Ju Kim · 2026-03-11 · ⚛️ quant-ph

ReDON: Recurrent Diffractive Optical Neural Processor with Reconfigurable Self-Modulated Nonlinearity

The paper introduces ReDON, a novel recurrent diffractive optical neural processor that overcomes the limitations of static passive masks by employing reconfigurable, self-modulated nonlinearity inspired by gated linear units, thereby significantly enhancing computational expressivity and task performance on image benchmarks with minimal power overhead.

Ziang Yin, Qi Jing, Raktim Sarma, Rena Huang, Yu Yao, Jiaqi Gu · 2026-03-11 · 🔬 physics.optics

Breaking the Factorization Barrier in Diffusion Language Models

The paper introduces Coupled Discrete Diffusion (CoDD), a hybrid framework that overcomes the "factorization barrier" in diffusion language models by replacing fully factorized outputs with a lightweight probabilistic inference layer, thereby enabling efficient parallel generation of coherent, high-quality text without the prohibitive costs of full joint modeling or reinforcement learning.

Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu · 2026-03-11 · 🤖 cs.AI

Zero-Shot and Supervised Bird Image Segmentation Using Foundation Models: A Dual-Pipeline Approach with Grounding DINO 1.5, YOLOv11, and SAM 2.1

This paper proposes a dual-pipeline framework for bird image segmentation that leverages the frozen SAM 2.1 backbone with either a zero-shot Grounding DINO 1.5 detector or a supervised fine-tuned YOLOv11 detector, achieving state-of-the-art performance on the CUB-200-2011 dataset while eliminating the need for retraining the segmentation model across different species or domains.

Abhinav Munagala · 2026-03-11 · 🤖 cs.AI

Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation

Pri4R is a simple yet effective method that enhances Vision-Language-Action models with an implicit understanding of world dynamics by training them to predict 3D point tracks using privileged 4D information, thereby significantly improving physical manipulation performance without adding inference overhead.

Jisoo Kim, Jungbin Cho, Sanghyeok Chu, Ananya Bal, Jinhyung Kim, Gunhee Lee, Sihaeng Lee, Seung Hwan Kim, Bohyung Han, Hyunmin Lee, Laszlo A. Jeni, Seungryong Kim · 2026-03-11 · 🤖 cs.AI

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

This paper introduces Gome, a gradient-based MLE agent that outperforms traditional tree search methods on MLE-Bench by mapping diagnostic reasoning to gradient computation, demonstrating that as LLM reasoning capabilities improve, gradient-based optimization becomes increasingly superior to exhaustive enumeration.

Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian · 2026-03-11 · 🤖 cs.AI

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

The paper introduces FinTexTS, a large-scale financial text-paired time-series dataset constructed via a novel semantic-based and multi-level pairing framework that overcomes the limitations of simple keyword matching by leveraging LLMs to align news articles with stock prices across macro, sector, related company, and target-company levels, thereby significantly improving stock price forecasting performance.

Jaehoon Lee, Suhwan Park, Tae Yoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae Kim, Sungdong Yoo, SoonYoung Lee, Yongjae Lee, Wonbin Ahn · 2026-03-11 · 🤖 cs.AI