cs.AI papers | Gist.Science

GroundCount: Grounding Vision-Language Models with Object Detection for Mitigating Counting Hallucinations

GroundCount proposes a framework that augments Vision-Language Models with explicit spatial grounding from object detection models to significantly mitigate counting hallucinations, demonstrating that structured prompt-based integration outperforms feature-level fusion and yields consistent accuracy improvements across most architectures.

Boyuan Chen, Minghao Shao, Siddharth Garg, Ramesh Karri, Muhammad Shafique · 2026-03-12 · 🤖 cs.AI
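GroundCount's summary attributes its gains to structured prompt-based integration rather than feature-level fusion. Below is a minimal sketch of what such a prompt could look like. It assumes the detector returns labelled boxes with confidence scores. The `Detection` type, `build_grounded_prompt`, the 0.5 threshold, and the prompt wording are all hypothetical and do not reproduce the paper's actual format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: class label, confidence, pixel box (x1, y1, x2, y2)."""
    label: str
    score: float
    box: tuple[float, float, float, float]


def build_grounded_prompt(question: str, detections: list[Detection],
                          min_score: float = 0.5) -> str:
    """Hypothetical prompt builder: serialise confident detections as text
    (per-class counts plus boxes) and place them before the counting question."""
    kept = [d for d in detections if d.score >= min_score]  # drop low-confidence boxes
    counts = Counter(d.label for d in kept)
    lines = ["Detected objects (use these to ground your count):"]
    for label in sorted(counts):
        lines.append(f"- {label}: {counts[label]}")
        for d in kept:
            if d.label == label:
                x1, y1, x2, y2 = d.box
                lines.append(f"  box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}) score={d.score:.2f}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


dets = [
    Detection("apple", 0.92, (10, 20, 60, 70)),
    Detection("apple", 0.81, (80, 25, 130, 75)),
    Detection("apple", 0.30, (150, 30, 190, 70)),  # below threshold, ignored
]
print(build_grounded_prompt("How many apples are on the table?", dets))
```

The confidence filter is the design choice that matters most here. Without it, low-score detections would feed false counts straight into the prompt.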

Artificial Intelligence as a Catalyst for Innovation in Software Engineering

This paper argues that integrating Artificial Intelligence, particularly through Machine Learning and Natural Language Processing, acts as a catalyst for innovation in software engineering by automating tedious tasks and enhancing Agile practices to better manage evolving requirements while maintaining quality and speed.

Carlos Alberto Fernández-y-Fernández, Jorge R. Aguilar-Cisneros · 2026-03-12 · 🤖 cs.AI

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

This paper synthesizes findings from interviews with 16 experts to identify methodological challenges in applying randomized controlled trials to evaluate frontier AI's impact on human performance and proposes practical solutions to address validity issues in high-stakes decision-making.

Patricia Paskov, Kevin Wei, Shen Zhou Hong, Dan Bateyko, Xavier Roberts-Gaal, Carson Ezell, Gailius Praninskas, Valerie Chen, Umang Bhatt, Ella Guest · 2026-03-12 · 🤖 cs.AI
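Human uplift trials typically estimate uplift by comparing task performance between participants randomized to AI assistance and participants in a control arm. The sketch below shows the textbook difference-in-means estimator with an unpooled (Neyman) standard error. It is a generic illustration, not a procedure taken from the paper, and `uplift_estimate` is a hypothetical name.

```python
import math
from statistics import mean, variance


def uplift_estimate(treated: list[float], control: list[float]):
    """Difference in mean scores (treated minus control), its unpooled
    standard error, and a normal-approximation 95% confidence interval."""
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("each arm needs at least two participants")
    effect = mean(treated) - mean(control)
    # Neyman standard error: arm variances are not assumed to be equal
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return effect, se, (effect - 1.96 * se, effect + 1.96 * se)
```

With small expert samples the normal approximation is optimistic. A t-based interval or a permutation test would be the more cautious choice.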

Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style

Through an interdisciplinary collaboration between computer scientists and art historians, this paper uses latent-space decomposition and quantitative analysis to show that Vision Language Models predict artistic styles using concepts that human experts find largely coherent and relevant, often aligning with art-historical reasoning even when relying on formally interpreted features.

Marvin Limpijankit, Milad Alshomary, Yassin Oulad Daoud, Amith Ananthram, Tim Trombley, Elias Stengel-Eskin, Mohit Bansal, Noam M. Elcott, Kathleen McKeown · 2026-03-12 · 🤖 cs.AI

Instruction set for the representation of graphs

This paper introduces IsalGraph, a method that uses a virtual machine to encode any finite simple graph as a compact, valid instruction string over a nine-character alphabet, enabling efficient canonical representation. It also shows a strong correlation between string edit distance and graph edit distance, supporting applications in similarity search and language modeling.

Ezequiel Lopez-Rubio, Mario Pascual-Gonzalez · 2026-03-12 · 💬 cs.CL
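The similarity-search claim rests on comparing instruction strings by edit distance. Assuming this means the standard Levenshtein distance, a minimal two-row dynamic-programming sketch follows. It works on arbitrary strings and does not reproduce IsalGraph's instruction set, virtual machine, or encoding.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character insertions,
    deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]
```

Each row depends only on the previous one, so memory is O(min(len(a), len(b))) while time stays O(len(a) * len(b)).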

V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation

V2M-Zero introduces a zero-pair video-to-music generation framework that achieves superior temporal synchronization and semantic alignment by leveraging shared intra-modal temporal structures via event curves, eliminating the need for paired training data or cross-modal supervision.

Yan-Bo Lin, Jonah Casebeer, Long Mai, Aniruddha Mahapatra, Gedas Bertasius, Nicholas J. Bryan · 2026-03-12 · 🤖 cs.AI
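The summary does not define an event curve. The sketch below only illustrates the idea of aligning two modalities through each one's own temporal structure, and it assumes a hypothetical definition: the per-step change between consecutive feature vectors, min-max normalised, with curves from different modalities compared by Pearson correlation. Both functions are hypothetical and are not V2M-Zero's method.

```python
import math


def event_curve(frames: list[list[float]]) -> list[float]:
    """Hypothetical event curve: Euclidean change between consecutive
    per-step feature vectors, min-max normalised to [0, 1].
    Requires at least two frames."""
    diffs = [math.dist(a, b) for a, b in zip(frames, frames[1:])]
    lo, hi = min(diffs), max(diffs)
    if hi == lo:  # no variation at all: return a flat curve
        return [0.0] * len(diffs)
    return [(d - lo) / (hi - lo) for d in diffs]


def curve_correlation(c1: list[float], c2: list[float]) -> float:
    """Pearson correlation between two equal-length curves (0.0 if either is flat)."""
    n = len(c1)
    m1, m2 = sum(c1) / n, sum(c2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(c1, c2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in c1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in c2))
    return 0.0 if s1 == 0 or s2 == 0 else cov / (s1 * s2)
```

Because each curve is computed within one modality, the comparison needs no cross-modal features. That is the property the summary highlights.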

Neural Field Thermal Tomography: A Differentiable Physics Framework for Non-Destructive Evaluation

The paper introduces Neural Field Thermal Tomography (NeFTY), a differentiable physics framework that parameterizes 3D material diffusivity as a continuous neural field and optimizes it through a rigorous numerical solver. This enables high-resolution, quantitative reconstruction of subsurface defects from transient surface temperature measurements, overcoming the limitations of traditional 1D approximations and of physics-informed neural networks (PINNs) that impose physics only as soft constraints.

Tao Zhong, Yixun Hu, Dongzhe Zheng, Aditya Sood, Christine Allen-Blanchette · 2026-03-12 · 🔬 cond-mat.mtrl-sci
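NeFTY optimizes a diffusivity field through a numerical solver. As a much-simplified illustration of that kind of forward model, the sketch below advances the heat equation dT/dt = d/dx(alpha(x) dT/dx) for a spatially varying diffusivity, using an explicit finite-volume scheme with insulated ends. Several choices here are assumptions made for illustration: the 1D setting, the harmonic-mean face diffusivities, and the name `simulate_heat_1d`. The paper works in 3D with a neural-field parameterization and differentiates through its solver, and none of that is shown here.

```python
def simulate_heat_1d(alpha: list[float], t0: list[float], dx: float,
                     dt: float, steps: int) -> list[float]:
    """Explicit finite-volume solver for dT/dt = d/dx(alpha(x) dT/dx) on a
    1D rod with insulated (zero-flux) ends. alpha and t0 hold one value per cell."""
    n = len(t0)
    if len(alpha) != n or any(a <= 0 for a in alpha):
        raise ValueError("need one positive diffusivity per cell")
    if dt > dx * dx / (2 * max(alpha)):
        raise ValueError("dt exceeds the explicit stability limit dx^2 / (2 max alpha)")
    # diffusivity on the face between cells i and i+1 (harmonic mean of neighbours)
    face = [2 * alpha[i] * alpha[i + 1] / (alpha[i] + alpha[i + 1]) for i in range(n - 1)]
    temp = list(t0)
    for _ in range(steps):
        # g[i] = face diffusivity times the temperature gradient across face i
        g = [face[i] * (temp[i + 1] - temp[i]) / dx for i in range(n - 1)]
        # cell update: (right-face term - left-face term) / dx; boundary faces carry no flux
        temp = [
            temp[i] + dt * ((g[i] if i < n - 1 else 0.0) - (g[i - 1] if i > 0 else 0.0)) / dx
            for i in range(n)
        ]
    return temp
```

The scheme conserves total heat, and a low-diffusivity cell (a stand-in for a defect) slows the arrival of heat at the far end. Inverting that kind of signal is what the reconstruction task is about.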

LiTo: Surface Light Field Tokenization

LiTo introduces a unified 3D latent representation that tokenizes surface light fields from RGB-depth images to jointly model geometry and view-dependent appearance, enabling high-fidelity 3D object generation with realistic specular effects and consistent lighting.

Jen-Hao Rick Chang, Xiaoming Zhao, Dorian Chan, Oncel Tuzel · 2026-03-12 · 🤖 cs.AI

COMIC: Agentic Sketch Comedy Generation

The paper presents COMIC, a fully automated AI system that generates high-quality, diverse comedic sketch videos by employing a multi-agent framework with specialized roles and LLM-based critics trained on YouTube data to iteratively refine content toward professional standards.

Susung Hong, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz · 2026-03-12 · 💬 cs.CL

SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving

This paper proposes SDR-GAIN, a novel real-time framework that utilizes self-supervised adversarial learning on keypoint coordinate distributions to accurately reconstruct occluded pedestrian poses for autonomous driving, outperforming existing methods in both accuracy and inference speed.

Honghao Fu, Yongli Gu, Yidong Yan + 3 more · 2026-03-11 · 🤖 cs.AI

A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding

This paper proposes TSformer-SA, a novel framework that integrates a temporal-spectral fusion transformer with subject-specific adapters and cross-view consistency learning to significantly enhance RSVP-BCI decoding performance while minimizing the training data and preparation time required for new subjects.

Xujin Li, Wei Wei, Shuang Qiu + 1 more · 2026-03-11 · 🤖 cs.AI

PnLCalib: Sports Field Registration via Points and Lines Optimization

The paper proposes PnLCalib, an optimization-based calibration pipeline for sports field registration that leverages a 3D soccer field model, keypoints, and a novel line-based refinement module to achieve superior accuracy and robustness in diverse broadcast scenarios compared to traditional search-based methods.

Marc Gutiérrez-Pérez, Antonio Agudo · 2026-03-11 · 🤖 cs.AI
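Calibration pipelines of this kind are usually scored by how closely projected field-model features land on the features detected in the image. The planar sketch below assumes the pitch is the plane z = 0, so a 3x3 homography maps field coordinates to pixels. Point residuals use reprojection distance, and line residuals use point-to-line distance. The function names are hypothetical, and PnLCalib's full 3D camera model and line-refinement module are not reproduced.

```python
import math


def project(h: list[list[float]], x: float, y: float) -> tuple[float, float]:
    """Map a field-plane point (x, y) to pixels with a 3x3 homography h,
    dividing by the homogeneous coordinate."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def point_rmse(h, field_pts, image_pts) -> float:
    """Root-mean-square pixel distance between projected field keypoints
    and their detected image positions."""
    errs = [math.dist(project(h, *f), p) for f, p in zip(field_pts, image_pts)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def point_line_distance(pt: tuple[float, float], line: tuple[float, float, float]) -> float:
    """Pixel distance from a projected point to a detected image line a*u + b*v + c = 0."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / math.hypot(a, b)
```

An optimization-based pipeline would combine residuals like these and minimize them over the camera parameters, for example with a nonlinear least-squares solver. That optimization step is not shown.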

DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild

The paper proposes DP-IQA, a novel blind image quality assessment method that leverages the robust perceptual priors of pre-trained Stable Diffusion models and distills them into a lightweight CNN to achieve state-of-the-art generalization on in-the-wild datasets with limited training data.

Honghao Fu, Yufei Wang, Wenhan Yang + 2 more · 2026-03-11 · 🤖 cs.AI

Dance of the ADS: Orchestrating Failures through Historically-Informed Scenario Fuzzing

This paper introduces ScenarioFuzz, a novel scenario-based fuzzing methodology that leverages historical test data, map networks, and graph neural networks to autonomously generate and optimize high-risk scenarios, significantly reducing testing time while uncovering numerous safety-critical bugs in autonomous driving systems.

Tong Wang, Taotao Gu, Huan Deng + 3 more · 2026-03-11 · 🤖 cs.AI

Multi-agent Assessment with QoS Enhancement for HD Map Updates in a Vehicular Network

This paper proposes and evaluates a distributed multi-agent Q-learning solution for HD map updates in vehicular networks that reduces computational burden and compatibility issues while significantly reducing latency across various traffic scenarios compared to single-agent approaches.

Jeffrey Redondo, Nauman Aslam, Juan Zhang + 1 more · 2026-03-11 · 🤖 cs.AI
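The distributed solution builds on Q-learning. The sketch below gives each agent its own independent tabular Q-learner, with epsilon-greedy action selection and the standard temporal-difference update Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). Several details are assumptions and do not come from the paper: the state and action encodings, the hyperparameter values, and the idea of using negative latency as the reward.

```python
import random
from collections import defaultdict


class QAgent:
    """Independent tabular Q-learner; each agent keeps its own table."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.q = defaultdict(float)  # (state, action) -> value, defaults to 0.0
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One temporal-difference step toward r + gamma * max_a' Q(s', a')."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In a multi-agent deployment each agent holds its own `QAgent`, so no single learner has to represent the joint state. This is one plausible reading of how distribution reduces computational burden.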

Enhancing Heterogeneous Multi-Agent Cooperation in Decentralized MARL via GNN-driven Intrinsic Rewards

This paper proposes CoHet, a novel algorithm that leverages Graph Neural Network-driven intrinsic rewards to enable effective decentralized learning and cooperation among heterogeneous multi-agent systems despite challenges like partial observability and reward sparsity, demonstrating superior performance over state-of-the-art methods in standard benchmarks.

Jahir Sadik Monon, Deeparghya Dutta Barua, Md. Mosaddek Khan · 2026-03-11 · 🤖 cs.AI

Sparse Variational Student-t Processes for Heavy-tailed Modeling

This paper introduces Sparse Variational Student-t Processes (SVTP), a scalable framework that extends sparse inducing point methods to Student-t processes via novel inference algorithms and natural gradient optimization, achieving superior robustness to outliers and heavy-tailed data with significantly faster convergence and lower prediction error compared to sparse Gaussian processes on large datasets.

Jian Xu, Delu Zeng, John Paisley · 2026-03-11 · 🤖 cs.AI
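The robustness claim follows from the Student-t likelihood. An outlier's log-density penalty grows logarithmically with its distance from the mean, whereas under a Gaussian it grows quadratically. The sketch below compares the two univariate log-densities. SVTP's sparse inducing-point inference and natural-gradient machinery are not shown.

```python
import math


def student_t_logpdf(x: float, df: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """Log density of a univariate Student-t with df degrees of freedom."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def gaussian_logpdf(x: float, loc: float = 0.0, scale: float = 1.0) -> float:
    """Log density of a univariate Gaussian."""
    z = (x - loc) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z
```

Consider an observation 10 scale units from the mean. The Gaussian log-density is about -50.9, while the Student-t with 3 degrees of freedom gives about -8.07, so a single outlier pulls a Student-t fit far less.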

Robust Training of Neural Networks at Arbitrary Precision and Sparsity

This paper introduces a unified framework that models quantization and sparsification as additive noise to derive a principled, noise-corrective gradient path, enabling the stable training of neural networks at arbitrary low precisions and sparsity levels without relying on heuristic estimators like the Straight-Through Estimator.

Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Li Zhang, Mark Sandler, Andrew Howard · 2026-03-11 · 🤖 cs.AI
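The additive-noise view can be made concrete for uniform round-to-nearest quantization with step size step: q(w) = w + e, where |e| <= step / 2. When weights spread evenly across many quantization bins, e is close to uniform with variance step^2 / 12. Sparsification fits the same view, since zeroing a weight w adds e = -w. The sketch below checks these facts numerically. It illustrates only the standard noise model, not the paper's noise-corrective gradient path, and the step size and weight distribution are assumptions.

```python
import random


def quantize(w: float, step: float) -> float:
    """Uniform round-to-nearest quantizer with step size `step`."""
    return step * round(w / step)


def quantization_noise(weights: list[float], step: float) -> list[float]:
    """Additive-noise view: q(w) = w + e, so e = q(w) - w and |e| <= step / 2."""
    return [quantize(w, step) - w for w in weights]


def prune_noise(weights: list[float], keep: list[bool]) -> list[float]:
    """Sparsification as additive noise: a zeroed weight w contributes e = -w."""
    return [0.0 if k else -w for w, k in zip(weights, keep)]


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
step = 0.1
noise = quantization_noise(weights, step)
var = sum(e * e for e in noise) / len(noise)  # second moment; the noise mean is near 0
print(var, step ** 2 / 12)  # the two values should be close
```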

DRUPI: Dataset Reduction Using Privileged Information

The paper introduces DRUPI (Dataset Reduction Using Privileged Information), a framework that enhances dataset condensation by synthesizing auxiliary privileged information, such as feature or attention labels, alongside the reduced data, significantly improving model training performance across various benchmarks.

Shaobo Wang, Youxin Jiang, Tianle Niu, Yantai Yang, Ruiji Zhang, Shuhao Hu, Shuaiyu Zhang, Chenghao Sun, Weiya Li, Conghui He, Xuming Hu, Linfeng Zhang · 2026-03-11 · 🤖 cs.AI

LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation

LayoutDreamer is a novel framework that leverages 3D Gaussian Splatting and physics-guided scene graphs to generate high-quality, physically plausible, and controllable text-to-3D compositional scenes, achieving state-of-the-art performance in multi-object generation.

Yang Zhou, Zongjin He, Qixuan Li + 1 more · 2026-03-11 · 🤖 cs.AI
