When Fine-Tuning Fails and When It Generalises: The Role of Data Diversity and Mixed Training in LLM-based TTS
This paper demonstrates that LoRA fine-tuning of compact LLM backbones substantially improves voice cloning in terms of perceptual quality, speaker fidelity, and signal-to-noise ratio, provided the training data is sufficiently acoustically diverse.
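As background for readers unfamiliar with the technique, the sketch below illustrates the core LoRA idea (a frozen weight plus a trainable low-rank update); the dimensions, rank, and scaling are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Minimal LoRA sketch: adapt a frozen linear layer W with a low-rank
# update (alpha / r) * B @ A. Only A and B would be trained.
rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 16, 4, 8.0

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
base = W @ x
adapted = lora_forward(x)
# Because B starts at zero, the adapter is a no-op before training,
# so fine-tuning begins exactly at the pretrained model's behaviour.
print(np.allclose(base, adapted))  # True
```

This zero-initialisation of B is what lets LoRA fine-tuning start from the unmodified backbone and learn only a small task-specific update.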