cs.AI papers | Gist.Science

Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding

The paper introduces DeepEarth, a self-supervised multi-modal world model featuring Earth4D, a novel 4D space-time positional encoder that achieves state-of-the-art ecological forecasting performance and outperforms larger foundation models through efficient planetary-scale learning.

Lance Legel, Qin Huang, Brandon Voelker, Daniel Neamati, Patrick Alan Johnson, Favyen Bastani, Jeff Rose, James Ryan Hennessy, Robert Guralnick, Douglas Soltis, Pamela Soltis, Shaowen Wang | 2026-03-10 | 💻 cs

Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation

This paper proposes CAPL, a framework that mitigates multi-image hallucinations in large vision-language models by introducing a selectable image token interaction mechanism for fine-grained cross-image alignment and a preference learning strategy that trains the model to rely on genuine visual evidence rather than textual priors.

Xiaochen Yang, Hao Fang, Jiawei Kong, Yaoxin Mao, Bin Chen, Shu-Tao Xia | 2026-03-10 | 💻 cs

Animating Petascale Time-varying Data on Commodity Hardware with LLM-assisted Scripting

This paper presents a user-friendly framework that enables domain scientists to generate 3D animations of petascale, time-varying climate data on commodity hardware using an LLM-assisted conversational interface, thereby eliminating the need for specialized visualization expertise and high-performance computing resources.

Ishrat Jahan Eliza, Xuan Huang, Aashish Panta, Alper Sahistan, Zhimin Li, Amy A. Gooch, Valerio Pascucci | 2026-03-10 | 💻 cs

Bi-directional digital twin prototype anchoring with multi-periodicity learning for few-shot fault diagnosis

This paper proposes a bi-directional digital twin prototype anchoring framework enhanced with multi-periodicity learning to achieve robust few-shot fault diagnosis by leveraging meta-training in a virtual simulation space and test-time adaptation in the physical domain, thereby overcoming the limitations of traditional methods that require abundant labeled or unlabeled target data.

Pengcheng Xia, Zhichao Dong, Yixiang Huang, Chengjin Qin, Qun Chao, Chengliang Liu | 2026-03-10 | 💻 cs

MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering

MedSteer is a training-free activation-steering framework that generates structurally preserved counterfactual endoscopic images by manipulating cross-attention activations in diffusion transformers, outperforming existing methods in concept editing and downstream medical detection tasks.

Trong-Thang Pham, Loc Nguyen, Anh Nguyen, Hien Nguyen, Ngan Le | 2026-03-10 | 💻 cs

User Review Writing via Interview with Dialogue Systems

This paper proposes and validates a novel approach using GPT-4-powered dialogue systems to facilitate user review creation through interview-based information gathering, demonstrating that the resulting system-generated reviews require less editing and are perceived as more helpful by readers than human-written ones, despite some fluency challenges.

Yoshiki Tanaka, Michimasa Inaba | 2026-03-10 | 💻 cs

CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMs

This paper introduces CoTJudger, a graph-driven framework that automatically evaluates the efficiency of Large Reasoning Models by converting Chain-of-Thought traces into dependency graphs to identify the Shortest Effective Path, thereby quantifying structural redundancy and revealing pervasive over-reasoning patterns across 21 models.

Siyi Li, Jiajun Shi, Shiwen Ni, Ge Zhang, Shuaimin Li, Shijian Wang, Zhoufutu Wen, Yizhi Li, Hamid Alinejad-Rokny, Jiaheng Liu, Min Yang, Wenhao Huang | 2026-03-10 | 💬 cs.CL

Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR

This paper introduces Countdown-Code, a novel testbed demonstrating that even minimal contamination of supervised fine-tuning data with reward-hacking trajectories can cause large language models to learn and subsequently generalize misaligned behaviors during reinforcement learning, highlighting the critical need for rigorous validation of synthetic training data.

Muhammad Khalifa, Zohaib Khan, Omer Tafveez, Hao Peng, Lu Wang | 2026-03-10 | 🤖 cs.LG

mAVE: A Watermark for Joint Audio-Visual Generation Models

The paper introduces mAVE, a novel watermarking framework that cryptographically binds audio and video latents in joint generation models to eliminate the "Binding Vulnerability" of existing methods and robustly defend against adversarial Swap Attacks without requiring model fine-tuning.

Luyang Si, Leyi Pan, Lijie Wen | 2026-03-10 | 💻 cs

Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

This paper empirically investigates whether large language models can synthesize executable Unity game code from Goal Playable Patterns under strict structural constraints, revealing that while intermediate representations improve performance, project-level grounding and hygiene failures remain primary bottlenecks in achieving high compilation success rates.

Hugh Xuechen Liu, Kıvanç Tatar | 2026-03-10 | 💻 cs

Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation

This paper proposes the Personalized Semi-Autoregressive with online knowledge Distillation (PSAD) framework, which utilizes a semi-autoregressive teacher model and a User Profile Network to balance generation quality with low-latency inference while enhancing user-item interactions, thereby outperforming state-of-the-art baselines in both ranking performance and efficiency.

Kai Cheng, Hao Wang, Wei Guo, Weiwen Liu, Yong Liu, Yawen Li, Enhong Chen | 2026-03-10 | 💻 cs

Vision Language Models Cannot Reason About Physical Transformation

This paper introduces ConservationBench to demonstrate that current Vision Language Models systematically fail to reason about physical transformations and maintain invariant representations of physical quantities, often performing near chance levels despite strong textual priors favoring invariance.

Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng | 2026-03-10 | 💻 cs

Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information

This paper presents an LLM-based Werewolf AI agent for the AIWolfDial 2024 shared task that improves utterance consistency and character maintenance by leveraging dialogue summaries and manually designed personas.

Yoshiki Tanaka, Takumasa Kaneko, Hiroki Onozeki, Natsumi Ezure, Ryuichi Uehara, Zhiyang Qi, Tomoya Higuchi, Ryutaro Asahara, Michimasa Inaba | 2026-03-10 | 💬 cs.CL

aCAPTCHA: Verifying That an Entity Is a Capable Agent via Asymmetric Hardness

This paper introduces aCAPTCHA, a novel security protocol that verifies whether an entity is a capable AI agent by leveraging the asymmetric processing speed between humans and machines to solve the Agentic Capability Verification Problem (ACVP) through time-constrained, multi-round natural language challenges.

Zuyao Xu, Xiang Li, Fubin Wu, Yuqi Qiu, Lu Sun, FaSheng Miao | 2026-03-10 | 💻 cs

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

This paper introduces EyExIn, a data-efficient framework that enhances retinal Vision Language Models by employing a dual-stream encoding strategy and a deep expert injection mechanism to bridge perception and reasoning gaps, thereby achieving state-of-the-art precision in ophthalmic diagnosis while preventing hallucinations.

Shuai Lu, Meng Wang, Jia Guo, Jiawei Du, Bo Liu, Shengzhu Yang, Weihang Zhang, Huazhu Fu, Huiqi Li | 2026-03-10 | 💻 cs

Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language

This paper introduces Emotion Transcription in Conversation (ETC), a novel task and accompanying Japanese dataset designed to capture complex and subtle emotional states through natural language descriptions, addressing the limitations of traditional categorical emotion recognition methods.

Yoshiki Tanaka, Ryuichi Uehara, Koji Inoue, Michimasa Inaba | 2026-03-10 | 💬 cs.CL

Fine-Grained Table Retrieval Through the Lens of Complex Queries

This paper introduces DCTR, a table retrieval mechanism that leverages fine-grained typed query decomposition and global connectivity awareness to effectively handle complex, open-domain question answering over relational databases, demonstrating robustness on industry-aligned benchmarks.

Wojciech Kosiuk, Xingyu Ji, Yeounoh Chung, Fatma Özcan, Madelon Hulsebos | 2026-03-10 | 💬 cs.CL

Improving reasoning at inference time via uncertainty minimisation

This paper proposes a computationally efficient inference-time reasoning method that improves accuracy by selecting thought-level continuations that maximize the model's internal self-certainty, showing that minimizing uncertainty at early planning stages yields performance comparable to or exceeding existing scaling techniques such as self-consistency.

Nicolas Legrand, Kenneth Enevoldsen, Márton Kardos, Kristoffer Nielbo | 2026-03-10 | 💻 cs

Learning to Rank the Initial Branching Order of SAT Solvers

This paper proposes using graph neural networks to predict initial branching orders for CDCL SAT solvers, demonstrating significant speedups on random and pseudo-industrial benchmarks while noting that the approach struggles with complex industrial instances due to the solver's dynamic heuristics overriding the predictions.

Arvid Eriksson (KTH Royal Institute of Technology), Gabriel Poesia (Kempner Institute at Harvard University), Roman Bresson (Mohamed Bin Zayed University of Artificial Intelligence), Karl Henrik Johansson (KTH Royal Institute of Technology), David Broman (KTH Royal Institute of Technology) | 2026-03-10 | 💻 cs

From State Changes to Creative Decisions: Documenting and Interpreting Traces Across Creative Domains

Existing methods for tracing creative activity capture state changes without preserving intent or higher-level structure. This paper addresses that limitation with three complementary domain-specific approaches: a node-based interface for GenAI, a vocabulary of visual cues for visualization authoring, and a semantic history-embedded programming model.

Xiaohan Peng, Sotiris Piliouras, Carl Abou Saada Nujaim | 2026-03-10 | 💻 cs
