cs papers | Gist.Science

VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding

This paper introduces VirtueBench, a new benchmark designed to evaluate the trustworthiness of Vision-Language Models in long video understanding by distinguishing between answerable and unanswerable cases to prevent misleading accuracy scores caused by guessing under uncertainty.

Xueqing Yu, Bohan Li, Yan Li, Zhenheng Yang | 2026-03-10 | cs

Physics-Guided VLM Priors for All-Cloud Removal

This paper introduces PhyVLM-CR, a novel unified framework that integrates Vision-Language Model semantic priors with physical scattering parameters to seamlessly remove both thin and thick clouds from optical remote sensing imagery without explicit cloud-type segmentation, thereby achieving high-fidelity, hallucination-free surface reconstruction.

Liying Xu, Huifang Li, Huanfeng Shen | 2026-03-10 | cs

Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network

This paper proposes PSG-UIENet, a novel underwater image enhancement network that integrates Retinex-based illumination correction with CLIP-derived textual semantics to overcome the limitations of existing methods, supported by the introduction of a new large-scale image-text dataset (LUIQD-TD) and a specialized semantic similarity loss function.

Shixuan Xu, Yabo Liu, Junyu Dong, Xinghui Dong | 2026-03-10 | cs

Aligning What EEG Can See: Structural Representations for Brain-Vision Matching

This paper introduces a novel framework for EEG-based visual decoding that aligns brain signals with intermediate visual layers via a proposed "Neural Visibility" concept and a Hierarchically Complementary Fusion mechanism, achieving state-of-the-art performance by significantly reducing cross-modal information mismatch.

Jingyi Tang, Shuai Jiang, Fei Su, Zhicheng Zhao | 2026-03-10 | cs

Multi-TAP: Multi-criteria Target Adaptive Persona Modeling for Cross-Domain Recommendation

The paper proposes Multi-TAP, a multi-criteria target-adaptive persona framework that addresses data sparsity and intra-domain heterogeneity in cross-domain recommendation by explicitly modeling semantic personas and selectively transferring relevant source-domain signals, thereby outperforming state-of-the-art methods on real-world datasets.

Daehee Kang, Yeon-Chang Lee | 2026-03-10 | cs

mAVE: A Watermark for Joint Audio-Visual Generation Models

The paper introduces mAVE, a novel watermarking framework that cryptographically binds audio and video latents in joint generation models to eliminate the "Binding Vulnerability" of existing methods and robustly defend against adversarial Swap Attacks without requiring model fine-tuning.

Luyang Si, Leyi Pan, Lijie Wen | 2026-03-10 | cs

Exploring the Reasoning Depth of Small Language Models in Software Architecture: A Multidimensional Evaluation Framework Towards Software Engineering 2.0

This study benchmarks ten small language models on architectural decision record generation to establish a multidimensional evaluation framework. It finds that models exceeding 3 billion parameters excel in zero-shot reasoning, that sub-2 billion models benefit most from fine-tuning, and that few-shot prompting effectively calibrates mid-sized models, although high semantic diversity often correlates with hallucinations.

Ha Vo, Nhut Tran, Khang Vo, Phat T. Tran-Truong, Son Ha | 2026-03-10 | cs

Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction

This paper proposes a facial expression generation method for natural dyadic interaction that leverages human feedback within a vision-language-action framework and a reinforcement learning strategy to produce contextually appropriate, identity-independent expressions aligned with human preferences.

Xu Chen, Rui Gao, Xinjie Zhang, Haoyu Zhang, Che Sun, Zhi Gao, Yuwei Wu, Yunde Jia | 2026-03-10 | cs

Randomise Alone, Reach as a Team

This paper investigates concurrent graph games with distributed randomization where team players lack a shared random source, establishing that memoryless strategies suffice for the threshold problem (placing it in $\exists\mathbb{R}$ and proving NP-hardness) and that almost-sure reachability is NP-complete, while introducing the IRATL logic and a corresponding solver.

Léonard Brice, Thomas A. Henzinger, Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini | 2026-03-10 | cs

ACLM: ADMM-Based Distributed Model Predictive Control for Collaborative Loco-Manipulation

This paper proposes ACLM, an ADMM-based distributed model predictive control framework that enables scalable, real-time collaborative loco-manipulation of heavy payloads by decomposing the global optimization problem into parallel robot-level subproblems while preserving dynamic coupling through consensus constraints.

Ziyi Zhou, Pengyuan Shu, Ruize Cao, Yuntian Zhao, Ye Zhao | 2026-03-10 | cs

Towards Scalable Probabilistic Human Motion Prediction with Gaussian Processes for Safe Human-Robot Collaboration

This paper proposes a scalable, structured multitask variational Gaussian Process framework for full-body human motion prediction that achieves competitive accuracy with significantly fewer parameters and provides well-calibrated, interpretable uncertainty estimates essential for safe real-time human-robot collaboration.

Jinger Chong, Xiaotong Zhang, Kamal Youcef-Toumi | 2026-03-10 | cs

NuNext: Reframing Nucleus Detection as Next-Point Detection

NuNext reframes nucleus detection in histopathology as a next-point prediction task using a multimodal large language model trained with spatial-aware soft supervision and reinforcement fine-tuning to achieve superior performance across nine benchmarks.

Zhongyi Shui, Honglin Li, Xiaozhong Ji, Ye Zhang, Zijiang Yang, Chenglu Zhu, Yuxuan Sun, Kai Yao, Conghui He, Cheng Tan | 2026-03-10 | cs

Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

This paper empirically investigates whether large language models can synthesize executable Unity game code from Goal Playable Patterns under strict structural constraints, revealing that while intermediate representations improve performance, project-level grounding and hygiene failures remain primary bottlenecks in achieving high compilation success rates.

Hugh Xuechen Liu, Kıvanç Tatar | 2026-03-10 | cs

AutoUE: Automated Generation of 3D Games in Unreal Engine via Multi-Agent Systems

This paper presents AutoUE, a novel multi-agent system that leverages retrieval-augmented generation and game design patterns to automatically generate, code, and test complete 3D games within Unreal Engine, effectively addressing the complexities of tool usage and workflow orchestration.

Lei Yin, Wentao Cheng, Zhida Qin, Tianyu Huang, Yidong Li, Gangyi Ding | 2026-03-10 | cs

Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation

This paper proposes the Personalized Semi-Autoregressive with online knowledge Distillation (PSAD) framework, which utilizes a semi-autoregressive teacher model and a User Profile Network to balance generation quality with low-latency inference while enhancing user-item interactions, thereby outperforming state-of-the-art baselines in both ranking performance and efficiency.

Kai Cheng, Hao Wang, Wei Guo, Weiwen Liu, Yong Liu, Yawen Li, Enhong Chen | 2026-03-10 | cs

Vision Language Models Cannot Reason About Physical Transformation

This paper introduces ConservationBench to demonstrate that current Vision Language Models systematically fail to reason about physical transformations and maintain invariant representations of physical quantities, often performing near chance levels despite strong textual priors favoring invariance.

Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng | 2026-03-10 | cs

Learning From Failures: Efficient Reinforcement Learning Control with Episodic Memory

The paper proposes Failure Episodic Memory Alert (FEMA), a technique that stores and retrieves short-horizon failure experiences to prevent robots from relapsing into unstable states, thereby significantly improving sample efficiency and enabling successful long-horizon exploration in challenging contact-rich reinforcement learning tasks.

Chenyang Miao | 2026-03-10 | cs

Efficient Chest X-ray Representation Learning via Semantic-Partitioned Contrastive Learning

This paper introduces Semantic-Partitioned Contrastive Learning (S-PCL), a streamlined self-supervised pre-training framework for Chest X-rays that achieves superior accuracy and computational efficiency by enforcing agreement between randomly partitioned semantic subsets, thereby eliminating the need for heavy augmentations, auxiliary decoders, or momentum encoders.

Wangyu Feng, Shawn Young, Lijian Xu | 2026-03-10 | cs

aCAPTCHA: Verifying That an Entity Is a Capable Agent via Asymmetric Hardness

This paper introduces aCAPTCHA, a novel security protocol that verifies whether an entity is a capable AI agent by leveraging the asymmetric processing speed between humans and machines to solve the Agentic Capability Verification Problem (ACVP) through time-constrained, multi-round natural language challenges.

Zuyao Xu, Xiang Li, Fubin Wu, Yuqi Qiu, Lu Sun, FaSheng Miao | 2026-03-10 | cs

TIQA: Human-Aligned Text Quality Assessment in Generated Images

This paper introduces TIQA, a human-aligned text quality assessment task and dataset for generated images, along with the ANTIQA method that significantly outperforms existing OCR and VLM-based metrics in predicting text rendering fidelity and improving downstream generation selection.

Kirill Koltsov, Aleksandr Gushchin, Dmitriy Vatolin, Anastasia Antsiferova | 2026-03-10 | cs
