Exploring the Reasoning Depth of Small Language Models in Software Architecture: A Multidimensional Evaluation Framework Towards Software Engineering 2.0

This study benchmarks ten small language models on architectural decision record generation to establish a multidimensional evaluation framework, revealing that models exceeding 3 billion parameters excel at zero-shot reasoning, that sub-2-billion models benefit most from fine-tuning, that few-shot prompting effectively calibrates mid-sized models, and that high semantic diversity often correlates with hallucinations.

Ha Vo, Nhut Tran, Khang Vo, Phat T. Tran-Truong, Son Ha · 2026-03-10 · 💻 cs
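One concrete metric behind the diversity-hallucination finding is semantic diversity over a model's generated ADRs. A minimal sketch, assuming mean pairwise cosine distance as the diversity measure and a toy hashing bag-of-words encoder in place of whatever sentence embedder the study actually used (the abstract names neither):

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing bag-of-words embedding; a stand-in for the
    (unnamed) sentence encoder the framework would use."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def semantic_diversity(adrs: list[str]) -> float:
    """Mean pairwise cosine distance across generated ADRs; per the
    study, higher diversity often co-occurs with hallucination."""
    vecs = np.stack([embed(a) for a in adrs])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-12
    sims = vecs @ vecs.T
    i, j = np.triu_indices(len(adrs), k=1)  # each unordered pair once
    return float(1.0 - sims[i, j].mean())
```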

Randomise Alone, Reach as a Team

This paper investigates concurrent graph games with distributed randomization, where team players lack a shared random source, establishing that memoryless strategies suffice for the threshold problem (placing it in $\exists\mathbb{R}$ and proving NP-hardness) and that almost-sure reachability is NP-complete, while introducing the IRATL logic and a corresponding solver.

Léonard Brice, Thomas A. Henzinger, Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini · 2026-03-10 · 💻 cs
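For orientation, the threshold problem placed in $\exists\mathbb{R}$ can be stated along the following standard lines; the notation is a plausible reconstruction, not taken from the paper:

```latex
% Sketch of the threshold problem (notation assumed, not the paper's).
Given a concurrent game $\mathcal{G}$ with target set $T$, a team of
players $1,\dots,k$ with no shared random source, and a threshold
$\lambda \in [0,1]$: do there exist \emph{memoryless} randomized
strategies $\sigma_1,\dots,\sigma_k$ (a private distribution over
actions per player and state) such that
\[
  \Pr\nolimits^{\sigma_1,\dots,\sigma_k}_{\mathcal{G}}\bigl[\lozenge T\bigr] \;\ge\; \lambda\,?
\]
With memoryless strategies fixed, the reachability probabilities are
defined by polynomial constraints in the strategy variables, so the
question is a sentence of the existential theory of the reals,
giving membership in $\exists\mathbb{R}$.
```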

Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

This paper empirically investigates whether large language models can synthesize executable Unity game code from Goal Playable Patterns under strict structural constraints, revealing that while intermediate representations improve performance, project-level grounding and hygiene failures remain the primary bottlenecks to achieving high compilation success rates.

Hugh Xuechen Liu, Kıvanç Tatar · 2026-03-10 · 💻 cs
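The compilation-success metric suggests a straightforward measurement harness. A minimal sketch, assuming generated C# is dropped into a scratch Unity project and compiled via Unity's standard batch-mode flags; the paper's actual pipeline, file layout, and success criteria are not specified in the abstract:

```python
import pathlib
import subprocess

def compiles_cleanly(csharp_source: str, project: pathlib.Path,
                     unity_exe: str = "Unity") -> bool:
    """Write LLM-generated code into a Unity project's Assets folder and
    launch Unity headless to force a script compile. Only the flags
    (-batchmode -quit -projectPath) are standard Unity CLI; everything
    else here is an assumed harness, not the paper's."""
    (project / "Assets" / "Generated.cs").write_text(csharp_source)
    result = subprocess.run(
        [unity_exe, "-batchmode", "-quit", "-projectPath", str(project)],
        capture_output=True, text=True,
    )
    return result.returncode == 0  # non-zero signals compile/load errors
```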

Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation

This paper proposes the Personalized Semi-Autoregressive with online knowledge Distillation (PSAD) framework, which uses a semi-autoregressive teacher model and a User Profile Network to balance generation quality with low-latency inference while better capturing user-item interactions, thereby outperforming state-of-the-art baselines in both ranking performance and efficiency.

Kai Cheng, Hao Wang, Wei Guo, Weiwen Liu, Yong Liu, Yawen Li, Enhong Chen · 2026-03-10 · 💻 cs
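Stripped of the architecture specifics, the online distillation objective described here has a familiar shape: the semi-autoregressive teacher's soft scores over candidate items are blended with the supervised ranking loss to train the low-latency student. A minimal PyTorch sketch; the function name, temperature, and blending weight are illustrative, not the paper's:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      alpha: float = 0.5,
                      tau: float = 2.0) -> torch.Tensor:
    """Blend a temperature-scaled KL term (teacher -> student) with the
    usual cross-entropy against the positive item. Both logit tensors
    are [batch, num_candidates]; alpha/tau are illustrative defaults."""
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau  # rescale gradients as in standard distillation
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```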

Vision Language Models Cannot Reason About Physical Transformation

This paper introduces ConservationBench to demonstrate that current Vision Language Models systematically fail to reason about physical transformations and to maintain invariant representations of physical quantities, often performing at near-chance levels despite strong textual priors favoring invariance.

Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng · 2026-03-10 · 💻 cs
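Operationally, a conservation probe reduces to asking the same quantity question before and after a conserving transformation and checking that the answer is invariant. A minimal sketch, assuming a hypothetical query_vlm(image, question) wrapper; ConservationBench's actual prompts, stimuli, and scoring are not reproduced here:

```python
def conservation_consistency(pairs, query_vlm) -> float:
    """pairs: iterable of (image_before, image_after, question) where the
    depicted transformation conserves the queried quantity (e.g., liquid
    poured into a taller glass). query_vlm is a hypothetical callable
    returning the model's short answer. A model with an invariant
    representation answers identically across the pair."""
    consistent, total = 0, 0
    for before, after, question in pairs:
        a_before = query_vlm(before, question).strip().lower()
        a_after = query_vlm(after, question).strip().lower()
        consistent += int(a_before == a_after)
        total += 1
    return consistent / max(total, 1)
```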