Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

This paper empirically investigates whether large language models can synthesize executable Unity game code from Goal Playable Patterns under strict structural constraints, finding that while intermediate representations improve performance, project-level grounding and code-hygiene failures remain the primary bottlenecks to high compilation success rates.

Hugh Xuechen Liu, Kıvanç Tatar · 2026-03-10 · 💻 cs

Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation

This paper proposes the Personalized Semi-Autoregressive generation with online knowledge Distillation (PSAD) framework, which pairs a semi-autoregressive teacher model with a User Profile Network to balance generation quality against low-latency inference while enriching user-item interaction modeling, outperforming state-of-the-art baselines in both ranking performance and efficiency.

Kai Cheng, Hao Wang, Wei Guo, Weiwen Liu, Yong Liu, Yawen Li, Enhong Chen · 2026-03-10 · 💻 cs

Vision Language Models Cannot Reason About Physical Transformation

This paper introduces ConservationBench to demonstrate that current Vision Language Models systematically fail to reason about physical transformations and maintain invariant representations of physical quantities, often performing near chance levels despite strong textual priors favoring invariance.

Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng · 2026-03-10 · 💻 cs

Efficient Chest X-ray Representation Learning via Semantic-Partitioned Contrastive Learning

This paper introduces Semantic-Partitioned Contrastive Learning (S-PCL), a streamlined self-supervised pre-training framework for Chest X-rays that achieves superior accuracy and computational efficiency by enforcing agreement between randomly partitioned semantic subsets, thereby eliminating the need for heavy augmentations, auxiliary decoders, or momentum encoders.

Wangyu Feng, Shawn Young, Lijian Xu · 2026-03-10 · 💻 cs

Efficient Trajectory Optimization for Autonomous Racing via Formula-1 Data-Driven Initialization

This paper proposes a data-driven initialization strategy for autonomous racing trajectory optimization that utilizes a neural network trained on Formula 1 telemetry to predict expert-like raceline offsets, thereby significantly accelerating solver convergence and reducing runtime compared to traditional geometric baselines while maintaining optimal lap times.

Samir Shehadeh, Lukas Kutsch, Nils Dengler, Sicong Pan, Maren Bennewitz · 2026-03-10 · 💻 cs

Toward Multimodal Industrial Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals

This paper introduces a comprehensive multimodal dataset comprising audio and vibration signals from a single-speed chain conveyor system, designed to benchmark robust industrial fault detection and classification under diverse operating conditions and noise levels through standardized evaluation protocols and baseline models.

Zhang Chen, Yucong Zhang, Xiaoxiao Miao, Ming Li · 2026-03-10 · 💻 cs

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

This paper introduces EyExIn, a data-efficient framework that enhances retinal Vision Language Models by employing a dual-stream encoding strategy and a deep expert injection mechanism to bridge perception and reasoning gaps, thereby achieving state-of-the-art precision in ophthalmic diagnosis while preventing hallucinations.

Shuai Lu, Meng Wang, Jia Guo, Jiawei Du, Bo Liu, Shengzhu Yang, Weihang Zhang, Huazhu Fu, Huiqi Li · 2026-03-10 · 💻 cs

More Than 1v1: Human-AI Alignment in Early Developmental Communities with Multimodal LLMs

This paper argues that human-AI alignment in early developmental communities should be treated as a community-governed process involving layered collaboration between families and professionals, rather than an individual optimization problem, by establishing expert-grounded structures, professional guardrails, and family-level adaptations for multimodal LLM outputs.

Weiyan Shi, Kenny Tsu Wei Choo · 2026-03-10 · 💻 cs