cs papers | Gist.Science

Inter-Image Pixel Shuffling for Multi-focus Image Fusion

This paper proposes Inter-image Pixel Shuffling (IPS), a multi-focus image fusion method that synthesizes training data by shuffling pixels between clear and low-pass filtered images, so that deep learning models can learn fusion without real multi-focus datasets. It pairs this with a hybrid cross-image network that combines CNNs and state space models to achieve superior fusion quality.

Huangxing Lin, Rongrong Ma, Cheng Wang | 2026-03-10 | 💻 cs
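
The data-synthesis idea in the IPS summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the box blur stands in for whatever low-pass filter the paper uses, the per-pixel random mask is a guess at the shuffling granularity, and the names `box_blur` and `shuffle_pair` are hypothetical.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter a 2D grayscale image with a k x k box kernel (k odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def shuffle_pair(clear, rng, k=5, p=0.5):
    """Build two partially blurred sources from one clear image.

    Assumption: each pixel is independently assigned to source A or B.
    Where the mask is True, A keeps the clear pixel and B gets the blurred
    one; elsewhere the roles are swapped. The clear image is the target.
    """
    blurred = box_blur(clear, k)
    mask = rng.random(clear.shape) < p
    src_a = np.where(mask, clear, blurred)
    src_b = np.where(mask, blurred, clear)
    return src_a, src_b, mask

rng = np.random.default_rng(0)
clear = rng.random((16, 16))          # stand-in for a real sharp image
src_a, src_b, mask = shuffle_pair(clear, rng)
```

A fusion model trained on such pairs would take `src_a` and `src_b` as inputs and `clear` as the target; in this sketch every pixel is sharp in exactly one of the two sources.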

Efficient Trajectory Optimization for Autonomous Racing via Formula-1 Data-Driven Initialization

This paper proposes a data-driven initialization strategy for autonomous racing trajectory optimization that utilizes a neural network trained on Formula 1 telemetry to predict expert-like raceline offsets, thereby significantly accelerating solver convergence and reducing runtime compared to traditional geometric baselines while maintaining optimal lap times.

Samir Shehadeh, Lukas Kutsch, Nils Dengler, Sicong Pan, Maren Bennewitz | 2026-03-10 | 💻 cs

Toward Multimodal Industrial Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals

This paper introduces a comprehensive multimodal dataset comprising audio and vibration signals from a single-speed chain conveyor system, designed to benchmark robust industrial fault detection and classification under diverse operating conditions and noise levels through standardized evaluation protocols and baseline models.

Zhang Chen, Yucong Zhang, Xiaoxiao Miao, Ming Li | 2026-03-10 | 💻 cs

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

This paper introduces EyExIn, a data-efficient framework that enhances retinal Vision Language Models by employing a dual-stream encoding strategy and a deep expert injection mechanism to bridge perception and reasoning gaps, thereby achieving state-of-the-art precision in ophthalmic diagnosis while preventing hallucinations.

Shuai Lu, Meng Wang, Jia Guo, Jiawei Du, Bo Liu, Shengzhu Yang, Weihang Zhang, Huazhu Fu, Huiqi Li | 2026-03-10 | 💻 cs

More Than 1v1: Human-AI Alignment in Early Developmental Communities with Multimodal LLMs

This paper argues that human-AI alignment in early developmental communities should be treated as a community-governed process involving layered collaboration between families and professionals, rather than an individual optimization problem, by establishing expert-grounded structures, professional guardrails, and family-level adaptations for multimodal LLM outputs.

Weiyan Shi, Kenny Tsu Wei Choo | 2026-03-10 | 💻 cs

The Model Knows Which Tokens Matter: Automatic Token Selection via Noise Gating

The paper introduces AutoSelect, a training-free token pruning method for vision-language models that reformulates token selection as capacity-constrained communication using a noise-gating mechanism to identify and retain only the most informative visual tokens, thereby significantly accelerating inference while preserving nearly all model accuracy.

Landi He, Xiaoyu Yang, Lijian Xu | 2026-03-10 | 💻 cs

DexKnot: Generalizable Visuomotor Policy Learning for Dexterous Bag-Knotting Manipulation

DexKnot is a generalizable visuomotor framework that combines keypoint affordances with diffusion policies to enable robots to reliably knot plastic bags across diverse, unseen instances by learning shape-agnostic representations from real-world manual deformations.

Jiayuan Zhang, Ruihai Wu, Haojun Chen, Yuran Wang, Yifan Zhong, Ceyao Zhang, Yaodong Yang, Yuanpei Chen | 2026-03-10 | 💻 cs

Model-based thermal drift compensation for high-precision hexapod robot actuators

This paper proposes and experimentally validates a model-based method that links actuator expansion to surface temperatures in order to compensate for thermal drift in high-precision hexapod robots, reducing thermally induced errors by over 80%.

Clément Robert, Alain Vissiere, Olivier Company, Pierre Noire, Thierry Roux, Sébastien Krut | 2026-03-10 | 💻 cs

PDD: Manifold-Prior Diverse Distillation for Medical Anomaly Detection

The paper proposes PDD, a novel framework that unifies global contextual and local structural priors from dual frozen encoders into a shared manifold to distill diverse knowledge into complementary student networks, achieving state-of-the-art performance in medical image anomaly detection across multiple datasets.

Xijun Lu, Hongying Liu, Fanhua Shang, Yanming Hui, Liang Wan | 2026-03-10 | 💻 cs

Tutorial on Aided Inertial Navigation Systems: A Modern Treatment Using Lie-Group Theoretical Methods

This tutorial provides a control-oriented introduction to aided inertial navigation systems by utilizing a Lie-group formulation based on the extended Special Euclidean group SE(2,3) to establish a clear, geometric framework for fusing inertial and aiding measurements while explicitly leveraging invariance and symmetry principles.

Soulaimane Berkane | 2026-03-10 | 💻 cs
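
For readers unfamiliar with the group named in the tutorial summary, an element of SE(2,3) is commonly written as a 5x5 matrix that packs a rotation R, a velocity v, and a position p. The sketch below shows that standard matrix form and its closed-form inverse; it is not taken from the tutorial, and the helper names `se23` and `se23_inverse` are hypothetical.

```python
import numpy as np

def se23(R, v, p):
    """Pack rotation R (3x3), velocity v (3,) and position p (3,) into a 5x5 matrix."""
    X = np.eye(5)
    X[:3, :3] = R
    X[:3, 3] = v
    X[:3, 4] = p
    return X

def se23_inverse(X):
    """Closed-form inverse: (R, v, p) -> (R^T, -R^T v, -R^T p)."""
    R, v, p = X[:3, :3], X[:3, 3], X[:3, 4]
    return se23(R.T, -R.T @ v, -R.T @ p)

# Example state: a rotation of 0.3 rad about the z axis.
c, s = np.cos(0.3), np.sin(0.3)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
X = se23(Rz, np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]))

# Group composition is plain matrix multiplication.
identity = X @ se23_inverse(X)
```

Because composition and inversion stay inside this matrix form, the attitude, velocity, and position can be treated as one state on the group rather than as three separately parameterized quantities.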

CanoVerse: 3D Object Scalable Canonicalization and Dataset for Generation and Pose

The paper introduces CanoVerse, a massive dataset of 320K canonicalized 3D objects and a high-throughput framework that resolves directional ambiguity to significantly improve 3D generation stability, cross-modal retrieval, and zero-shot orientation estimation.

Li Jin, Yuchen Yang, Weikai Chen, Yujie Wang, Dehao Hao, Tanghui Jia, Yingda Yin, Zeyu Hu, Runze Zhang, Keyang Luo, Li Yuan, Long Quan, Xin Wang, Xueying Qin | 2026-03-10 | 💻 cs

LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models

This paper introduces LiveWorld, a novel framework that addresses the "out-of-sight dynamics" limitation in generative video world models by maintaining a persistent global state where unobserved entities continue to evolve, thereby enabling truly continuous 4D world simulation and long-term scene consistency.

Zicheng Duan, Jiatong Xia, Zeyu Zhang, Wenbo Zhang, Gengze Zhou, Chenhui Gou, Yefei He, Feng Chen, Xinyu Zhang, Lingqiao Liu | 2026-03-10 | 💻 cs

Transition State Theory for Network Dynamics

This paper proposes a framework that integrates transition state theory with dynamic network modeling to characterize and predict discrete structural changes, such as faction realignment, and that can often forecast future network evolution from cross-sectional data.

Carter T. Butts | 2026-03-10 | 💻 cs

NarrativeLoom: Enhancing Creative Storytelling through Multi-Persona Collaborative Improvisation

The paper introduces NarrativeLoom, a multi-persona collaborative storytelling system grounded in Campbell's theory of blind variation and selective retention. A controlled study of 50 participants shows that it significantly enhances the novelty, diversity, and overall creativity of co-authored stories compared to existing tools, particularly benefiting novice writers through structured scaffolding.

Yuxi Ma, Yongqian Peng, Fengyuan Yang, Siyu Zha, Chi Zhang, Zixia Jia, Zilong Zheng, Yixin Zhu | 2026-03-10 | 💻 cs

Improving reasoning at inference time via uncertainty minimisation

This paper proposes a computationally efficient inference-time reasoning method that improves accuracy by selecting thought-level continuations that maximize the model's internal self-certainty, demonstrating that optimizing for uncertainty minimization at early planning stages yields performance comparable to or exceeding existing scaling techniques like self-consistency.

Nicolas Legrand, Kenneth Enevoldsen, Márton Kardos, Kristoffer Nielbo | 2026-03-10 | 💻 cs
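
The selection step described in the uncertainty-minimisation summary can be illustrated with a toy example. This is a sketch under assumptions rather than the paper's method: self-certainty is approximated here by low mean token entropy, the candidate "thoughts" are represented only by hand-written per-token probability distributions, and `mean_entropy` and `pick_most_certain` are hypothetical names.

```python
import math

def mean_entropy(step_distributions):
    """Average Shannon entropy (in nats) over a continuation's token distributions."""
    total = 0.0
    for dist in step_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(step_distributions)

def pick_most_certain(candidates):
    """Return the name of the candidate whose tokens were predicted most confidently."""
    return min(candidates, key=lambda name: mean_entropy(candidates[name]))

# Two hypothetical candidate thoughts, each a list of next-token distributions.
candidates = {
    "plan_a": [[0.9, 0.1], [0.8, 0.2]],   # peaked distributions: low entropy
    "plan_b": [[0.5, 0.5], [0.6, 0.4]],   # flatter distributions: high entropy
}
best = pick_most_certain(candidates)
```

In a real system the distributions would come from the language model's own logits for each sampled continuation, and only the selected continuation would be extended further.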

PromptGate: Client-Adaptive Vision-Language Gating for Open-Set Federated Active Learning

PromptGate is a dynamic, federated vision-language framework that utilizes adaptive, learnable prompts to effectively filter out-of-distribution noise from unlabeled medical data pools, thereby significantly improving the efficiency and privacy of open-set active learning across resource-constrained institutions.

Adea Nesturi, David Dueñas Gaviria, Jiajun Zeng, Shadi Albarqouni | 2026-03-10 | 💻 cs

RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation

The paper proposes RoTri-Diff, a diffusion-based imitation learning framework that explicitly models the spatial triadic relationship between two robot arms and an object to generate stable, coordinated bimanual manipulation trajectories, outperforming state-of-the-art baselines in both simulation and real-world tasks.

Zixuan Chen, Nga Teng Chan, Yiwen Hou, Chenrui Tie, Zixuan Liu, Haonan Chen, Junting Chen, Jieqi Shi, Yang Gao, Jing Huo, Lin Shao | 2026-03-10 | 💻 cs

ACD-U: Asymmetric co-teaching with machine unlearning for robust learning with noisy labels

The paper proposes ACD-U, an asymmetric co-teaching framework that combines a CLIP-pretrained Vision Transformer with a CNN and incorporates machine unlearning to actively correct selection errors and achieve state-of-the-art robustness against noisy labels.

Reo Fukunaga, Soh Yoshida, Mitsuji Muneyasu | 2026-03-10 | 💻 cs

Class Visualizations and Activation Atlases for Enhancing Interpretability in Deep Learning-Based Computational Pathology

This paper introduces a framework to evaluate class visualizations and activation atlases for transformer-based pathology models, revealing that while these feature visualization methods effectively capture coarse tissue-level concepts, their ability to represent fine-grained cancer subclasses is limited by intrinsic pathological complexity and reduced inter-observer agreement.

Marco Gustav, Fabian Wolf, Christina Glasner, Nic G. Reitsam, Stefan Schulz, Kira Aschenbroich, Bruno Märkl, Sebastian Foersch, Jakob Nikolas Kather | 2026-03-10 | 💻 cs

Learning to Rank the Initial Branching Order of SAT Solvers

This paper proposes using graph neural networks to predict initial branching orders for CDCL SAT solvers, demonstrating significant speedups on random and pseudo-industrial benchmarks while noting that the approach struggles with complex industrial instances due to the solver's dynamic heuristics overriding the predictions.

Arvid Eriksson (KTH Royal Institute of Technology), Gabriel Poesia (Kempner Institute at Harvard University), Roman Bresson (Mohamed Bin Zayed University of Artificial Intelligence), Karl Henrik Johansson (KTH Royal Institute of Technology), David Broman (KTH Royal Institute of Technology) | 2026-03-10 | 💻 cs
