MoMaStage: Skill-State Graph Guided Planning and Closed-Loop Execution for Long-Horizon Indoor Mobile Manipulation
MoMaStage is a structured vision-language framework for robust long-horizon indoor mobile manipulation. It guides task planning with a topology-aware Skill-State Graph and ensures execution reliability through a closed-loop mechanism that triggers semantic replanning when physical deviations are detected, all without requiring explicit scene mapping.
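To make the planning-and-replanning loop concrete, here is a minimal, hypothetical sketch of the idea: skills with symbolic preconditions and effects form a graph over states, a planner searches that graph, and an execution loop replans from the observed state whenever a skill's expected effects fail to materialize. All names (`Skill`, `plan`, the example skills, the failure model) are illustrative assumptions, not the framework's actual API.

```python
# Illustrative sketch of closed-loop execution over a skill-state graph.
# All identifiers here are hypothetical, not MoMaStage's real interface.
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    pre: frozenset     # symbolic facts required before execution
    add: frozenset     # facts expected to hold after success
    delete: frozenset  # facts removed on success


SKILLS = [
    Skill("navigate_to_cup", frozenset(), frozenset({"at_cup"}), frozenset({"at_table"})),
    Skill("grasp_cup", frozenset({"at_cup"}), frozenset({"holding_cup"}), frozenset()),
    Skill("navigate_to_table", frozenset({"holding_cup"}),
          frozenset({"at_table"}), frozenset({"at_cup"})),
    Skill("place_cup", frozenset({"at_table", "holding_cup"}),
          frozenset({"cup_on_table"}), frozenset({"holding_cup"})),
]


def plan(state, goal, skills):
    """BFS over the skill-state graph to a state that covers the goal facts."""
    queue, seen = deque([(state, [])]), {state}
    while queue:
        s, path = queue.popleft()
        if goal <= s:
            return path
        for sk in skills:
            if sk.pre <= s:
                ns = (s - sk.delete) | sk.add
                if ns not in seen:
                    seen.add(ns)
                    queue.append((ns, path + [sk]))
    return None


def execute_closed_loop(state, goal, skills, world):
    """Execute the plan, replanning from the observed state on deviation."""
    todo, trace = plan(state, goal, skills), []
    while not goal <= state:
        sk = todo.pop(0)
        state = world(sk, state)       # observed post-execution state
        trace.append(sk.name)
        if not sk.add <= state:        # expected effects missing: deviation
            todo = plan(state, goal, skills)  # semantic replanning
    return trace


# Simulated world: the first grasp attempt fails (the cup slips).
failures = {"grasp_cup": 1}


def world(sk, s):
    if failures.get(sk.name, 0) > 0:
        failures[sk.name] -= 1
        return s                       # state unchanged: physical deviation
    return (s - sk.delete) | sk.add


trace = execute_closed_loop(frozenset(), frozenset({"cup_on_table"}), SKILLS, world)
print(trace)
# The failed grasp is detected and retried via replanning.
```

In this toy run the executor notices that `holding_cup` never appeared after the first grasp, replans from the observed state, and retries, yielding the trace `navigate_to_cup, grasp_cup, grasp_cup, navigate_to_table, place_cup`. The paper's mechanism additionally grounds deviation detection in vision-language perception rather than symbolic facts.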