cs papers | Gist.Science

Fast Attention-Based Simplification of LiDAR Point Clouds for Object Detection and Classification

This paper proposes an efficient, end-to-end learned point cloud simplification method that combines feature embedding with attention-based sampling to achieve a superior balance between computational speed and accuracy for LiDAR-based object detection and classification compared to traditional sampling techniques.

Z. Rozsa, Á. Madaras, Q. Wei, X. Lu, M. Golarits, H. Yuan, T. Sziranyi, R. Hamzaoui | 2026-03-10 | 💻 cs

VB-NET: A physics-constrained gray-box deep learning framework for modeling air conditioning systems as virtual batteries

This paper introduces VB-NET, a physics-constrained gray-box deep learning framework that mathematically equates air conditioning systems to virtual batteries, enabling highly accurate, interpretable, and data-efficient modeling for grid regulation even with minimal historical data.

Yuchen Qi, Ye Guo, Yinliang Xu | 2026-03-10 | 💻 cs

EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation

EmbedTalk introduces a triplane-free talking head synthesis method that leverages learned embeddings to drive 3D Gaussian deformations, achieving superior rendering quality, lip synchronization, and motion consistency while enabling real-time performance (over 60 FPS) on mobile GPUs through significantly more compact models.

Arpita Saggar, Jonathan C. Darling, Duygu Sarikaya, David C. Hogg | 2026-03-10 | 💻 cs

Deep Research for Recommender Systems

This paper introduces RecPilot, a multi-agent framework that shifts the recommender system paradigm from passive item filtering to proactive, user-centric assistance by generating comprehensive, synthesized reports that significantly reduce user effort in item evaluation.

Kesha Ou, Chenghao Wu, Xiaolei Wang, Bowen Zheng, Wayne Xin Zhao, Weitao Li, Long Zhang, Sheng Chen, Ji-Rong Wen | 2026-03-10 | 💻 cs

From Logs to Agents: Reconstructing High-Level Creative Workflows from Low-Level Raw System Traces

This paper proposes a method to reconstruct high-level creative workflows from low-level system logs by parsing raw data into structured behavioral graphs, thereby enabling future process-aware AI agents to better understand user intent and assist in creative tasks.

Tae Hee Jo, Kyung Hoon Hyun | 2026-03-10 | 💻 cs

Beyond Semantic Similarity: Open Challenges for Embedding-Based Creative Process Analysis Across AI Design Tools

This paper argues that relying solely on fixed embedding similarity for analyzing creative processes in AI design tools is insufficient because it fails to capture meaningful creative pivots, and it outlines three key challenges—aligning metrics with creative significance, handling multimodal traces, and evaluating agentic systems—while proposing context-aware LLM interventions to better capture session-specific dynamics.

Seung Won Lee, Semin Jin, Kyung Hoon Hyun | 2026-03-10 | 💻 cs

Looking Into the Water by Unsupervised Learning of the Surface Shape

This paper proposes an unsupervised deep learning method using two neural-field networks with periodic activation functions to model water surface height and reconstruct undistorted underwater images from aerial views, outperforming existing approaches on both simulated and real data.

Ori Lifschitz, Tali Treibitz, Dan Rosenbaum | 2026-03-10 | 💻 cs

Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

This paper identifies "overthinking"—the propagation of incorrect intermediate hypotheses across decoder layers—as a primary cause of hallucinations in Vision Language Models and introduces the Overthinking Score, a layer-probing metric that significantly outperforms existing final-output-based detectors.

Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan | 2026-03-10 | 💻 cs

Performance Evaluation of Automated Multi-Service Deployment in Edge-Cloud Environments with the CODECO Toolkit

This paper evaluates the open-source CODECO toolkit, demonstrating that it significantly reduces manual intervention and maintains competitive performance compared to baseline Kubernetes workflows for automating multi-service deployments across heterogeneous Edge-Cloud environments.

Georgios Koukis, Ioannis Dermentzis, Vassilis Tsaoussidis, Jan Lenke, Fabian Wolk, Daniel Uceda, Guillermo Sanchez, Miguel A. Puentes, Javier Serrano, Panagiotis Karamolegkos, Rute C. Sofia | 2026-03-10 | 💻 cs

GeoLoco: Leveraging 3D Geometric Priors from Visual Foundation Model for Robust RGB-Only Humanoid Locomotion

GeoLoco is a robust, RGB-only humanoid locomotion framework that leverages geometric priors from a frozen Visual Foundation Model and a specialized cross-attention mechanism to achieve zero-shot sim-to-real transfer on the Unitree G1 without relying on active depth sensors.

Yufei Liu, Xieyuanli Chen, Hainan Pan, Chenghao Shi, Yanjie Chen, Kaihong Huang, Zhiwen Zeng, Huimin Lu | 2026-03-10 | 💻 cs

Duala: Dual-Level Alignment of Subjects and Stimuli for Cross-Subject fMRI Decoding

The paper proposes Duala, a dual-level alignment framework that enhances cross-subject fMRI decoding by ensuring semantic consistency at the stimulus level and capturing individual neural variations at the subject level, thereby achieving state-of-the-art performance in image-to-brain retrieval and reconstruction with minimal adaptation data.

Shumeng Li, Jintao Guo, Jian Zhang, Yulin Zhou, Luyang Cao, Yinghuan Shi | 2026-03-10 | 💻 cs

Task Breakpoint Generation using Origin-Centric Graph in Virtual Reality Recordings for Adaptive Playback

This paper proposes an automated method for generating task breakpoints in Virtual Reality recordings using an Origin-Centric Graph to analyze spatio-temporal scene changes, enabling adaptive playback and demonstrating high accuracy against user-perceived ground truth.

Selin Choi, Dooyoung Kim, Taewook Ha, Seonji Kim, Woontack Woo | 2026-03-10 | 💻 cs

Real-Time Glottis Detection Framework via Spatial-decoupled Feature Learning for Nasal Transnasal Intubation

This paper proposes Mobile GlottisNet, a lightweight deep learning framework that uses spatial-decoupled feature learning and adaptive mechanisms to detect the glottis in real time during nasotracheal intubation on resource-constrained edge devices.

Jinyu Liu, Gaoyang Zhang, Yang Zhou, Ruoyi Hao, Yang Zhang, Hongliang Ren | 2026-03-10 | 💻 cs

PoEW: Encryption as Consensus and Enabling Data Compression Services?

This paper proposes Proof-of-Encryption-Work (PoEW), a novel consensus mechanism that repurposes the energy-intensive process of exhaustive key search to simultaneously secure decentralized networks and achieve data compression by deriving a short key from a lengthy plaintext.

Chong Guan | 2026-03-10 | 💻 cs

Coordination Games on Multiplex Networks: Consensus, Convergence, and Stability of Opinion Dynamics

This paper extends opinion dynamics to multiplex networks by modeling them as synchronous coordination games with merged and switching coupling mechanisms, using spectral analysis to demonstrate how cross-layer interactions can uniquely induce, accelerate, or disrupt global consensus compared to single-layer systems.

Ruey-An Shiu, Parinaz Naghizadeh | 2026-03-10 | 💻 cs

PanoDP: Learning Collision-Free Navigation with Panoramic Depth and Differentiable Physics

PanoDP is a communication-free learning framework that integrates panoramic depth perception with differentiable physics-based training signals to enable robust, collision-free autonomous navigation in complex, partially observable environments with both static and dynamic obstacles.

Hao Zhong, Pei Chi, Jiang Zhao, Shenghai Yuan, Xuyang Gao, Thien-Minh Nguyen, Lihua Xie | 2026-03-10 | 💻 cs

Registered Attribute-Based Encryption with Publicly Verifiable Certified Deletion, Everlasting Security, and More

This paper presents the first Registered Attribute-Based Encryption (RABE) schemes that support certified deletion and certified everlasting security in both the privately and publicly verifiable settings, enabling decentralized, fine-grained access control with irreversible data deletion and information-theoretic security against future adversaries.

Shayeef Murshid, Ramprasad Sarkar, Mriganka Mandal | 2026-03-10 | 💻 cs

TempoFit: Plug-and-Play Layer-Wise Temporal KV Memory for Long-Horizon Vision-Language-Action Manipulation

TempoFit is a training-free, plug-and-play method that enhances frozen Vision-Language-Action policies for long-horizon manipulation by retrieving and injecting layer-wise temporal key-value memory from previous timesteps, thereby improving success rates in non-Markovian environments without increasing inference latency or requiring model retraining.

Jun Sun, Boyu Yang, Jiahao Zhang, Ning Ma, Chencheng Wu, Siqing Zhang, Yiou Huang, Qiufeng Wang, Shan Liang, Yaran Chen | 2026-03-10 | 💻 cs

AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots

The paper proposes AtomicVLA, a unified planning-and-execution framework that utilizes a Skill-Guided Mixture-of-Experts architecture to dynamically compose atomic skill abstractions, thereby significantly improving scalability and performance in long-horizon robotic manipulation and continual learning tasks compared to existing monolithic VLA models.

Likui Zhang, Tao Tang, Zhihao Zhan, Xiuwei Chen, Zisheng Chen, Jianhua Han, Jiangtong Zhu, Pei Xu, Hang Xu, Hefeng Wu, Liang Lin, Xiaodan Liang | 2026-03-10 | 💻 cs

Multi-Agent Off-World Exploration for Sparse Evidence Discovery via Gaussian Belief Mapping and Dual-Domain Coverage

This paper proposes a multi-agent informative path planning framework for off-world exploration that utilizes Gaussian belief mapping and dual-domain coverage to effectively discover sparse, visually ambiguous evidence while balancing information gain with operational safety in hazardous, communication-constrained environments.

Zhuoran Qiao, Tianxin Hu, Thien-Minh Nguyen, Shenghai Yuan | 2026-03-10 | 💻 cs
