Learning to Think Fast and Slow for Visual Language Models
This paper introduces DualMindVLM, a visual language model with a dual-mode thinking mechanism that dynamically selects between fast, intuitive responses and slow, deliberate reasoning based on problem complexity, achieving state-of-the-art performance with substantially improved token efficiency.
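The summary above describes mode selection only at a high level. As a rough illustration of the idea, the following is a minimal Python sketch of a complexity-based router that picks between a fast and a slow generation path; it is not DualMindVLM's actual method, and every name and value in it (`DualModeRouter`, the length-based complexity heuristic, the 0.5 threshold, the stand-in generation functions) is an assumption made for this sketch.

```python
# Minimal sketch of a dual-mode inference router (hypothetical, not the
# paper's implementation). A complexity score routes easy prompts to a
# cheap fast path and hard prompts to a deliberate slow path.

from dataclasses import dataclass
from typing import Callable


@dataclass
class DualModeRouter:
    complexity_fn: Callable[[str], float]  # scores a prompt's difficulty in [0, 1]
    fast_fn: Callable[[str], str]          # short, direct answer (few tokens)
    slow_fn: Callable[[str], str]          # long, step-by-step reasoning
    threshold: float = 0.5                 # hypothetical cutoff between modes

    def answer(self, prompt: str) -> str:
        # Easy prompts take the fast path to save tokens;
        # hard prompts take the slow path for accuracy.
        if self.complexity_fn(prompt) < self.threshold:
            return self.fast_fn(prompt)
        return self.slow_fn(prompt)


if __name__ == "__main__":
    # Toy stand-ins: complexity approximated by prompt length.
    router = DualModeRouter(
        complexity_fn=lambda p: min(len(p) / 200.0, 1.0),
        fast_fn=lambda p: "[fast] short answer",
        slow_fn=lambda p: "[slow] step-by-step reasoning ...",
    )
    print(router.answer("What color is the cat?"))
    print(router.answer(
        "Given the chart in the image, compare the quarterly revenue trends "
        "across regions and explain the anomaly in Q3 relative to the prior "
        "two years."
    ))
```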