UniCoR: Modality Collaboration for Robust Cross-Language Hybrid Code Retrieval

UniCoR is a novel self-supervised framework that addresses the challenges of insufficient semantic understanding, inefficient modality fusion, and weak cross-language generalization in hybrid code retrieval by employing multi-perspective supervised contrastive learning and representation distribution consistency, thereby achieving state-of-the-art performance on both empirical and large-scale benchmarks.

Yang Yang, Li Kuang, Jiakun Liu, Zhongxin Liu, Yingjie Xia, David Lo · 2026-03-09 · cs

Towards Scalable Pre-training of Visual Tokenizers for Generation

This paper introduces VTP, a unified pre-training framework that optimizes visual tokenizers through joint image-text contrastive, self-supervised, and reconstruction losses to shift the latent space focus from low-level pixel accuracy to high-level semantics, thereby solving the "pre-training scaling problem" and enabling significantly improved, compute-efficient generative performance.

Jingfeng Yao, Yuda Song, Yucong Zhou, Xinggang Wang · 2026-03-09 · cs

Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark

This paper introduces Spatial4D-Bench, a large-scale, multi-task benchmark comprising approximately 40,000 question-answer pairs across 18 tasks and six cognitive categories, designed to comprehensively evaluate and reveal the current limitations of Multimodal Large Language Models in achieving human-level 4D spatial intelligence.

Pan Wang, Yang Liu, Guile Wu, Eduardo R. Corral-Soto, Chengjie Huang, Binbin Xu, Dongfeng Bai, Xu Yan, Yuan Ren, Xingxin Chen, Yizhe Wu, Tao Huang, Wenjun Wan, Xin Wu, Pei Zhou, Xuyang Dai, Kangbo Lv, Hongbo Zhang, Yosef Fried, Aixue Ye, Bailan Feng, Zhenyu Chen, Zhen Li, Yingcong Chen, Yiyi Liao, Bingbing Liu · 2026-03-09 · cs

VISO: Robust Underwater Visual-Inertial-Sonar SLAM with Photometric Rendering for Dense 3D Reconstruction

This paper presents VISO, a robust underwater SLAM system that fuses stereo cameras, an IMU, and 3D sonar with novel calibration and photometric rendering techniques to achieve accurate 6-DoF localization and real-time, high-fidelity dense 3D reconstruction in challenging aquatic environments.

Shu Pan, Simon Archieri, Ahmet Cinar, Jonatan Scharff Willners, Ignacio Carlucho, Yvan Petillot · 2026-03-09 · cs

SRA 2: Variational Autoencoder Self-Representation Alignment for Efficient Diffusion Training

This paper introduces SRA 2, a lightweight intrinsic guidance framework that accelerates diffusion transformer training and improves generation quality by aligning intermediate latent features with pre-trained VAE features via a simple projection layer, eliminating the need for external encoders or dual-model setups while incurring minimal computational overhead.

Mengmeng Wang, Dengyang Jiang, Liuzhuozheng Li, Yucheng Lin, Guojiang Shen, Xiangjie Kong, Yong Liu, Guang Dai, Jingdong Wang · 2026-03-09 · cs

SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning

The paper introduces SpatialReward, a reward model that leverages explicit spatial reasoning to overcome the "Attention Collapse" limitation in existing evaluators, thereby providing fine-grained, accurate signals that significantly enhance online reinforcement learning performance for image editing tasks.

Yancheng Long, Yankai Yang, Hongyang Wei, Wei Chen, Tianke Zhang, Haonan Fan, Changyi Liu, Kaiyu Jiang, Jiankang Chen, Kaiyu Tang, Bin Wen, Fan Yang, Tingting Gao, Han Li, Shuo Yang · 2026-03-09 · cs

APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

The paper presents APEX, a deep reinforcement learning framework that enables a 29-DoF Unitree G1 humanoid robot to autonomously traverse platforms up to 114% of its leg length by composing perceptive climbing, walking, and reconfiguration skills, trained with a novel ratchet progress reward and robust sim-to-real perception strategies.

Yikai Wang, Tingxuan Leng, Changyi Lin, Shiqi Liu, Shir Simon, Bingqing Chen, Jonathan Francis, Ding Zhao · 2026-03-09 · cs

Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

This paper proposes RL-Co, a reinforcement learning-based sim-real co-training framework that combines supervised fine-tuning on mixed real and simulated data with interactive simulation fine-tuning anchored by real-world data, achieving significant improvements in real-world success rates, generalization, and data efficiency for Vision-Language-Action models.

Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zang, Weinan Zhang, Chao Yu, Yu Wang · 2026-03-09 · cs