Post-Disaster Affected Area Segmentation with a Vision Transformer (ViT)-based EVAP Model using Sentinel-2 and Formosat-5 Imagery

This paper proposes a Vision Transformer-based framework that leverages PCA-driven weak supervision to expand limited manual annotations for refining disaster-affected area segmentation using Sentinel-2 and Formosat-5 imagery, thereby enhancing the reliability and scalability of the Taiwan Space Agency's Emergent Value Added Product (EVAP) in scenarios with scarce ground truth.

Yi-Shan Chu, Hsuan-Cheng Wei · 2026-03-10 · cs
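As an illustrative sketch only (not the paper's actual EVAP pipeline), PCA-driven weak supervision of the kind the summary describes often works by projecting pixel features onto the first principal component, orienting it toward the seed annotations, and thresholding to expand the labeled set; the function name and threshold below are assumptions for the example:

```python
import numpy as np

def expand_labels_pca(features, seed_labels, threshold=0.0):
    """Hypothetical sketch: grow sparse 'affected' annotations by
    thresholding scores on the first principal component.

    features: (n_pixels, n_bands) spectral features
    seed_labels: (n_pixels,) with 1 = annotated affected, 0 = unknown
    """
    centered = features - features.mean(axis=0)
    # First principal component via SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    # Orient the component so seed pixels score positively on average.
    if scores[seed_labels == 1].mean() < 0:
        scores = -scores
    # Pixels above the threshold become weak positives for training.
    return (scores > threshold).astype(int)
```

The expanded mask would then serve as weak supervision for a segmentation model in place of dense manual labels.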

C-Koordinator: Interference-aware Management for Large-scale and Co-located Microservice Clusters

This paper presents C-Koordinator, an open-source platform developed at Alibaba that leverages multi-dimensional metrics to accurately predict CPI-based interference in large-scale, co-located microservice clusters, thereby achieving over 90.3% prediction accuracy and significantly reducing application latency across all percentiles compared to state-of-the-art systems.

Shengye Song, Minxian Xu, Zuowei Zhang + 5 more · 2026-03-10 · cs

They See Me Rolling: High-Speed Event Vision-Based Tactile Roller Sensor for Large Surface Inspection

This paper presents a high-speed tactile roller sensor that integrates a neuromorphic camera with a modified event-based multi-view stereo approach and Bayesian fusion to achieve rapid, continuous, high-resolution 3D surface inspection of large industrial areas, operating 11 times faster than prior methods while maintaining sub-100-micron accuracy.

Akram Khairi, Hussain Sajwani, Abdallah Mohammad Alkilany, Laith AbuAssi, Mohamad Halwani, Islam Mohamed Zaid, Ahmed Awadalla, Dewald Swart, Abdulla Ayyad, Yahya Zweiri · 2026-03-10 · cs

Dynamic Symbolic Execution for Semantic Difference Analysis of Component and Connector Architectures

This paper proposes and evaluates a Dynamic Symbolic Execution approach enhanced with runtime data collection to perform semantic difference analysis on MontiArc component-and-connector architectures, finding it promising for identifying execution traces while noting scalability as a primary limitation for larger systems.

Johanna Grahl, Bernhard Rumpe, Max Stachon, Sebastian Stüber · 2026-03-10 · cs

Empowering Microscopic Traffic Simulators with Realistic Perception using Surrogate Sensor Models

This paper introduces MIDAR, a computationally efficient surrogate LiDAR detection model that leverages a geometry-aware Graph Transformer to generate realistic perception data from microscopic traffic simulators, thereby enabling high-fidelity, large-scale evaluation of autonomous vehicle applications without the heavy computational costs of traditional game-engine-based simulators.

Tianheng Zhu, Yiheng Feng · 2026-03-10 · cs

TransUNet-GradCAM: A Hybrid Transformer-U-Net with Self-Attention and Explainable Visualizations for Foot Ulcer Segmentation

This paper presents TransUNet-GradCAM, a hybrid Vision Transformer-U-Net model that effectively segments diabetic foot ulcers by combining global attention with local feature extraction, achieving high accuracy on internal and external datasets while providing explainable visualizations for clinical utility.

Akwasi Asare, Mary Sagoe, Justice Williams Asare, Stephen Edward Moore · 2026-03-10 · cs
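Grad-CAM, which the entry above uses for explainable visualizations, has a compact core regardless of the backbone: global-average-pool the gradients of the target score with respect to a feature map to get channel weights, form the weighted sum of the activation maps, and apply ReLU. The minimal NumPy sketch below shows only that weighting step (the paper's actual model and hooks are not reproduced here):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Minimal Grad-CAM core on precomputed tensors.

    activations: (C, H, W) feature maps from the chosen conv layer
    gradients:   (C, H, W) gradients of the class score w.r.t. those maps
    Returns a (H, W) heatmap normalized to [0, 1].
    """
    # Channel weights: global average pooling over spatial dimensions.
    weights = gradients.mean(axis=(1, 2))
    # Weighted combination of the feature maps (sum over channels).
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only features with positive influence on the class.
    cam = np.maximum(cam, 0)
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

In a real pipeline the heatmap would be upsampled to the input resolution and overlaid on the ulcer image; here the gradients are assumed to come from the framework's autograd.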

S^2Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation

The paper proposes S^2Q-VDiT, a post-training quantization framework for video diffusion transformers that achieves lossless performance under W4A6 quantization by utilizing Hessian-aware salient data selection and attention-guided sparse token distillation to overcome calibration variance and learning challenges caused by long token sequences.

Weilun Feng, Haotong Qin, Chuanguang Yang, Xiangqi Li, Han Yang, Yuqi Li, Zhulin An, Libo Huang, Michele Magno, Yongjun Xu · 2026-03-10 · cs

SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images

This paper introduces SPEX, the first multimodal vision-language model for land cover extraction in spectral remote sensing imagery, which leverages a newly constructed instruction-following dataset (SPIE) and specialized training strategies to effectively integrate spectral priors, achieving state-of-the-art performance and enhanced interpretability across five multispectral datasets.

Dongchen Si, Di Wang, Erzhong Gao, Xiaolei Qin, Liu Zhao, Jing Zhang, Minqiang Xu, Jianbo Zhan, Jianshe Wang, Lin Liu, Bo Du, Liangpei Zhang · 2026-03-10 · cs

3D Gaussian Splatting with Fisheye Images: Field of View Analysis and Depth-Based Initialization

This paper presents the first evaluation of 3D Gaussian Splatting on real 200° fisheye imagery, demonstrating that 160° field-of-view yields optimal reconstruction quality and introducing a UniK3D-based depth initialization method that overcomes Structure-from-Motion failures in extreme wide-angle, distorted scenes.

Ulas Gunes, Matias Turkulainen, Mikhail Silaev, Juho Kannala, Esa Rahtu · 2026-03-10 · cs

Unified and Semantically Grounded Domain Adaptation for Medical Image Segmentation

This paper introduces a unified, semantically grounded framework that learns a domain-agnostic probabilistic manifold of anatomical regularities to enable state-of-the-art, interpretable medical image segmentation in both source-accessible and source-free settings without relying on explicit cross-domain alignment.

Xin Wang, Yin Guo, Jiamin Xia, Kaiyu Zhang, Niranjan Balu, Mahmud Mossa-Basha, Linda Shapiro, Chun Yuan · 2026-03-10 · cs

Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding

Video-EM introduces a training-free, event-centric episodic memory framework that enhances long-form video understanding by orchestrating an LLM to localize, segment, and refine query-relevant moments into a compact, temporally coherent event timeline, thereby overcoming the context limitations of existing Video-LLMs without requiring architectural changes.

Yun Wang, Long Zhang, Jingren Liu, Jiaqi Yan, Zhanjie Zhang, Jiahao Zheng, Ao Ma, Run Ling, Xun Yang, Dapeng Wu, Xiangyu Chen, Xuelong Li · 2026-03-10 · cs

UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding

This paper introduces UniUGG, the first unified framework that integrates 3D understanding and generation by employing an LLM for multimodal comprehension, a latent diffusion-based spatial decoder for high-quality 3D synthesis, and a geometric-semantic pretraining strategy to jointly capture spatial and semantic cues.

Yueming Xu, Jiahui Zhang, Ze Huang, Yurui Chen, Yanpeng Zhou, Zhenyu Chen, Yu-Jie Yuan, Pengxiang Xia, Guowei Huang, Xinyue Cai, Zhongang Qi, Xingyue Quan, Jianye Hao, Hang Xu, Li Zhang · 2026-03-10 · cs

Efficient Diffusion-Based 3D Human Pose Estimation with Hierarchical Temporal Pruning

This paper proposes an efficient diffusion-based 3D human pose estimation framework that employs a Hierarchical Temporal Pruning (HTP) strategy to dynamically reduce computational costs through multi-level token pruning, achieving significant speedups and lower MACs while maintaining state-of-the-art performance on benchmark datasets.

Yuquan Bi, Hongsong Wang, Xinli Shi, Zhipeng Gui, Jie Gui, Yuan Yan Tang · 2026-03-10 · cs

PointSlice: Accurate and Efficient Slice-Based Representation for 3D Object Detection from Point Clouds

PointSlice introduces a novel slice-based representation and a Slice Interaction Network to convert 3D point clouds into 2D data slices, achieving a superior balance between detection accuracy and efficiency by significantly reducing parameters and inference time while maintaining competitive performance on major autonomous driving benchmarks.

Liu Qifeng, Zhao Dawei, Dong Yabo, Xiao Liang, Wang Juan, Min Chen, Li Fuyang, Jiang Weizhong, Lu Dongming, Nie Yiming · 2026-03-10 · cs