OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving

The paper proposes OD-RASE, an ontology-driven framework that leverages large-scale visual language models and diffusion models to proactively identify accident-prone road structures and generate reliable infrastructure improvement proposals, thereby enhancing the safety of autonomous driving systems.

Kota Shimomura, Masaki Nambata, Atsuya Ishikawa, Ryota Mimura, Takayuki Kawabuchi, Takayoshi Yamashita, Koki Inoue · 2026-03-09 · cs

SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration

The paper proposes SLER-IR, a novel all-in-one image restoration framework that utilizes spherical layer-wise expert routing, a spherical uniform degradation embedding with contrastive learning, and a global-local granularity fusion module to effectively overcome feature interference and spatial non-uniform degradations, achieving state-of-the-art performance across multiple restoration tasks.

Peng Shurui, Xin Lin, Shi Luo, Jincen Ou, Dizhe Zhang, Lu Qi, Truong Nguyen, Chao Ren · 2026-03-09 · cs

LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution

LucidNFT is a multi-reward reinforcement learning framework for generative real-world super-resolution that addresses faithfulness hallucinations and optimization bottlenecks by introducing a degradation-robust consistency evaluator, a decoupled advantage normalization strategy, and a large-scale real-degradation dataset to achieve superior perceptual-faithfulness trade-offs.

Song Fei, Tian Ye, Sixiang Chen, Zhaohu Xing, Jianyu Lai, Lei Zhu · 2026-03-09 · cs

Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models

This paper introduces Skeleton-to-Image Encoding (S2I), a novel method that transforms heterogeneous 3D skeleton sequences into standardized image-like formats to leverage powerful vision-pretrained models for effective self-supervised skeleton representation learning and cross-modal action recognition.

Siyuan Yang, Jun Liu, Hao Cheng, Chong Wang, Shijian Lu, Hedvig Kjellstrom, Weisi Lin, Alex C. Kot · 2026-03-09 · cs.AI

CR-QAT: Curriculum Relational Quantization-Aware Training for Open-Vocabulary Object Detection

This paper proposes CR-QAT, a framework combining curriculum-based progressive quantization and text-centric relational knowledge distillation to mitigate the severe performance degradation of naive low-bit quantization in open-vocabulary object detection, thereby achieving significant accuracy improvements on zero-shot benchmarks.

Jinyeong Park, Donghwa Kim, Brent ByungHoon Kang, Hyeongboo Baek, Jibum Kim · 2026-03-09 · cs

Breaking Smooth-Motion Assumptions: A UAV Benchmark for Multi-Object Tracking in Complex and Adverse Conditions

This paper introduces DynUAV, a comprehensive benchmark featuring over 1.7 million annotations across 42 video sequences to address the limitations of existing datasets in evaluating multi-object tracking under the intense ego-motion, scale variations, and motion blur characteristic of complex UAV operations.

Jingtao Ye, Kexin Zhang, Xunchi Ma, Yuehan Li, Guangming Zhu, Peiyi Shen, Linhua Jiang, Xiangdong Zhang, Liang Zhang · 2026-03-09 · cs

HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild

This paper introduces HarvestFlex, the first study demonstrating that vision-language-action policies can be successfully adapted to real-world greenhouse strawberry harvesting using a closed-loop system with three-view RGB sensing and minimal teleoperated data, achieving a 74.0% success rate without relying on depth sensors or explicit geometric calibration.

Ziyang Zhao, Shuheng Wang, Zhonghua Miao, Ya Xiong · 2026-03-09 · cs

Technical Report: Automated Optical Inspection of Surgical Instruments

This technical report details a collaboration with industry leaders in Pakistan's Sialkot surgical cluster to develop an Automated Optical Inspection system using deep learning models (YOLOv8, ResNet-152, and EfficientNet-b4) on a new dataset of 4,414 images to detect manufacturing defects in surgical instruments, thereby enhancing patient safety and manufacturing quality.

Zunaira Shafqat, Atif Aftab Ahmed Jilani, Qurrat Ul Ain · 2026-03-09 · cs.AI

RePer-360: Releasing Perspective Priors for 360° Depth Estimation via Self-Modulation

RePer-360 is a distortion-aware self-modulation framework that adapts perspective-trained depth foundation models to 360° panoramic depth estimation, preserving pretrained priors through a lightweight geometry-aligned guidance module and a Self-Conditioned AdaLN-Zero mechanism while achieving superior performance with only 1% of the training data.

Cheng Guan, Chunyu Lin, Zhijie Shen, Junsong Zhang, Jiyuan Wang · 2026-03-09 · cs