RiO-DETR: DETR for Real-time Oriented Object Detection

RiO-DETR is the first real-time oriented object detection transformer that addresses challenges in angle estimation, periodicity, and convergence through novel designs like Content-Driven Angle Estimation and Decoupled Periodic Refinement, achieving a new speed-accuracy trade-off on benchmark datasets.

Zhangchi Hu, Yifan Zhao, Yansong Peng, Wenzhang Sun, Xiangchen Yin, Jie Chen, Peixi Wu, Hebei Li, Xinghao Wang, Dongsheng Jiang, Xiaoyan Sun | 2026-03-11 | cs

CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation

CIGPose introduces a Causal Intervention Graph Neural Network framework that enhances whole-body pose estimation robustness by using a Structural Causal Model to identify and replace context-confounded keypoint representations with invariant embeddings, thereby achieving state-of-the-art performance on COCO-WholeBody without relying on extra training data.

Bohao Li, Zhicheng Cao, Huixian Li, Yangming Guo | 2026-03-11 | cs

First Steps towards Categorical Algebraic Artificial Chemistry

This paper constructs a functor to define dynamics for an algebraic model of interacting components, generalizing the AlChemy artificial life model and exploring how category theory can formally connect algebraic structures with dynamical systems in artificial chemistry.

Joe Pratt-Johns (Edinburgh Napier University), Toby St. Clere Smithe (Kodamai Ltd), Chris Guiver (Edinburgh Napier University), Kevin Hughes (Edinburgh Napier University), Peter Andras (Edinburgh Napier University) | 2026-03-11 | cs

Stein Variational Ergodic Surface Coverage with SE(3) Constraints

This paper introduces a preconditioned SE(3) Stein Variational Gradient Descent framework that reformulates point-cloud surface coverage as a manifold-aware sampling problem, enabling robots to generate high-quality, SE(3)-constrained trajectories that outperform existing optimization-based and sampling-as-optimization methods in both simulation and real-world experiments.

Jiayun Li, Yufeng Jin, Sangli Teng, Dejian Gong, Georgia Chalvatzaki | 2026-03-11 | cs

SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments

The paper introduces SEA-Nav, a reinforcement learning framework that combines differentiable control barrier functions, adaptive collision replay, and kinematic constraints to enable quadruped robots to achieve safe, agile, and efficient navigation in densely cluttered environments with minute-level training time.

Shiyi Chen, Mingye Yang, Haiyan Mao, Jiaqi Zhang, Haiyi Liu, Shuheng He, Debing Zhang, Zihao Qiu, Chun Zhang | 2026-03-11 | cs

TopoOR: A Unified Topological Scene Representation for the Operating Room

TopoOR introduces a novel topological scene representation for surgical operating rooms that leverages higher-order structures and attention mechanisms to preserve complex multimodal relationships and manifold geometry, thereby outperforming traditional graph and LLM-based methods in safety-critical tasks like sterility breach detection and robot phase prediction.

Tony Danjun Wang, Ka Young Kim, Tolga Birdal, Nassir Navab, Lennart Bastian | 2026-03-11 | cs

Experience Report on the Adaptable Integration of Requirements Engineering Courses into Curricula for Professionals

This paper reports on the authors' experience developing three professional software engineering curricula and proposes a systematic, content-mapping-based approach with guiding principles for effectively integrating Requirements Engineering courses into these dynamic and modular programs.

Oleksandr Kosenkov, Konstantin Blaschke, Tony Gorschek, Michael Unterkalmsteiner, Oleksandr Adamov, Davide Fucci | 2026-03-11 | cs

The Patrologia Graeca Corpus: OCR, Annotation, and Open Release of Noisy Nineteenth-Century Polytonic Greek Editions

This paper introduces the Patrologia Graeca Corpus, a large-scale open resource featuring OCR-processed, lemmatized, and part-of-speech tagged text from degraded nineteenth-century bilingual Greek-Latin editions, which achieves state-of-the-art recognition accuracy and establishes a new benchmark for noisy polytonic Greek processing.

Chahan Vidal-Gorène (CJM, LIPN), Bastien Kindt | 2026-03-11 | cs

OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks

This paper introduces OmniEarth, a comprehensive benchmark comprising 9,275 images and 44,210 verified instructions that evaluates Vision-Language Models across 28 geospatial tasks with a focus on perception, reasoning, and robustness, revealing significant performance gaps in current models for remote sensing applications.

Ronghao Fu, Haoran Liu, Weijie Zhang, Zhiwen Lin, Xiao Yang, Peng Zhang, Bo Yang | 2026-03-11 | cs

MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning

The paper introduces MORE-R1, a novel Large Vision-Language Model that leverages a two-stage training process combining Supervised Fine-Tuning on automatically constructed stepwise reasoning data and Reinforcement Learning with Group Relative Policy Optimization to achieve state-of-the-art performance in Multimodal Object-Entity Relation Extraction.

Xiang Yuan, Xu Chu, Xinrong Chen, Haochen Li, Zonghong Dai, Hongcheng Fan, Xiaoyue Yuan, Weiping Li, Tong Mo | 2026-03-11 | cs

Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity

PruneSID is a training-free, synergistic importance-diversity framework that significantly enhances Vision-Language Model efficiency by employing Principal Semantic Components Analysis and Intra-group Non-Maximum Suppression to achieve state-of-the-art accuracy with extreme token compression and faster prefilling speeds.

Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Guangming Lu, Jun Yu, Wenjie Pei | 2026-03-11 | cs

Component-Aware Sketch-to-Image Generation Using Self-Attention Encoding and Coordinate-Preserving Fusion

This paper proposes a novel component-aware, self-refining framework that combines a Self-Attention-based Autoencoder, a Coordinate-Preserving Gated Fusion module, and a Spatially Adaptive Refinement Revisor to generate high-fidelity, semantically accurate photorealistic images from freehand sketches, significantly outperforming existing GAN and diffusion models across diverse facial and non-facial datasets.

Ali Zia, Muhammad Umer Ramzan, Usman Ali, Muhammad Faheem, Abdelwahed Khamis, Shahnawaz Qureshi | 2026-03-11 | cs