RealOSR: Latent Guidance Boosts Diffusion-based Real-world Omnidirectional Image Super-Resolutions

The paper proposes RealOSR, a diffusion-based framework for real-world omnidirectional image super-resolution that utilizes a novel Latent Gradient Alignment Routing (LaGAR) module to enable efficient one-step denoising, achieving significant visual quality improvements and over 200× inference acceleration compared to existing methods.

Xuhan Sheng, Runyi Li, Bin Chen + 3 more | 2026-03-04 | ⚡ eess

HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

This paper introduces HSSBench, a comprehensive multilingual benchmark featuring over 13,000 samples generated through a novel expert-agent collaboration pipeline, designed to evaluate and address the current limitations of Multimodal Large Language Models in handling the interdisciplinary and abstract reasoning tasks characteristic of the Humanities and Social Sciences.

Zhaolu Kang, Junhao Gong, Jiaxu Yan + 15 more | 2026-03-04 | 🤖 cs.AI

Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward

Perception-R1 addresses the limitation of existing RLVR methods in enhancing multimodal perception by introducing a novel visual perception reward derived from Chain-of-Thought annotations, which effectively boosts both perception and reasoning capabilities of Multimodal Large Language Models to achieve state-of-the-art performance with minimal training data.

Tong Xiao, Xin Xu, Zhenya Huang + 4 more | 2026-03-04 | 🤖 cs.AI

StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

StreamSplat is a fully feed-forward framework that enables real-time, online reconstruction of dynamic 3D scenes from uncalibrated video streams into 3D Gaussian Splatting representations, achieving state-of-the-art quality with a 1200× speedup over traditional optimization-based methods through probabilistic sampling, bidirectional deformation, and adaptive Gaussian fusion.

Zike Wu, Qi Yan, Xuanyu Yi + 2 more | 2026-03-04 | 🤖 cs.LG

SceneStreamer: Continuous Scenario Generation as Next Token Group Prediction

SceneStreamer is a unified autoregressive transformer framework that generates continuous, long-horizon traffic scenarios by predicting sequences of tokens representing dynamic elements like agents and traffic signals, thereby enabling the creation of realistic, diverse, and adaptive environments that significantly improve the robustness and generalization of autonomous driving policies.

Zhenghao Peng, Yuxin Liu, Bolei Zhou | 2026-03-04 | 💻 cs