Why iCloud Fails: The Category Mistake of Cloud Synchronization

This paper argues that iCloud fails to support complex workflows because of a "Category Mistake": its POSIX-like filesystem interface falsely projects a linear temporal chain onto a distributed causal graph. This structural error causes data divergence and corruption, and the paper argues it could be resolved by adopting Open Atomic Ethernet's transactional semantics to align protocol behavior with physical reality.

Paul Borrill · 2026-03-10 · cs
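The gap between a linear chain and a causal graph can be made concrete with vector clocks, a standard distributed-systems technique. The sketch below is illustrative only and is not taken from the paper: the device names `mac` and `iphone` and the helper functions are hypothetical. Two devices edit the same synced version while offline. The resulting clocks are incomparable ("concurrent"), which a single linear timeline cannot represent without silently picking one edit.

```python
from typing import Dict

# A vector clock maps each replica (device) name to the number of edits
# from that replica that a given version has observed.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, replica: str) -> VectorClock:
    """Return a copy of `clock` recording one new edit made on `replica`."""
    updated = dict(clock)
    updated[replica] = updated.get(replica, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock of a version that has seen both a and b."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b as 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Hypothetical scenario: both devices start from the same synced version,
# then each edits the file while offline.
base = {"mac": 1}
mac_edit = increment(base, "mac")        # {"mac": 2}
phone_edit = increment(base, "iphone")   # {"mac": 1, "iphone": 1}

print(compare(base, mac_edit))          # before
print(compare(mac_edit, phone_edit))    # concurrent: neither edit saw the other
merged = merge(mac_edit, phone_edit)    # {"mac": 2, "iphone": 1}
print(compare(mac_edit, merged))        # before
```

A last-writer-wins policy keyed on wall-clock time would order `mac_edit` and `phone_edit` and discard one. The vector clocks instead expose that the two edits are concurrent branches that need an explicit merge.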

Object-Scene-Camera Decomposition and Recomposition for Data-Efficient Monocular 3D Object Detection

This paper proposes an online data manipulation scheme that decomposes training images into independent object, scene, and camera components and recomposes them with perturbed poses to generate diverse training data, thereby improving the data efficiency and performance of monocular 3D object detection models across both fully and sparsely supervised settings.

Zhaonian Kuang, Rui Ding, Meng Yang, and 2 more · 2026-03-10 · cs

Cycle-Consistent Tuning for Layered Image Decomposition

This paper presents a cycle-consistent tuning framework that leverages lightweight LoRA adaptation of pretrained diffusion models to achieve robust, high-fidelity layered image decomposition, specifically for challenging logo-object separation, by enforcing bidirectional reconstruction consistency and iteratively refining performance through a progressive self-improving process.

Zheng Gu, Min Lu, Zhida Sun, Dani Lischinski, Daniel Cohen-Or, Hui Huang · 2026-03-10 · cs

See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs

This paper proposes "See It, Say It, Sorted," a lightweight, training-free, and plug-and-play framework that mitigates visual hallucination in large vision-language models by iteratively supervising each reasoning step with dynamically extracted visual evidence, thereby significantly improving reasoning accuracy without requiring additional model training.

Yongchang Zhang, Oliver Ma, Tianyi Liu, Guangquan Zhou, Yang Chen · 2026-03-10 · cs

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

This paper introduces ARLArena, a unified framework that systematically analyzes training instability in agentic reinforcement learning to derive SAMPO, a stable optimization method that ensures consistent performance across diverse agentic tasks.

Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Deng, Renliang Sun, Alexander Taylor, Yanqiao Zhu, Jason Cong, Yizhou Sun, Wei Wang · 2026-03-10 · cs

Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

This paper argues that AI agents equipped with specialized skills can augment, but not fully replace, social scientists. Through "vibe researching," such agents can autonomously execute codifiable research tasks, while human theoretical originality and tacit knowledge remain necessary. The paper also highlights emerging risks for the profession, including stratification and a pedagogical crisis.

Yongjun Zhang · 2026-03-10 · cs

WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval

WISER is a training-free framework for Zero-Shot Composed Image Retrieval that unifies Text-to-Image and Image-to-Image paradigms through a "retrieve-verify-refine" pipeline, leveraging wider search, adaptive fusion, and self-reflection to significantly outperform existing methods across diverse benchmarks.

Tianyue Wang, Leigang Qu, Tianyu Yang, Xiangzhao Hao, Yifan Xu, Haiyun Guo, Jinqiao Wang · 2026-03-10 · cs

PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

The paper introduces PackUV, a novel 4D Gaussian representation and fitting method that maps volumetric video attributes into structured UV atlases for efficient, codec-compatible storage and streaming, while demonstrating superior temporal consistency and rendering fidelity on the newly proposed large-scale PackUV-2B dataset.

Aashish Rai, Angela Xing, Anushka Agarwal, Xiaoyan Cong, Zekun Li, Tao Lu, Aayush Prakash, Srinath Sridhar · 2026-03-10 · cs

Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning

This paper proposes HART, an annotation-free framework that leverages a novel Advantage Preference Group Relative Policy Optimization (AP-GRPO) algorithm to enable Large Multimodal Models to autonomously identify and verify key high-resolution image regions, thereby improving reasoning performance without requiring costly human grounding labels.

Jiacheng Yang, Anqi Chen, Yunkai Dang, Qi Fan, Cong Wang, Wenbin Li, Feng Miao, Yang Gao · 2026-03-10 · cs
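AP-GRPO builds on Group Relative Policy Optimization, whose core idea is to score each sampled response relative to the other responses drawn for the same prompt rather than with a learned value function. The sketch below shows only that standard group-relative advantage. It does not implement the "Advantage Preference" modification, whose details the summary does not give. The function name and the binary correctness reward are hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: normalize each sampled response's reward
    against the mean and (population) standard deviation of its group.

    `eps` avoids division by zero when every response in the group receives
    the same reward, in which case all advantages are 0.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of 4 responses to one prompt, scored 1.0 if the final
# answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Responses that beat their group's average get a positive advantage and the rest get a negative one. This is how an annotation-free setup can learn from outcome rewards alone, without per-region grounding labels.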

Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention

This paper introduces Infinite Self-Attention (InfSA) and its linear-time variant, Linear-InfSA, a spectral reformulation of self-attention as a diffusion process on token graphs that achieves state-of-the-art ImageNet accuracy and enables efficient, memory-free inference at ultra-high resolutions (up to 9216×9216) by replacing the quadratic softmax cost with a Neumann series approximation.

Giorgio Roffo, Luke Palmer · 2026-03-10 · cs
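The Neumann series mentioned in the InfSA summary is the identity $(I - \alpha A)^{-1} = \sum_{k=0}^{\infty} (\alpha A)^k$. It holds whenever the spectral radius of $\alpha A$ is below 1, which is guaranteed for a row-stochastic (softmax-style) matrix $A$ when $0 \le \alpha < 1$. The sketch below is a generic illustration, not the paper's method. The toy matrix, the decay `alpha`, and the helper names are hypothetical, and the summary does not say exactly how InfSA applies the series to token values. The sketch truncates the series, builds the partial sums using only matrix-vector products, and checks the result against an exact linear solve. It uses dense matrices, so it does not reproduce the linear-time variant.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def neumann_apply(a: Matrix, v: Vector, alpha: float, terms: int) -> Vector:
    """Approximate (I - alpha*A)^{-1} v by the truncated series
    sum_{k=0}^{terms-1} (alpha*A)^k v.

    The recurrence y <- v + alpha * (A y) builds the partial sums using only
    matrix-vector products, never forming A^k explicitly.
    """
    y = list(v)
    for _ in range(terms - 1):
        ay = matvec(a, y)
        y = [v_i + alpha * ay_i for v_i, ay_i in zip(v, ay)]
    return y


def solve(m: Matrix, b: Vector) -> Vector:
    """Exact solve of m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Toy row-stochastic "attention" matrix over 3 tokens (each row sums to 1),
# one scalar value per token, and a hypothetical decay alpha < 1.
A = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.1, 0.8]]
v = [1.0, 2.0, 3.0]
alpha = 0.5

approx = neumann_apply(A, v, alpha, terms=30)
i_minus_alpha_a = [[(1.0 if i == j else 0.0) - alpha * A[i][j] for j in range(3)]
                   for i in range(3)]
exact = solve(i_minus_alpha_a, v)
# Largest gap between the truncated series and the exact solve.
print(max(abs(x - y) for x, y in zip(approx, exact)))
```

Because every row of `A` sums to 1, the truncation error after $K$ terms is bounded by $\alpha^K \lVert v \rVert_\infty / (1 - \alpha)$. Thirty terms with $\alpha = 0.5$ therefore already match the exact solve to within about $10^{-8}$.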