DocCogito: Aligning Layout Cognition and Step-Level Grounded Reasoning for Document Understanding

DocCogito is a unified framework for document understanding that aligns global layout perception with structured, region-grounded reasoning through a lightweight layout tower and a deterministic Visual-Semantic Chain, achieving state-of-the-art performance on multiple benchmarks by enforcing systematic coupling between layout priors and evidence-based reasoning.

Yuchuan Wu, Minghan Zhuo, Teng Fu, Mengyang Zhao, Bin Li, Xiangyang Xue2026-03-10💻 cs

Inverse-dynamics observer design for a linear single-track vehicle model with distributed tire dynamics

This paper proposes an innovative inverse-dynamics observer that integrates a linear single-track vehicle model with a distributed tire representation described by hyperbolic partial differential equations to accurately estimate sideslip angles and tire forces using only yaw rate and lateral acceleration measurements, even under noise and model uncertainties.

Luigi Romano, Ole Morten Aamo, Jan Åslund, Erik Frisk2026-03-10💻 cs

EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification

The paper proposes EvolveReason, a self-evolving reasoning paradigm that combines a human-like chain-of-thought framework, a forgery latent-space distribution capture module, and a reinforcement learning-based self-evolution strategy to enhance the accuracy, detail, and reliability of explainable deepfake facial image identification.

Binjia Zhou, Dawei Luo, Shuai Chen, Feng Xu, Seow, Haoyuan Li, Jiachi Wang, Jiawen Wang, Zunlei Feng, Yijun Bei2026-03-10💻 cs

GP-Tree: An in-memory spatial index combining adaptive grid cells with a prefix tree for efficient spatial querying

The paper proposes GP-Tree, a novel in-memory spatial index that combines adaptive grid cells with a prefix tree structure to replace coarse minimum bounding rectangles with fine-grained approximations, thereby significantly improving filtering accuracy and query performance for complex spatial objects compared to traditional indexes.

Xiangyang Yang, Xuefeng Guan, Lanxue Dang, Yi Xie, Qingyang Xu, Huayi Wu, Jiayao Wang2026-03-10💻 cs

On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment

This paper presents the first extensive study evaluating over 500 models to demonstrate that graph-based code representations consistently outperform other methods in predicting patch correctness, thereby significantly improving the effectiveness of automated program repair tools.

Quanjun Zhang, Chunrong Fang, Haichuan Hu, Yuan Zhao, Weisong Sun, Yun Yang, Tao Zheng, Zhenyu Chen2026-03-10💻 cs

SketchGraphNet: A Memory-Efficient Hybrid Graph Transformer for Large-Scale Sketch Corpora Recognition

This paper introduces SketchGraphNet, a memory-efficient hybrid graph transformer that models free-hand sketches as structured graphs to achieve state-of-the-art recognition accuracy on the newly constructed 3.44-million-sample SketchGraph benchmark while significantly reducing computational resource requirements.

Shilong Chen, Mingyuan Li, Zhaoyang Wang, Zhonglin Ye, Haixing Zhao2026-03-10💻 cs

ICLR: In-Context Imitation Learning with Visual Reasoning

The paper presents ICLR, a novel framework that enhances in-context imitation learning for robots by augmenting demonstration prompts with structured visual reasoning traces and jointly training a unified autoregressive transformer to predict both future trajectories and actions, thereby improving success rates and generalization in complex manipulation tasks.

Toan Nguyen, Weiduo Yuan, Songlin Wei, Hui Li, Daniel Seita, Yue Wang2026-03-10💻 cs

Scale-Aware UAV-to-Satellite Cross-View Geo-Localization: A Semantic Geometric Approach

This paper proposes a semantic geometric framework that leverages small vehicles as metric anchors within a decoupled stereoscopic projection model to recover absolute scale from monocular UAV images, thereby enabling scale-adaptive satellite image cropping and significantly improving cross-view geo-localization robustness under real-world scale ambiguity.

Yibin Ye, Shuo Chen, Kun Wang, Xiaokai Song, Jisheng Dang, Qifeng Yu, Xichao Teng, Zhang Li2026-03-10💻 cs

How Long Can Unified Multimodal Models Generate Images Reliably? Taming Long-Horizon Interleaved Image Generation via Context Curation

This paper introduces UniLongGen, a training-free inference strategy that improves long-horizon interleaved image generation by dynamically curating context to discard accumulated visual noise, thereby overcoming the reliability collapse caused by dense visual token interference in unified multimodal models.

Haoyu Chen, Qing Liu, Yuqian Zhou, He Zhang, Zhaowen Wang, Mengwei Ren, Jingjing Ren, Xiang Wang, Zhe Lin, Lei Zhu2026-03-10💻 cs

CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization

The paper introduces CONSTANT, a novel one-shot handwriting generation framework that leverages Style-Aware Quantization and a latent patch-based contrastive objective within a diffusion model to overcome existing limitations in capturing diverse writer styles and generating high-quality, realistic handwritten images across multiple languages.

Anh-Duy Le, Van-Linh Pham, Thanh-Nam Vo, Xuan Toan Mai, Tuan-Anh Tran2026-03-10💻 cs

Evaluating Parkinson's Disease Detection in Anonymized Speech: A Performance and Acoustic Analysis

This paper evaluates the trade-off between privacy and Parkinson's disease detection in anonymized speech, demonstrating that while STT-TTS anonymization severely degrades diagnostic performance by erasing prosodic cues, kNN-VC effectively preserves macro-prosodic features to maintain high detection accuracy with only a minor performance drop.

Carlos Franzreb, Francisco Teixeira, Ben Luks, Sebastian Möller, Alberto Abad2026-03-10💻 cs

Targeted Speaker Poisoning Framework in Zero-Shot Text-to-Speech

This paper introduces a novel Speech Generation Speaker Poisoning (SGSP) framework to address privacy risks in zero-shot text-to-speech by modifying trained models to prevent the generation of specific speaker identities while maintaining utility for others, demonstrating effective protection for up to 15 speakers but revealing scalability challenges with larger sets due to identity overlap.

Thanapat Trachu, Thanathai Lertpetchpun, Sai Praneeth Karimireddy, Shrikanth Narayanan2026-03-10💻 cs