cs papers | Gist.Science

DocCogito: Aligning Layout Cognition and Step-Level Grounded Reasoning for Document Understanding

DocCogito is a unified framework for document understanding that aligns global layout perception with structured, region-grounded reasoning. It uses a lightweight layout tower and a deterministic Visual-Semantic Chain to couple layout priors with evidence-based reasoning, and it reports state-of-the-art performance on multiple benchmarks.

Yuchuan Wu, Minghan Zhuo, Teng Fu, Mengyang Zhao, Bin Li, Xiangyang Xue | 2026-03-10 | cs

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

This paper proposes the Hierarchical Autonomy Evolution (HAE) framework to address critical security vulnerabilities in evolving AI agents by categorizing threats and defenses across three tiers: Cognitive, Execution, and Collective Autonomy.

Xiaolei Zhang, Lu Zhou, Xiaogang Xu, Jiafei Wu, Tianyu Du, Heqing Huang, Hao Peng, Zhe Liu | 2026-03-10 | cs

AMR-CCR: Anchored Modular Retrieval for Continual Chinese Character Recognition

This paper proposes AMR-CCR, an anchored modular retrieval framework with script-conditioned injection and multi-prototype dictionaries to address the challenges of continual, class-incremental ancient Chinese character recognition, accompanied by the new EvoCON benchmark for systematic evaluation.

Yuchuan Wu, Yinglian Zhu, Haiyang Yu, Ke Niu, Bin Li, Xiangyang Xue | 2026-03-10 | cs

Inverse-dynamics observer design for a linear single-track vehicle model with distributed tire dynamics

This paper proposes an inverse-dynamics observer that couples a linear single-track vehicle model with a distributed tire representation described by hyperbolic partial differential equations. The observer estimates sideslip angles and tire forces from yaw rate and lateral acceleration measurements alone, even under noise and model uncertainties.

Luigi Romano, Ole Morten Aamo, Jan Åslund, Erik Frisk | 2026-03-10 | cs

SeDa: A Unified System for Dataset Discovery and Multi-Entity Augmented Semantic Exploration

SeDa is a unified framework that aggregates over 7.6 million datasets from more than 200 platforms to enable trustworthy, semantically enriched, and multi-entity augmented exploration through standardized metadata, a dynamic tag graph, and provenance assurance.

Kan Ling, Zhen Qin, Yichi Zhu, Hengrun Zhang, Huiqun Yu, Guisheng Fan | 2026-03-10 | cs

High-Fidelity Medical Shape Generation via Skeletal Latent Diffusion

This paper proposes a skeletal latent diffusion framework that leverages a differentiable skeletonization module and a large-scale MedSDF dataset to achieve high-fidelity, computationally efficient medical shape generation while effectively addressing challenges posed by anatomical geometric complexity and data scarcity.

Guoqing Zhang, Jingyun Yang, Siqi Chen, Anping Zhang, Yang Li | 2026-03-10 | cs

Brexit Means Brexit: Selection Bias, Echo Chambers, and Entrenched Opinion on Reddit

This paper presents an end-to-end framework analyzing the r/Brexit subreddit to demonstrate that political polarization on Reddit is driven by self-selection and echo chambers, where user opinions become entrenched rather than softened by cross-cutting exposure.

Marian-Andrei Rizoiu, Duy Khuu, Andrew Law, Christine Largeron | 2026-03-10 | cs

EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification

The paper proposes EvolveReason, a self-evolving reasoning paradigm that combines a human-like chain-of-thought framework, a forgery latent-space distribution capture module, and a reinforcement learning-based self-evolution strategy to enhance the accuracy, detail, and reliability of explainable deepfake facial image identification.

Binjia Zhou, Dawei Luo, Shuai Chen, Feng Xu, Seow, Haoyuan Li, Jiachi Wang, Jiawen Wang, Zunlei Feng, Yijun Bei | 2026-03-10 | cs

InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills

InterReal is a unified physics-based imitation learning framework that enables humanoid robots to robustly learn and execute complex human-object interaction skills in real-world settings through a novel motion data augmentation scheme and an automatic reward learner.

Dayang Liang, Yuhang Lin, Xinzhe Liu, Jiyuan Shi, Yunlong Liu, Chenjia Bai | 2026-03-10 | cs

GP-Tree: An in-memory spatial index combining adaptive grid cells with a prefix tree for efficient spatial querying

The paper proposes GP-Tree, a novel in-memory spatial index that combines adaptive grid cells with a prefix tree structure to replace coarse minimum bounding rectangles with fine-grained approximations, thereby significantly improving filtering accuracy and query performance for complex spatial objects compared to traditional indexes.

Xiangyang Yang, Xuefeng Guan, Lanxue Dang, Yi Xie, Qingyang Xu, Huayi Wu, Jiayao Wang | 2026-03-10 | cs

On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment

This paper presents the first extensive study evaluating over 500 models to demonstrate that graph-based code representations consistently outperform other methods in predicting patch correctness, thereby significantly improving the effectiveness of automated program repair tools.

Quanjun Zhang, Chunrong Fang, Haichuan Hu, Yuan Zhao, Weisong Sun, Yun Yang, Tao Zheng, Zhenyu Chen | 2026-03-10 | cs

SketchGraphNet: A Memory-Efficient Hybrid Graph Transformer for Large-Scale Sketch Corpora Recognition

This paper introduces SketchGraphNet, a memory-efficient hybrid graph transformer that models free-hand sketches as structured graphs to achieve state-of-the-art recognition accuracy on the newly constructed 3.44-million-sample SketchGraph benchmark while significantly reducing computational resource requirements.

Shilong Chen, Mingyuan Li, Zhaoyang Wang, Zhonglin Ye, Haixing Zhao | 2026-03-10 | cs

ICLR: In-Context Imitation Learning with Visual Reasoning

The paper presents ICLR, a novel framework that enhances in-context imitation learning for robots by augmenting demonstration prompts with structured visual reasoning traces and jointly training a unified autoregressive transformer to predict both future trajectories and actions, thereby improving success rates and generalization in complex manipulation tasks.

Toan Nguyen, Weiduo Yuan, Songlin Wei, Hui Li, Daniel Seita, Yue Wang | 2026-03-10 | cs

MIRO: Multi-radar Identity and Ranging for Occupational Safety

MIRO is a privacy-preserving framework that combines distributed particulate matter sensors with a multi-radar mmWave re-identification system, utilizing GAN-based view adaptation to track workers and estimate their specific exposure to airborne pollutants in industrial environments without relying on visual data.

Tirthankar Halder, Argha Sen, Swadhin Pradhan, Rijurekha Sen, Sandip Chakraborty | 2026-03-10 | cs

ACCURATE: Arbitrary-shaped Continuum Reconstruction Under Robust Adaptive Two-view Estimation

The paper proposes ACCURATE, a robust 3D reconstruction framework that combines image segmentation with geometry-constrained topology traversal and dynamic programming to achieve high-accuracy reconstruction of arbitrary-shaped, deformable continuum bodies like guidewires and catheters under biplanar X-ray imaging.

Yaozhi Zhang, Shun Yu, Yugang Zhang, Yang Liu | 2026-03-10 | cs

Scale-Aware UAV-to-Satellite Cross-View Geo-Localization: A Semantic Geometric Approach

This paper proposes a semantic geometric framework that leverages small vehicles as metric anchors within a decoupled stereoscopic projection model to recover absolute scale from monocular UAV images, thereby enabling scale-adaptive satellite image cropping and significantly improving cross-view geo-localization robustness under real-world scale ambiguity.

Yibin Ye, Shuo Chen, Kun Wang, Xiaokai Song, Jisheng Dang, Qifeng Yu, Xichao Teng, Zhang Li | 2026-03-10 | cs

How Long Can Unified Multimodal Models Generate Images Reliably? Taming Long-Horizon Interleaved Image Generation via Context Curation

This paper introduces UniLongGen, a training-free inference strategy that improves long-horizon interleaved image generation by dynamically curating context to discard accumulated visual noise, thereby overcoming the reliability collapse caused by dense visual token interference in unified multimodal models.

Haoyu Chen, Qing Liu, Yuqian Zhou, He Zhang, Zhaowen Wang, Mengwei Ren, Jingjing Ren, Xiang Wang, Zhe Lin, Lei Zhu | 2026-03-10 | cs

CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization

The paper introduces CONSTANT, a novel one-shot handwriting generation framework that leverages Style-Aware Quantization and a latent patch-based contrastive objective within a diffusion model to overcome existing limitations in capturing diverse writer styles and generating high-quality, realistic handwritten images across multiple languages.

Anh-Duy Le, Van-Linh Pham, Thanh-Nam Vo, Xuan Toan Mai, Tuan-Anh Tran | 2026-03-10 | cs

Evaluating Parkinson's Disease Detection in Anonymized Speech: A Performance and Acoustic Analysis

This paper evaluates the trade-off between privacy and Parkinson's disease detection in anonymized speech, demonstrating that while STT-TTS anonymization severely degrades diagnostic performance by erasing prosodic cues, kNN-VC effectively preserves macro-prosodic features to maintain high detection accuracy with only a minor performance drop.

Carlos Franzreb, Francisco Teixeira, Ben Luks, Sebastian Möller, Alberto Abad | 2026-03-10 | cs

Targeted Speaker Poisoning Framework in Zero-Shot Text-to-Speech

This paper introduces a Speech Generation Speaker Poisoning (SGSP) framework that addresses privacy risks in zero-shot text-to-speech by modifying trained models so they cannot generate specific speaker identities while remaining useful for others. It demonstrates effective protection for up to 15 speakers but reveals scalability challenges with larger sets due to identity overlap.

Thanapat Trachu, Thanathai Lertpetchpun, Sai Praneeth Karimireddy, Shrikanth Narayanan | 2026-03-10 | cs
