A comprehensive study of time-of-flight non-line-of-sight imaging

This paper presents a comprehensive study of time-of-flight non-line-of-sight imaging methods. It unifies their theoretical formulations and hardware implementations into a common framework for analysis and demonstrates that, under equal constraints, existing techniques share similar performance limitations despite method-specific differences.

Julio Marco, Adrian Jarabo, Ji Hyun Nam, Alberto Tosi, Diego Gutierrez, Andreas Velten · Wed, 11 Ma · 💻 cs

GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision

The paper introduces GeoSolver, a framework that enhances remote sensing reasoning by leveraging a large-scale process supervision dataset (Geo-PRM-2M) and a novel Process-Aware Tree-GRPO algorithm to train a token-level reward model (GeoPRM), thereby enabling verifiable, step-by-step reasoning and robust test-time scaling for both specialized and general-purpose Vision-Language Models.

Lang Sun, Ronghao Fu, Zhuoran Duan, Haoran Liu, Xueyan Liu, Bo Yang · Wed, 11 Ma · 💻 cs

Trajectory Optimization for Self-Wrap-Aware Cable-Towed Planar Object Manipulation under Implicit Tension Constraints

This paper formulates cable-towed planar object manipulation as a routing-aware, tensioning-implicit trajectory optimization problem that leverages self-wrapping to dynamically redirect torque, proposing a relaxation hierarchy where the Implicit-Mode Relaxation (IMR) effectively exploits self-wrap for turning maneuvers without the conservatism of explicit routing decisions.

Yu Li, Amin Fakhari, Hamid Sadeghian · Wed, 11 Ma · 💻 cs

ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly

ReTac-ACT is a state-gated vision-tactile fusion transformer that achieves high-precision assembly in occluded, contact-rich environments by dynamically prioritizing tactile feedback through bidirectional cross-attention and proprioception-conditioned gating, outperforming vision-only baselines on the NIST Assembly Task Board M1 benchmark.

Minchi Ruan, LiangQing Zhou, Hongtong Li, Zongtao Wang, ZhaoMing Lu, Jianwei Zhang, Bin Fang · Wed, 11 Ma · 💻 cs

GeoAlignCLIP: Enhancing Fine-Grained Vision-Language Alignment in Remote Sensing via Multi-Granular Consistency Learning

The paper introduces GeoAlignCLIP, a unified framework that enhances fine-grained vision-language alignment in remote sensing by leveraging multi-granular semantic learning and intra-modal consistency, supported by a newly constructed hierarchical dataset (RSFG-100k) to outperform existing methods on diverse benchmarks.

Xiao Yang, Ronghao Fu, Zhuoran Duan, Zhiwen Lin, Xueyan Liu, Bo Yang · Wed, 11 Ma · 💻 cs

More than the Sum: Panorama-Language Models for Adverse Omni-Scenes

This paper introduces the Panorama-Language Modeling (PLM) paradigm and the PanoVQA dataset to enable holistic $360^\circ$ vision-language reasoning in adverse omni-scenes, demonstrating that a unified panoramic approach yields superior understanding compared to stitching multiple narrow-field-of-view inputs.

Weijia Fan, Ruiping Liu, Jiale Wei, Yufan Chen, Junwei Zheng, Zichao Zeng, Jiaming Zhang, Qiufu Li, Linlin Shen, Rainer Stiefelhagen · Wed, 11 Ma · 💻 cs

Preparing Students for AI-Driven Agile Development: A Project-Based AI Engineering Curriculum

This paper presents a project-based AI engineering curriculum that integrates agile practices with generative AI tools to prepare students for modern software development. A seven-sprint case study shows that embedding AI across the engineering lifecycle fosters hands-on competence, while requiring adaptations to keep pace with evolving tools and to verify foundational learning.

Andreas Rausch, Stefan Wittek, Tobias Geger, David Inkermann · Wed, 11 Ma · 💻 cs

Nemo: A Low-Write-Amplification Cache for Tiny Objects on Log-Structured Flash Devices

Nemo is a novel flash cache design that reduces application-level write amplification for tiny-object workloads by intentionally increasing hash collisions to improve set fill rates, while simultaneously maintaining high memory efficiency and low miss ratios through a bloom filter-based indexing mechanism and hybrid hotness tracking.

Xufeng Yang, Tingting Tan, Jingxin Hu, Congming Gao, Mingyang Liu, Tianyang Jiang, Jian Chen, Linbo Long, Yina Lv, Jiwu Shu · Wed, 11 Ma · 💻 cs

A saccade-inspired approach to image classification using vision transformer attention maps

This paper proposes a saccade-inspired image classification method that leverages DINO's Vision Transformer attention maps to selectively focus processing on task-relevant regions, achieving performance comparable to or better than full-image analysis while offering a biologically plausible approach to efficient visual processing.

Matthis Dallain, Laurent Rodriguez, Laurent Udo Perrinet, Benoît Miramond · Wed, 11 Ma · 💻 cs