Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments

This study proposes and validates a vision-guided object-tracking framework for unmanned surface vehicles (USVs) in complex maritime environments by benchmarking seven deep learning-based trackers together with several control algorithms, ultimately identifying the Transformer-based SeqTrack tracker paired with a Linear Quadratic Regulator (LQR) controller as the most robust solution for stable tracking under adverse conditions.

Muhayy Ud Din, Ahsan B. Bakht, Waseem Akram + 3 more · 2026-02-26 · 💻 cs

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning

This paper introduces VOILA, a dynamic benchmark that evaluates multimodal large language models' ability to perform abstract relational reasoning through visual analogies, revealing that current models significantly struggle with inter-image relationships compared to human performance despite improvements from multi-step prompting strategies.

Nilay Yilmaz, Maitreya Patel, Yiran Lawrence Luo + 4 more · 2026-02-26 · 💬 cs.CL

PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding

This paper introduces PD-VLA, a training-free parallel decoding framework that accelerates Vision-Language-Action models integrated with action chunking by reformulating autoregressive decoding as a parallel fixed-point iteration system, thereby significantly improving inference speed while maintaining competitive performance in both simulation and real-world robotic tasks.

Wenxuan Song, Jiayi Chen, Pengxiang Ding + 9 more · 2026-02-26 · 💻 cs

Identifying Memorization of Diffusion Models through p-Laplace Analysis: Estimators, Bounds and Applications

This paper proposes a novel method for identifying memorized training data in diffusion models by leveraging p-Laplace operators derived from estimated score functions, providing both theoretical error bounds and empirical validation on text-to-image models where the approach successfully detects memorization even without access to the conditioning text.

Jonathan Brokman, Itay Gershon, Amit Giloni + 4 more · 2026-02-26 · 🔢 math

Transformer-based cardiac substructure segmentation from contrast and non-contrast computed tomography for radiotherapy planning

This study demonstrates that a hybrid pretrained transformer-convolutional network (SMIT) utilizing balanced curriculum learning achieves data-efficient, robust cardiac substructure segmentation across diverse CT imaging protocols and patient populations, outperforming standard nnU-Net and TotalSegmentator while requiring significantly fewer annotated training scans.

Aneesh Rangnekar, Nikhil Mankuzhy, Jonas Willmann + 5 more · 2026-02-26 · ⚡ eess

Voxel Densification for Serialized 3D Object Detection: Mitigating Sparsity via Pre-serialization Expansion

This paper proposes a Voxel Densification Module (VDM) that utilizes pre-serialization spatial expansion via sparse 3D convolutions to overcome the inherent voxel dimension constraints of serialized 3D object detection frameworks, thereby significantly enhancing detection accuracy across multiple benchmarks while managing computational costs through strategic downsampling.

Qifeng Liu, Dawei Zhao, Yabo Dong + 6 more · 2026-02-26 · 💻 cs

MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification

MedicalPatchNet is a novel, inherently self-explainable deep learning architecture for chest X-ray classification that achieves performance comparable to EfficientNetV2-S while significantly improving diagnostic interpretability and pathology localization accuracy through a transparent patch-based aggregation mechanism, thereby enhancing clinical trust and safety.

Patrick Wienholt, Christiane Kuhl, Jakob Nikolas Kather + 2 more · 2026-02-26 · 🤖 cs.LG