Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation Modeling

This paper proposes a novel framework for small object detection in complex backgrounds that integrates Residual Haar Wavelet Downsampling, Global Relation Modeling, Cross-Scale Hybrid Attention, and a Center-Assisted Loss to preserve fine-grained details, suppress noise, and enhance localization accuracy, achieving state-of-the-art performance on the RGBT-Tiny benchmark.

Wenguang Tao, Xiaotian Wang, Tian Yan + 2 more2026-03-05💻 cs

Adaptive Enhancement and Dual-Pooling Sequential Attention for Lightweight Underwater Object Detection with YOLOv10

This paper proposes a lightweight underwater object detection framework based on YOLOv10 that integrates a Multi-Stage Adaptive Enhancement module, a Dual-Pooling Sequential Attention mechanism, and a Focal Generalized IoU loss to significantly improve accuracy and robustness on benchmark datasets while maintaining a compact model size suitable for resource-constrained environments.

Md. Mushibur Rahman, Umme Fawzia Rahim, Enam Ahmed Taufik2026-03-05💻 cs

From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning

This paper identifies "Lazy Attention Localization" as a key bottleneck in multimodal cold-start training, where models fail to increase visual attention, and proposes the Attention-Guided Visual Anchoring and Reflection (AVAR) framework to effectively reshape attention distributions, achieving a 7.0% performance gain on multimodal reasoning benchmarks.

Ruilin Luo, Chufan Shi, Yizhen Zhang + 10 more2026-03-05🤖 cs.AI

DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models

DeepScan is a training-free framework that enhances visually grounded reasoning in Large Vision-Language Models by employing a bottom-up approach of Hierarchical Scanning, Refocusing, and Evidence-Enhanced Reasoning to effectively mitigate distractive contexts and achieve state-of-the-art performance across diverse architectures and scales.

Yangfu Li, Hongjian Zhan, Jiawei Chen + 3 more2026-03-05💻 cs

UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios

The paper introduces UniSync, a unified lip synchronization framework that combines mask-free pose-anchored training with mask-based blending inference to achieve high-fidelity, generalizable results across diverse real-world scenarios, including stylized avatars and challenging lighting conditions, while also proposing the RealWorld-LipSync benchmark for evaluation.

Ruidi Fan, Yang Zhou, Siyuan Wang + 3 more2026-03-05💻 cs

From Misclassifications to Outliers: Joint Reliability Assessment in Classification

This paper proposes a unified evaluation framework with new metrics (DS-F1 and DS-AURC) and an improved method (SURE+) to jointly assess and enhance classifier reliability by integrating out-of-distribution detection and in-distribution failure prediction, demonstrating that double scoring functions significantly outperform traditional single scoring approaches.

Yang Li, Youyang Sha, Yinzhi Wang + 4 more2026-03-05🤖 cs.LG

Architecture and evaluation protocol for transformer-based visual object tracking in UAV applications

This paper proposes a Modular Asynchronous Tracking Architecture (MATA) that integrates a transformer-based tracker with an Extended Kalman Filter and ego-motion compensation to address UAV tracking challenges, while introducing a hardware-independent evaluation protocol and a new Normalized Time to Failure (NT2F) metric to better quantify robustness and real-time performance on embedded systems.

Augustin Borne, Pierre Notin, Christophe Hennequin + 4 more2026-03-05💻 cs

Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks

This paper introduces FGAesthetics, a large-scale fine-grained image aesthetic assessment database with pairwise comparison annotations, and proposes FGAesQ, a novel framework that leverages relative ranks through specialized tokenization and alignment techniques to achieve superior discriminative performance in both fine-grained and coarse-grained aesthetic evaluation scenarios.

Zhichao Yang, Jianjie Wang, Zhixianhe Zhang + 4 more2026-03-05💻 cs