Thinking Beyond Labels: Vocabulary-Free Fine-Grained Recognition using Reasoning-Augmented LMMs

FiNDR is a framework that uses reasoning-augmented large multi-modal models (LMMs) for vocabulary-free fine-grained image recognition. It automatically generates, filters, and uses descriptive candidate labels, and is reported to reach state-of-the-art results that surpass methods relying on fixed, human-defined vocabularies.

Dmitry Demidov, Zaigham Zaheer, Zongyan Han + 2 more · 2026-02-27 · 💻 cs

Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control

UniPath is a framework for computational pathology image generation that draws on mature diagnostic understanding to produce controllable, semantics-driven images. It combines multi-stream control (raw text, diagnostic semantic tokens, and morphological prototypes) with a curated large-scale dataset, and is reported to achieve state-of-the-art performance and fine-grained semantic fidelity.

Minghao Han, Yichen Liu, Yizhou Liu + 5 more · 2026-02-27 · 💻 cs

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

This paper introduces WebGym, a large-scale open-source environment with nearly 300,000 realistic web tasks and a high-throughput asynchronous rollout system. Reinforcement learning in this environment significantly improves visual web agents on out-of-distribution websites, surpassing both proprietary models and prior open-source approaches.

Hao Bai, Alexey Taymanov, Tong Zhang + 2 more · 2026-02-27 · 🤖 cs.LG

ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing

This paper introduces ThinkRL-Edit, a reasoning-centric reinforcement learning framework for instruction-driven image editing that decouples visual reasoning from synthesis. It uses Chain-of-Thought sampling, unbiased reward grouping, and binary checklist-based VLM evaluation to address limitations in exploration, reward fusion, and reward stability.

Hengjia Li, Liming Jiang, Qing Yan + 6 more · 2026-02-27 · 💻 cs

Visible Light Positioning With Lamé Curve LEDs: A Generic Approach for Camera Pose Estimation

This paper proposes LC-VLP, a generic Visible Light Positioning (VLP) algorithm that uses Lamé curves as a unified representation for diverse LED shapes. Camera pose is estimated through a correspondence-free initialization followed by nonlinear optimization, and the method is reported to outperform state-of-the-art approaches with an average position error below 4 cm.

Wenxuan Pan, Yang Yang, Dong Wei + 4 more · 2026-02-27 · ⚡ eess
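
For readers unfamiliar with the term in the LC-VLP entry above, the standard textbook form of a Lamé curve (superellipse) is given below. The symbols a, b, and n are the usual generic ones and are an assumption here; the summary does not state which parameterization the paper itself uses.

```latex
\left|\frac{x}{a}\right|^{n} + \left|\frac{y}{b}\right|^{n} = 1,
\qquad a > 0,\; b > 0,\; n > 0
```

With n = 2 this is an ellipse (a circle when a = b), and as n grows the curve approaches a rectangle with half-widths a and b, which suggests how a single family could cover both round and rectangular LED outlines.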

VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations

This paper proposes VQ-Style, a framework that combines Residual Vector Quantized Variational Autoencoders with contrastive learning and an information leakage loss to disentangle human motion into coarse content and fine style representations. A simple Quantized Code Swapping technique then enables zero-shot style transfer and other applications.

Fatemeh Zargarbashi, Dhruv Agrawal, Jakob Buhmann + 3 more · 2026-02-27 · 🤖 cs.AI
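
The idea of residual quantization with code swapping mentioned in the VQ-Style entry above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the random codebooks, the vector sizes, and the choice of treating only the first level as "content" are all hypothetical, chosen only to show the mechanism.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy residual quantization: one code index per level (hypothetical helper)."""
    residual = x.astype(float)
    codes = []
    for cb in codebooks:
        # Pick the codeword nearest to what is still unexplained.
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected codeword from every level."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

def swap_codes(content_codes, style_codes, n_content_levels):
    """Keep coarse levels from the content source, fine levels from the style source."""
    return content_codes[:n_content_levels] + style_codes[n_content_levels:]

# Toy setup (assumed): 3 levels, 8 codewords each, 4-dimensional vectors.
rng = np.random.default_rng(0)
codebooks = [rng.normal(scale=1.0 / 2**level, size=(8, 4)) for level in range(3)]

content_vec = rng.normal(size=4)  # stands in for an encoded content motion
style_vec = rng.normal(size=4)    # stands in for an encoded style motion

content_codes = rvq_encode(content_vec, codebooks)
style_codes = rvq_encode(style_vec, codebooks)

mixed = swap_codes(content_codes, style_codes, n_content_levels=1)
stylized = rvq_decode(mixed, codebooks)
```

In a trained model the codebooks would be learned and the vectors would come from a motion encoder; the sketch only shows that swapping the fine-level indices changes the decoded vector while the coarse level stays tied to the content source.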

Benchmarking Video Foundation Models for Remote Parkinson's Disease Screening

This paper presents a large-scale systematic benchmark of seven video foundation models on a novel dataset of 32,847 videos from 1,888 participants. It finds that model performance for remote Parkinson's disease screening is highly task-dependent, establishes a baseline with AUCs up to 85.3%, and highlights the need for task-aware calibration to improve sensitivity.

Md Saiful Islam, Ekram Hossain, Abdelrahman Abdelkader + 11 more · 2026-02-27 · 💻 cs