Pix2Key: Controllable Open-Vocabulary Retrieval with Semantic Decomposition and Self-Supervised Visual Dictionary Learning

Pix2Key is a novel composed image retrieval framework that utilizes semantic decomposition and self-supervised visual dictionary learning to represent queries and candidates as open-vocabulary dictionaries, thereby achieving superior intent-aware matching and diversity-aware reranking without relying on supervised triplets.

Guoyizhe Wei, Yang Jiao, Nan Xi + 4 more · 2026-02-27 · 💻 cs
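
Matching queries and candidates represented as open-vocabulary dictionaries can be pictured as similarity between weighted term sets. A minimal sketch (not the paper's actual matcher; the dictionary format and weighting here are assumptions) using cosine similarity over weighted word dictionaries:

```python
import math

def dict_similarity(query_dict, cand_dict):
    """Cosine similarity between two weighted word dictionaries --
    a simplified stand-in for open-vocabulary dictionary matching."""
    dot = sum(w * cand_dict.get(k, 0.0) for k, w in query_dict.items())
    nq = math.sqrt(sum(w * w for w in query_dict.values()))
    nc = math.sqrt(sum(w * w for w in cand_dict.values()))
    return dot / (nq * nc) if nq and nc else 0.0
```

A candidate sharing no terms with the query scores 0.0; an identical dictionary scores 1.0, which is the intuition behind intent-aware ranking over such representations.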

HARU-Net: Hybrid Attention Residual U-Net for Edge-Preserving Denoising in Cone-Beam Computed Tomography

This paper introduces HARU-Net, a novel Hybrid Attention Residual U-Net architecture that integrates hybrid attention transformers and residual learning to effectively denoise low-dose Cone-Beam Computed Tomography (CBCT) images while preserving critical anatomical edges, outperforming state-of-the-art methods in both image quality metrics and computational efficiency.

Khuram Naveed, Ruben Pauwels · 2026-02-27 · ⚡ eess
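
The core idea of combining attention with residual learning can be illustrated with a toy channel-attention residual block. This is a generic sketch of the pattern, not HARU-Net's actual block (the gating weights `w` and the global-average-pool gating are assumptions):

```python
import numpy as np

def channel_attention_residual(x, w):
    """Toy residual block with channel attention:
    gate each channel by a sigmoid of its global-average statistic,
    then add the identity skip connection.
    x: feature map of shape (C, H, W); w: (C, C) gating weights."""
    pooled = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled)))   # sigmoid channel weights in (0, 1)
    return x + gate[:, None, None] * x           # skip connection + gated path
```

The skip connection lets the block pass edges through unchanged while the attention gate decides per channel how much of the transformed (here, identity) path to add, which is the mechanism behind edge-preserving denoising designs.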

DisQ-HNet: A Disentangled Quantized Half-UNet for Interpretable Multimodal Image Synthesis: Applications to Tau-PET Synthesis from T1 and FLAIR MRI

DisQ-HNet is a novel, interpretable framework that synthesizes tau-PET images from T1 and FLAIR MRI by employing a Partial Information Decomposition-guided vector-quantized encoder and a Half-UNet decoder to disentangle modality contributions while preserving anatomical details and disease-relevant signals for Alzheimer's disease analysis.

Agamdeep S. Chopra, Caitlin Neher, Tianyi Ren + 2 more · 2026-02-27 · 🤖 cs.AI
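
The vector-quantized encoder at the heart of such designs maps each latent vector to its nearest entry in a learned codebook. A minimal sketch of that lookup step (the shapes and L2 metric are the standard VQ-VAE convention, not details confirmed by this paper):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry (L2 distance).
    z: (N, D) latent vectors; codebook: (K, D) learned entries.
    Returns the quantized vectors and their codebook indices."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    idx = d.argmin(axis=1)                                     # nearest entry per vector
    return codebook[idx], idx
```

Because every latent is replaced by a discrete codebook index, the representation becomes inspectable, which is what makes quantized encoders attractive for interpretable synthesis.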

DrivePTS: A Progressive Learning Framework with Textual and Structural Enhancement for Driving Scene Generation

DrivePTS is a progressive learning framework that enhances autonomous driving scene generation by mitigating interdependencies among geometric conditions, enriching semantic context through multi-view hierarchical text descriptions, and improving structural fidelity via a frequency-guided loss, thereby achieving state-of-the-art realism and controllability.

Zhechao Wang, Yiming Zeng, Lufan Ma + 4 more · 2026-02-27 · 🤖 cs.AI
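
A frequency-guided loss typically compares images in the Fourier domain so that high-frequency structure (edges, fine geometry) is penalized explicitly. A minimal sketch, assuming a simple L1 distance between 2D FFT magnitude spectra (the paper's exact formulation is not specified here):

```python
import numpy as np

def frequency_loss(pred, target):
    """Sketch of a frequency-domain loss: mean L1 distance between
    the magnitude spectra of prediction and target (2D FFT)."""
    fp = np.abs(np.fft.fft2(pred))    # magnitude spectrum of prediction
    ft = np.abs(np.fft.fft2(target))  # magnitude spectrum of target
    return np.abs(fp - ft).mean()
```

Unlike a pixel-space L1 loss, this term reacts to blurred or missing structure even when per-pixel averages look similar, which is why frequency guidance helps structural fidelity.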

Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

This paper exposes a critical evaluation pitfall: common human preference models are biased toward large guidance scales, producing inflated scores even when image quality degrades. It proposes a guidance-aware evaluation framework (GA-Eval) together with a new method (TDG), and shows that many recent diffusion-guidance improvements are illusory: simply increasing the CFG scale often outperforms them in practice.

Dian Xie, Shitong Shao, Lichen Bai + 5 more · 2026-02-27 · 🤖 cs.AI
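
The CFG scale at issue here is the weight in the standard classifier-free guidance update, which extrapolates from the unconditional noise prediction toward the conditional one. A minimal sketch of that combination (standard CFG, not the paper's TDG method):

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `guidance_scale`.
    scale = 1 recovers the conditional prediction; larger scales
    push the sample further toward the condition."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Large scales strengthen prompt adherence but are known to cause over-saturation and artifacts, which is exactly the regime where the paper argues biased preference models still reward the output.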

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

BetterScene enhances novel view synthesis for sparse, unconstrained real-world photos by integrating a feed-forward 3D Gaussian Splatting model with a Stable Video Diffusion backbone that is fine-tuned via temporal equivariance regularization and vision foundation model-aligned representations within its VAE module to produce consistent, artifact-free views.

Yuci Han, Charles Toth, John E. Anderson + 2 more · 2026-02-27 · 🤖 cs.AI

ϕ-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

This paper introduces ϕ-DPO, a novel Fairness Direct Preference Optimization framework for Large Multimodal Models that mitigates both catastrophic forgetting and data imbalance-induced bias through a new loss function and pairwise preference alignment, achieving state-of-the-art performance in continual learning benchmarks.

Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren + 2 more · 2026-02-27 · 🤖 cs.LG
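
ϕ-DPO builds on the standard DPO objective, which scores a chosen/rejected pair by the policy's log-probability margin relative to a reference model. A sketch of the vanilla DPO loss for one pair (the paper's fairness-aware modification is not reproduced here):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair:
    logp_w / logp_l are the policy log-probs of the chosen/rejected
    responses; ref_logp_* are the frozen reference model's log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy matches the reference the margin is zero and the loss is ln 2; widening the chosen-over-rejected margin drives the loss toward zero. Fairness variants such as ϕ-DPO reweight or regularize this objective so that imbalanced preference data does not bias the update.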