Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

This paper exposes an evaluation pitfall: common human preference models are biased toward large guidance scales, so scores inflate even as image quality degrades. It proposes a guidance-aware evaluation framework (GA-Eval) and a new method (TDG), and uses them to show that many recent diffusion guidance improvements are illusory and that simply increasing the CFG scale often outperforms them in practice.

Dian Xie, Shitong Shao, Lichen Bai + 5 more · 2026-02-27 · 🤖 cs.AI

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

BetterScene enhances novel view synthesis from sparse, unconstrained real-world photos. It integrates a feed-forward 3D Gaussian Splatting model with a Stable Video Diffusion backbone, which is fine-tuned with temporal equivariance regularization and with vision foundation model-aligned representations in its VAE module, to produce consistent, artifact-free views.

Yuci Han, Charles Toth, John E. Anderson + 2 more · 2026-02-27 · 🤖 cs.AI

ϕ-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

This paper introduces ϕ-DPO, a novel Fairness Direct Preference Optimization framework for Large Multimodal Models that mitigates both catastrophic forgetting and data imbalance-induced bias through a new loss function and pairwise preference alignment, achieving state-of-the-art performance in continual learning benchmarks.

Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren + 2 more · 2026-02-27 · 🤖 cs.LG

Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes

This paper introduces a framework for monocular open-vocabulary 3D occupancy prediction in indoor scenes that relies on geometry-only supervision and 3D Language-Embedded Gaussians. An opacity-aware Poisson-based aggregation operator and a progressive temperature decay schedule address feature mixing and convergence challenges, and the method achieves state-of-the-art performance on the Occ-ScanNet benchmark.

Changqing Zhou, Yueru Luo, Han Zhang + 2 more · 2026-02-27 · 💻 cs

SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling

This paper proposes SPMamba-YOLO, an underwater object detection network that integrates a Spatial Pyramid Pooling Enhanced Layer Aggregation Network (SPPELAN), a Pyramid Split Attention (PSA) mechanism, and a Mamba-based state space modeling module to address challenges such as light attenuation and small targets. It achieves a 4.9% mAP@0.5 improvement over YOLOv8n on the URPC2022 dataset.

Guanghao Liao, Zhen Liu, Liyuan Cao + 2 more · 2026-02-27 · 💻 cs