DrivePTS: A Progressive Learning Framework with Textual and Structural Enhancement for Driving Scene Generation

DrivePTS is a progressive learning framework that enhances autonomous driving scene generation by mitigating geometric condition inter-dependencies, enriching semantic context through multi-view hierarchical text descriptions, and improving structural fidelity via frequency-guided loss, thereby achieving state-of-the-art realism and controllability.

Zhechao Wang, Yiming Zeng, Lufan Ma + 4 more · 2026-02-27 · cs.AI

Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

This paper exposes an evaluation pitfall: common human preference models are biased toward large guidance scales, which inflates scores even when image quality degrades. It proposes a guidance-aware evaluation framework (GA-Eval) and a new method (TDG), and uses them to show that many recent diffusion guidance improvements are illusory and that simply increasing the classifier-free guidance (CFG) scale often outperforms them in practice.

Dian Xie, Shitong Shao, Lichen Bai + 5 more · 2026-02-27 · cs.AI

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

BetterScene enhances novel view synthesis for sparse, unconstrained real-world photos by integrating a feed-forward 3D Gaussian Splatting model with a Stable Video Diffusion backbone that is fine-tuned via temporal equivariance regularization and vision foundation model-aligned representations within its VAE module to produce consistent, artifact-free views.

Yuci Han, Charles Toth, John E. Anderson + 2 more · 2026-02-27 · cs.AI

ϕ-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models

This paper introduces ϕ-DPO, a novel Fairness Direct Preference Optimization framework for Large Multimodal Models that mitigates both catastrophic forgetting and data imbalance-induced bias through a new loss function and pairwise preference alignment, achieving state-of-the-art performance in continual learning benchmarks.

Thanh-Dat Truong, Huu-Thien Tran, Jackson Cothren + 2 more · 2026-02-27 · cs.LG

Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes

This paper introduces a framework for monocular open-vocabulary 3D occupancy prediction in indoor scenes that relies on geometry-only supervision and 3D Language-Embedded Gaussians. An opacity-aware Poisson-based aggregation operator addresses feature mixing, and a progressive temperature decay schedule addresses convergence challenges, yielding state-of-the-art performance on the Occ-ScanNet benchmark.

Changqing Zhou, Yueru Luo, Han Zhang + 2 more · 2026-02-27 · cs