See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement

This paper presents a method for generating high-resolution, high-quality talking face videos from a single speech input alone. It uses a speech-conditioned diffusion model with statistical facial priors, region-enhanced lip synchronization, and a Transformer-based discrete codebook for end-to-end detail refinement.

Jinting Wang, Jun Wang, Hei Victor Cheng + 1 more | 2026-03-03 | ⚡ eess

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

The paper introduces ThinkMorph, a unified model fine-tuned on high-quality interleaved reasoning traces that treats text and image thoughts as complementary modalities, achieving significant performance gains on vision-centric benchmarks and demonstrating emergent multimodal intelligence such as adaptive reasoning and unseen visual manipulation skills.

Jiawei Gu, Yunzhuo Hao, Huichen Will Wang + 5 more | 2026-03-03 | 💻 cs

AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

AdaptVision is an efficient Vision-Language Model paradigm that mimics human active vision. A reinforcement learning framework with Decoupled Turn Policy Optimization lets the model autonomously determine and acquire the minimum necessary visual tokens through a coarse-to-fine process, achieving superior performance with significantly reduced computational overhead compared to existing methods.

Zichuan Lin, Yicheng Liu, Yang Yang + 2 more | 2026-03-03 | 💬 cs.CL

Fourier-Attentive Representation Learning: A Fourier-Guided Framework for Few-Shot Generalization in Vision-Language Models

This paper proposes Fourier-Attentive Representation Learning (FARL), a novel framework that enhances few-shot generalization in Vision-Language Models by explicitly disentangling image structure and style via Fourier analysis and a dual cross-attention mechanism to guide robust vision-language alignment.

Hieu Dinh Trung Pham, Huy Minh Nhat Nguyen, Cuong Tuan Nguyen | 2026-03-03 | 💻 cs