We introduce a method for animating human images, using the SMPL 3D human parametric model within a latent diffusion framework to improve shape alignment and motion guidance. By incorporating various maps and skeleton-based guidance, we enrich the model with detailed 3D shape and pose attributes, fusing them via a multi-layer motion fusion module with self-attention mechanisms.
arXiv 2024 Paper Project Page
We propose a fully differentiable framework named neural ambient illumination (NeAI), which incorporates Neural Radiance Fields (NeRF) as a physically-based lighting model to handle complex lighting. Our method integrates physically based rendering into NeRF, utilizing roughness-adaptive specular lobe encoding and precise decomposition via the pre-convoluted background.
AAAI 2024 Paper Project Page
We propose VividTalk, a two-stage generic framework for generating high-visual-quality talking head videos. Extensive experiments show that VividTalk improves lip-sync accuracy and realism by a large margin.
arXiv 2023 Paper Project Page
We present a novel differentiable point-based rendering framework for material and lighting decomposition from multi-view images, enabling editing, ray-tracing, and real-time relighting of the 3D point cloud. Our framework showcases the potential to revolutionize the mesh-based graphics pipeline with a relightable, traceable, and editable rendering pipeline based solely on point clouds.
arXiv 2023 Paper Project Page Code
We present a large-scale detailed 3D face dataset, FaceScape, and the corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input.
TPAMI 2023 Paper Code & Dataset
We propose a novel method, AvatarBooth, for generating high-quality 3D avatars from text prompts or photos. Unlike previous approaches that are based on only text descriptions, our method enables the users to customize the generated avatars according to casually captured photos of the face or full body.
arXiv 2023 Paper Project Page
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering. A multi-scale tri-plane-based scene representation is introduced to capture the LoD of the signed distance function (SDF) and the space radiance.
SIGGRAPH Asia Conf. 2023 Paper
We propose to synthesize high-quality 3D face models from natural language descriptions, including both concrete and abstract descriptions. The Describe3D dataset is established with large-scale 3D faces and fine-grained text descriptions for the text-to-3D face generation task.
CVPR 2023 Paper Project Page Code
We propose a novel approach to the single-view 3D face reconstruction task in a non-parametric scheme. Our method removes the heavy dependence on a statistical model and, therefore, its limitations, and achieves state-of-the-art performance by learning from our created pseudo 2D&3D datasets.
AAAI 2023 Paper (Oral) Project Page Code
We propose a parametric model that maps free-view images into a vector space coding facial shape, expression, and appearance using a neural radiance field, namely the Morphable Facial NeRF (MoFaNeRF). MoFaNeRF can synthesize free-view images by fitting to a single image or generating from a random code. The synthesized face is morphable: it can be rigged to arbitrary expressions and edited to arbitrary shapes and appearances.
ECCV 2022 Paper Project Page Code