SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

SpatialVID is a large-scale dataset of over 21,000 hours of in-the-wild video with dense 3D annotations, including camera poses, depth maps, and motion instructions. It addresses the data scarcity and scalability limitations that currently hinder research in spatial intelligence and 3D vision.
CVPR 2026 | Project Page | Code

Hao Zhu

NJU-3DV Lab, Nanjing University
E-mail: zh@nju.edu.cn

Assistant Professor, PhD Advisor

Nanjing, China