SpatialVID is a large-scale dataset of over 21,000 hours of in-the-wild videos with dense 3D annotations (camera poses, depth maps, and motion instructions), built to address the data scarcity and limited scalability that currently hinder spatial intelligence and 3D vision research.
CVPR 2026