unidepth v2 vitl14
lpiccinelli/unidepth-v2-vitl14
published Jun 2024 · updated Feb 2025
A popular open specialist model, with 6.3M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
UniDepthV2 is a universal monocular metric depth estimation model that predicts metric depth from a single RGB image, operating without camera intrinsics and generalizing across diverse indoor and outdoor scenes.
Capabilities
The model directly outputs depth in metric units (e.g., meters) from any single image, eliminating the need for per-scene calibration or transformation. It uses a Vision Transformer (ViT-L/14) backbone and is designed for zero-shot cross-dataset performance.
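To make concrete what outputting metric depth saves, the sketch below shows the median-scaling step that scale-ambiguous (relative) depth predictions commonly need before they can be compared with measurements in meters. It is not part of UniDepthV2 or its library. The function name `median_scale_align` and the sample values are hypothetical, chosen only for illustration.

```python
from statistics import median


def median_scale_align(relative_depth, reference_depth_m):
    """Rescale a unitless depth map so its median matches a metric reference.

    Hypothetical helper: this is the kind of per-scene transformation
    a metric depth model is meant to make unnecessary.
    """
    scale = median(reference_depth_m) / median(relative_depth)
    return [d * scale for d in relative_depth]


# Illustrative values only: a unitless relative prediction and
# reference depths in meters for the same three pixels.
relative = [0.5, 1.0, 2.0]
reference_m = [2.0, 4.0, 8.0]

aligned = median_scale_align(relative, reference_m)
print(aligned)
```

A metric prediction would be used as-is, with no reference depths required for this alignment step.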
Benchmark Results
At the time of submission, UniDepthV2 achieved first place on the KITTI depth prediction benchmark. It also holds state-of-the-art results on the NYU Depth V2 and KITTI Eigen test sets, as recorded on Papers With Code.
References
- UniDepthV2 paper: arXiv 2502.20110
- UniDepthV1 paper (CVPR 2024): arXiv 2403.18913
- Project page: lpiccinelli-eth.github.io/pub/unidepth/