unidepth v2 vitl14

lpiccinelli/unidepth-v2-vitl14

published Jun 2024 · updated Feb 2025

A popular open specialist model, with 6.3M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

status

coming soon

API providers

downloads / mo

6.3M

about this model

UniDepthV2 is a universal monocular metric depth estimation model that predicts metric depth from a single RGB image, operating without camera intrinsics and generalizing across diverse indoor and outdoor scenes.

Capabilities

The model directly outputs depth in metric units (e.g., meters) from any single image, eliminating the need for per-scene calibration or transformation. It uses a Vision Transformer (ViT-L/14) backbone and is designed for zero-shot cross-dataset performance.
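One practical consequence of metric output is that a depth value can be turned directly into a 3D position in meters, with no per-scene rescaling. The sketch below illustrates this with the standard pinhole camera model. It is not the UniDepth API: the `unproject` helper and the intrinsics values (`fx`, `fy`, `cx`, `cy`) are hypothetical and chosen only for illustration.

```python
def unproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth (meters) into
    camera coordinates using the pinhole model."""
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 image (assumed values, not from the model).
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A pixel 100 px to the right of the principal point, predicted at 2.0 m.
point = unproject(420, 240, 2.0, fx, fy, cx, cy)
print(point)  # roughly (0.4, 0.0, 2.0): 0.4 m to the right, 2 m ahead
```

With a relative (scale-ambiguous) depth map, the same computation would only give positions up to an unknown scale factor.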

Benchmark Results

At the time of submission, UniDepthV2 achieved first place on the KITTI depth prediction benchmark. It also holds state-of-the-art results on the NYU Depth V2 and KITTI Eigen test sets, as recorded on Papers With Code.
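For readers unfamiliar with how depth benchmarks are scored, here is a minimal sketch of three error measures commonly reported for metric depth: absolute relative error, RMSE, and the δ < 1.25 accuracy. The `depth_metrics` function and the sample values are illustrative assumptions, not the official evaluation code of either benchmark, and the leaderboards may rank by other metrics.

```python
import math


def depth_metrics(pred, gt):
    """Compare predicted and ground-truth depths (meters).
    Pixels with non-positive ground truth are treated as invalid and skipped.
    Predictions are assumed to be strictly positive."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction is within a factor of 1.25 of the truth.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Made-up sample values; the last ground-truth entry is invalid (0.0).
pred = [2.0, 4.5, 12.0, 3.0]
gt = [2.0, 5.0, 8.0, 0.0]

metrics = depth_metrics(pred, gt)
print(metrics)
```

Lower is better for `abs_rel` and `rmse`; higher is better for `delta1`.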

References

UniDepthV2 paper: arXiv 2502.20110
UniDepthV1 paper (CVPR 2024): arXiv 2403.18913
Project page: lpiccinelli-eth.github.io/pub/unidepth/

not yet live

We're benchmarking and onboarding unidepth v2 vitl14 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related specialist models

electra-base-discriminator

wespeaker-voxceleb-resnet34-LM

6.8M dl/mo

stable-diffusion-v1-5-archive

5.8M dl/mo