Abstract
Depth completion, predicting dense depth maps from sparse
depth measurements, is an ill-posed problem requiring prior
knowledge. Recent learning-based methods capture such priors
implicitly, but the learned priors primarily fit in-domain data
and generalize poorly to out-of-domain scenarios.
To address this, we propose a zero-shot depth completion method composed
of an affine-invariant depth diffusion model and test-time alignment. We use pre-trained depth
diffusion models, which implicitly understand how to fill in
depth for scenes, as the depth prior. Our approach aligns
the affine-invariant depth prior with metric-scale sparse measurements,
enforcing them as hard constraints via an optimization loop
at test time. Our zero-shot depth completion method
generalizes across datasets from diverse domains,
achieving up to a 21% average performance improvement over
previous state-of-the-art methods while enhancing spatial
understanding by sharpening scene details. We demonstrate
that aligning a monocular affine-invariant depth prior with
sparse metric measurements is sufficient to achieve
domain-generalizable depth completion without relying on
extensive training datasets.
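
To make the idea of aligning an affine-invariant prior with metric measurements concrete, the sketch below fits a single global scale and shift by closed-form least squares. This is a simplified illustration only, not the test-time optimization loop described above: it does not involve the diffusion model and does not enforce the measurements as hard constraints. The function names (`fit_scale_shift`, `align_prior`) and the flat-list depth representation, with `None` marking pixels without a measurement, are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def fit_scale_shift(
    prior: List[float], sparse: List[Optional[float]]
) -> Tuple[float, float]:
    """Least-squares fit of sparse ~ scale * prior + shift.

    Only pixels where a sparse metric measurement exists (not None)
    contribute to the fit. Hypothetical helper for illustration.
    """
    pairs = [(p, s) for p, s in zip(prior, sparse) if s is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sparse measurements")

    mean_p = sum(p for p, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    if var_p == 0.0:
        raise ValueError("prior is constant at the measured pixels")

    cov = sum((p - mean_p) * (s - mean_s) for p, s in pairs)
    scale = cov / var_p
    shift = mean_s - scale * mean_p
    return scale, shift


def align_prior(
    prior: List[float], sparse: List[Optional[float]]
) -> List[float]:
    """Map the affine-invariant prior to metric scale at every pixel."""
    scale, shift = fit_scale_shift(prior, sparse)
    return [scale * p + shift for p in prior]


# Toy example: four pixels, two of which have metric measurements.
prior_depth = [0.0, 0.25, 0.5, 1.0]      # affine-invariant prediction
sparse_depth = [None, 1.5, None, 3.0]    # metric measurements (meters)
dense_depth = align_prior(prior_depth, sparse_depth)
```

A global affine fit like this only matches the measurements in the least-squares sense. With noisy measurements or a locally inconsistent prior, the aligned depth will generally deviate from the sparse points, which is the gap a hard-constraint formulation is meant to close.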