Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior

Abstract

Depth completion, which predicts dense depth maps from sparse depth measurements, is an ill-posed problem that requires prior knowledge. Recent methods adopt learning-based approaches to capture priors implicitly, but these priors mainly fit in-domain data and do not generalize well to out-of-domain scenarios. To address this, we propose a zero-shot depth completion method composed of an affine-invariant depth diffusion model and test-time alignment. We use pre-trained depth diffusion models as depth prior knowledge, since they implicitly understand how to fill in depth for scenes. Our approach aligns the affine-invariant depth prior with metric-scale sparse measurements, enforcing the measurements as hard constraints via an optimization loop at test time. Our zero-shot depth completion method generalizes across datasets from various domains, achieving up to a 21% average performance improvement over previous state-of-the-art methods while enhancing spatial understanding by sharpening scene details. We demonstrate that aligning a monocular affine-invariant depth prior with sparse metric measurements is a sufficient strategy for domain-generalizable depth completion without relying on extensive training datasets.

Key Approach

We propose a zero-shot depth completion method that simply aligns an affine-invariant depth prior with metric-scale sparse depth measurements. How can we leverage the depth prior for this alignment?
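
As a point of reference, the simplest way to bring an affine-invariant depth map to metric scale is a closed-form least-squares fit of one global scale s and shift t on the pixels that carry sparse measurements. The sketch below shows only that classic baseline, not the authors' method (which performs alignment inside a test-time optimization loop with a diffusion prior); the function name `align_affine` and the boolean `mask` convention are assumptions made for illustration.

```python
import numpy as np

def align_affine(rel_depth, sparse_depth, mask):
    """Fit scale s and shift t so that s * rel_depth + t matches the sparse
    metric depth on measured pixels (closed-form least squares).

    rel_depth:    (H, W) affine-invariant depth, e.g. from a monocular prior
    sparse_depth: (H, W) metric depth, only meaningful where mask is True
    mask:         (H, W) boolean array marking measured pixels (needs >= 2)
    """
    x = rel_depth[mask]
    y = sparse_depth[mask]
    # Design matrix [x, 1] for the affine model y ~ s * x + t
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Apply the fitted affine map to every pixel, measured or not
    return s * rel_depth + t, s, t
```

If the relative depth is an exact affine function of the true metric depth, a few measured pixels are enough to recover s and t and fill in the unmeasured ones; real priors are only approximately affine-related to metric depth, which is one reason a single global fit is a weak baseline.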

Concept of our prior-based alignment approach

We formulate depth completion as an "inverse problem": estimating an unknown dense depth map from observed sparse measurements. In this formulation, a pre-trained monocular depth diffusion model serves as a depth prior that regularizes the solution toward a dense, well-structured depth map. We propose test-time alignment with a correction step that enforces the measurements as hard constraints, so that the solution agrees exactly with the observations. The alignment process consists of two steps: an optimization loop and resampling. Because it relies on a prior rather than on in-domain training data, our approach generalizes well across datasets from various domains.
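
The loop-then-correct structure described above can be illustrated with a deliberately simplified toy. Here the "prior" is a fixed affine-invariant depth map, the only optimized variables are a hypothetical global scale and shift (updated by plain gradient descent on the mean squared error at measured pixels), and the correction step overwrites measured pixels with their metric values so the constraints hold exactly. The diffusion model, its latent variables, and the resampling step are not modeled; the function name `toy_test_time_align`, the step count, and the learning rate are assumptions, and the fixed learning rate assumes relative depth roughly normalized to [0, 1].

```python
import numpy as np

def toy_test_time_align(rel_depth, sparse_depth, mask, steps=5000, lr=0.5):
    """Toy alignment loop plus hard-constraint correction (hypothetical sketch).

    Assumes rel_depth is roughly normalized to [0, 1]; otherwise lr may need tuning.
    The diffusion prior and the resampling step are NOT modeled here.
    """
    x = rel_depth[mask].astype(float)
    y = sparse_depth[mask].astype(float)
    n = x.size
    s, t = 1.0, 0.0  # hypothetical global scale and shift, optimized at test time
    for _ in range(steps):
        r = s * x + t - y                # residuals at measured pixels
        grad_s = 2.0 * np.dot(r, x) / n  # gradient of the mean squared residual w.r.t. s
        grad_t = 2.0 * np.sum(r) / n     # gradient of the mean squared residual w.r.t. t
        s -= lr * grad_s
        t -= lr * grad_t
    dense = s * rel_depth + t
    # Correction step: treat the sparse measurements as hard constraints
    dense[mask] = sparse_depth[mask]
    return dense
```

On a toy input whose relative depth is an exact affine function of metric depth, the loop recovers that function for the unmeasured pixels; independently of how well the loop converges, the correction step guarantees exact agreement at the measured pixels.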

Overall test-time alignment method

Domain Generalization Results

The table below summarizes the domain generalization performance of our method and previous test-time adaptation methods on indoor (NYU, SceneNet) and outdoor (Waymo, nuScenes) datasets. Across these datasets, our prior-based approach consistently achieves the best or second-best performance.
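
The results table itself is not reproduced in this text, and the metrics it reports are not named here. As an assumption, the sketch below computes RMSE and MAE, two error measures commonly reported for depth completion, over pixels with valid ground truth; the function name `depth_metrics` and the validity threshold `valid_min` are hypothetical choices.

```python
import numpy as np

def depth_metrics(pred, gt, valid_min=1e-3):
    """RMSE and MAE (in the units of gt, e.g. meters) over valid ground-truth pixels.

    Pixels with gt <= valid_min are treated as missing ground truth and ignored.
    """
    valid = gt > valid_min
    err = pred[valid] - gt[valid]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    return {"rmse": rmse, "mae": mae}
```

Masking out invalid pixels matters because ground truth from LiDAR or depth sensors is itself incomplete; averaging over missing pixels would distort both metrics.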

Quantitative Results

Qualitative Results

BibTeX

@inproceedings{hyoseok2024zeroshot,
  author    = {Lee Hyoseok and Kyeong Seon Kim and Kwon Byung-Ki and Tae-Hyun Oh},
  title     = {Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior},
  booktitle = {The 39th Annual AAAI Conference on Artificial Intelligence},
  year      = {2025},
}

TL;DR: Aligning an affine-invariant depth prior with sparse metric measurements is a proven strategy for zero-shot generalization.