Ashkan Ganj1 · Yiqin Zhao1 · Hang Su2 · Tian Guo1
1Worcester Polytechnic Institute 2Nvidia Research
For any questions or concerns, please feel free to reach out to Ashkan Ganj.
Welcome to the official repository for our HotMobile '24 paper. This work presents the challenges and opportunities of achieving accurate metric depth estimation in mobile AR. We evaluated the performance of four state-of-the-art monocular depth estimation models in AR scenarios and identified three categories of challenges: hardware-, data-, and model-related. Furthermore, our research outlines promising future directions for exploring and addressing those challenges.
This repository is structured into two main directories, `Analysis` and `Models`, each serving a distinct purpose in the context of our research.
├── Analysis/
│ ├── notebooks/
│ │ ├── ARKitScenes/
│ │ ├── vidar/
│ │ └── ZoeDepth/
│ └── results/
│ ├── ARkitScenes/
│ ├── DistDepth/
│ ├── ZeroDepth(Vidar)/
│ └── ZoeDepth/
├── models/
│ ├── DistDepth/
│ │ ├── ...
│ ├── vidar/
│ │ ├── ...
│ └── ZoeDepth/
│ ├── ...
├── LICENSE
└── README.md
The `Analysis` directory is the central hub for all code, results (including CSV files, figures, and images), and analytical notebooks associated with our paper. By running the notebooks in this directory for each model, you can replicate our analytical process and generate all the outputs, ranging from CSV data files to the exact figures published in our paper. This directory aims to provide a transparent and replicable pathway for understanding our research findings and methodology.
In the `Models` directory, you'll find modified versions of various depth estimation models, specifically adapted to work with the ARKitScenes dataset. Each model within this directory comes with its own `README.md` containing detailed instructions on setup, usage, and evaluation. This ensures that users can easily navigate and utilize the modified models for their depth estimation projects.
For all evaluations, we utilized the ARKitScenes dataset.
To download and preprocess the ARKitScenes dataset, please use the following notebook: ARKitScenes Notebook. This notebook provides step-by-step instructions for obtaining and preparing the dataset for use with the models.
Alongside the preprocessing notebook, we have developed several other notebooks aimed at analyzing the dataset in depth. These notebooks explore various aspects of the data, helping to better understand its characteristics and how it impacts the performance of our depth estimation models.
The additional notebooks, found within the same directory, cover topics such as:
- `analysis_per_obj.ipynb`: Examines model performance across different object types, identifying any meaningful differences.
- `confidence_eval.ipynb`: Analyzes missing points in the depth map based on confidence levels, including a visualization of the distribution and a frame-by-frame analysis.
- `main.ipynb`: Contains a comprehensive analysis of the number of missing points in both ARKit and ground-truth (GT) depths, offering code for analysis at different thresholds and visualization of missing points in the depth map (a rough sketch of this check appears below).
To access these analysis notebooks, navigate to the following directory in our repository: `/Analysis/notebooks/ARKitScenes`.
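As a rough illustration of the kind of check these notebooks perform, the sketch below counts pixels without usable depth in a single frame and optionally filters by a per-pixel confidence map. The file names, the 16-bit millimeter encoding, and the 0 to 2 confidence scale are assumptions; the notebooks contain the exact preprocessing.

```python
import numpy as np
from PIL import Image

def missing_point_ratio(depth_path, conf_path=None, min_conf=2):
    """Fraction of pixels with no usable depth in one frame.

    Assumes the depth map is a 16-bit PNG in millimeters and that the ARKit
    confidence map (same resolution) uses levels 0 (low) to 2 (high); adjust
    both assumptions to match the preprocessing done in the notebooks.
    """
    depth_mm = np.asarray(Image.open(depth_path), dtype=np.float32)
    missing = depth_mm <= 0  # zero/negative values treated as holes

    if conf_path is not None:
        conf = np.asarray(Image.open(conf_path))
        missing |= conf < min_conf  # also drop low-confidence pixels

    return float(missing.mean())

# Example usage with placeholder file names:
# print(missing_point_ratio("frame_000_depth.png", "frame_000_conf.png"))
```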
For comprehensive details about ZoeDepth, including its methodologies and achievements, we encourage you to visit the ZoeDepth GitHub repository and consult the ZoeDepth paper.
It's crucial to note that ZoeDepth's implementation might encounter compatibility issues with newer versions of PyTorch. To ensure everything works as expected, we strongly recommend using the PyTorch version specified in the ZoeDepth repository instructions (any version <= 2.1.0 is acceptable).
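If you are unsure which PyTorch build is active in your environment, a quick check like the one below (a minimal sketch relying only on the standard `torch.__version__` attribute) can save a failed run:

```python
import torch
from packaging import version  # usually already installed; otherwise `pip install packaging`

# Strip local build tags such as "+cu118" before comparing.
installed = version.parse(torch.__version__.split("+")[0])
if installed > version.parse("2.1.0"):
    raise RuntimeError(
        f"PyTorch {torch.__version__} detected; ZoeDepth expects a version <= 2.1.0"
    )
print(f"PyTorch {torch.__version__} is within the supported range")
```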
To begin working with ZoeDepth, please adhere to the instructions provided in the ZoeDepth repository. These steps will guide you through cloning the repository, setting up the required environment, and downloading the necessary pre-trained weights.
To evaluate ZoeDepth on the ARKitScenes dataset:
- Open the `Analysis/notebooks/ZoeDepth/evaluateOnARkitScenes.ipynb` notebook found within our repository.
- Ensure the dataset path is correctly set to where your ARKitScenes dataset is stored.
- Adjust the notebook to point to the pre-trained weights located in the `models/ZoeDepth/` directory. For ARKitScenes, use the specific pre-trained weights provided (midas_train, midas_freeze).
- Execute the notebook to start the evaluation process.
- After the evaluation, results will be summarized and saved in a `.csv` file, ready for any further analysis or visualization (a minimal single-frame sketch follows this list).
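The notebook wraps this process end-to-end; as a rough sketch of what evaluating a single frame looks like, the snippet below loads ZoeDepth through the upstream torch.hub entry point (the notebook instead uses the weights under `models/ZoeDepth/`) and computes AbsRel against a ground-truth depth map. The file paths, the millimeter encoding of the ground truth, and the matching resolutions are assumptions.

```python
import numpy as np
import torch
from PIL import Image

# Load ZoeDepth via the upstream torch.hub entry point; the notebook instead
# loads the fine-tuned weights shipped under models/ZoeDepth/.
zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True).eval()

rgb = Image.open("sample_rgb.png").convert("RGB")              # placeholder path
gt_m = np.asarray(Image.open("sample_gt_depth.png")) / 1000.0  # assumed 16-bit PNG in mm

pred_m = zoe.infer_pil(rgb)  # metric depth prediction in meters (H x W array)

# Assumes prediction and ground truth share the same resolution after
# preprocessing; resize one of them first if they do not.
mask = gt_m > 0
abs_rel = np.mean(np.abs(pred_m[mask] - gt_m[mask]) / gt_m[mask])
print(f"AbsRel on this frame: {abs_rel:.3f}")
```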
To assess the impact of cropping on model performance using the NYUv2 dataset:
- Go to the `Analysis/notebooks/ZoeDepth/cropped_effect.ipynb` notebook within our repository.
- Update the dataset path variable to point to your NYUv2 dataset location.
- Run the notebook to perform the evaluation.
- The notebook will report a series of average depth errors for different cropping percentages, offering insight into how cropping affects depth estimation accuracy (the sketch after this list illustrates the idea).
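The notebook handles the full sweep; the fragment below only sketches the underlying idea: center-crop the input by a given percentage before inference and recompute the error. The `predict` and `crop_and_resize_gt` callables and the data variables are placeholders, not functions from our code.

```python
import numpy as np
from PIL import Image

def center_crop(img: Image.Image, crop_pct: float) -> Image.Image:
    """Trim crop_pct of the width/height from around the image border."""
    w, h = img.size
    dx, dy = int(w * crop_pct / 2), int(h * crop_pct / 2)
    return img.crop((dx, dy, w - dx, h - dy))

def mean_abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

# Hypothetical sweep: `predict` stands in for a model call (e.g. ZoeDepth's
# infer_pil) and `crop_and_resize_gt` for matching the ground truth to the crop.
# for pct in (0.0, 0.1, 0.2, 0.3):
#     cropped = center_crop(rgb, pct)
#     print(pct, mean_abs_rel(predict(cropped), crop_and_resize_gt(gt, pct)))
```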
- `models/ZoeDepth/zoedepth/utils/config.py`: Contains the general dataset configuration. Update it according to your requirements (dataset path, splits path) before initiating the training process.
- `models/ZoeDepth/train_mono.py`: The entry point for initiating the training process. It reads from `config.py` and applies the specified configurations during training. Depending on your requirements, you might modify this script to alter the training workflow, add custom logging, or implement additional features.
Once you have made the necessary adjustments to the configuration files, you can start the training process by running the following command from your terminal:
python train_mono.py
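Before launching a long run, it can help to sanity-check the paths you entered. The sketch below is a generic check with placeholder paths; the actual configuration keys are whatever ZoeDepth's `config.py` defines.

```python
from pathlib import Path

# Replace these with the same values you entered in
# models/ZoeDepth/zoedepth/utils/config.py; the variable names here are
# placeholders, not the actual ZoeDepth config keys.
dataset_root = Path("/data/ARKitScenes")
splits_file = Path("/data/ARKitScenes/splits/train.txt")

for p in (dataset_root, splits_file):
    if not p.exists():
        raise FileNotFoundError(f"Configured path does not exist: {p}")
print("Paths look good; train_mono.py can be started.")
```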
For more details, please refer to DistDepth's GitHub repository and paper.
To utilize DistDepth, follow the steps outlined below. Detailed instructions and additional information are available in the DistDepth README.md.
- Download Pre-trained Weights: Access the pre-trained weights via the link provided in the DistDepth README.md. These weights are essential for evaluating the model's performance.
- Prepare Your Environment: Ensure your setup meets the prerequisites listed in the `README.md`, including necessary libraries and dependencies.
- Running Evaluation:
  - Place the pre-trained weights in the same directory as the `eval.sh` script.
  - Update the `eval.sh` script with the path to the ARKitScenes dataset.
  - Execute the following command to evaluate the DistDepth model on the ARKitScenes dataset:

    sh eval.sh

  - The script outputs the evaluation results and saves them in a `.csv` file, which can later be used for visualization (see the sketch after this list).
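The column layout of that file is determined by `eval.sh`; a minimal sketch for summarizing such a results file with pandas (the file name is a placeholder) could look like:

```python
import pandas as pd

# Placeholder file name; check the CSV written by eval.sh for the actual
# schema (per-frame rows with one column per metric is assumed here).
results = pd.read_csv("distdepth_arkitscenes_results.csv")
print(results.describe())               # per-frame error statistics
print(results.mean(numeric_only=True))  # average of each numeric metric
```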
For a comprehensive understanding of the underlying methodology and insights into the model's development, we direct readers to the official resources:
- For detailed information and the latest updates on ZeroDepth, visit the vidar GitHub repository.
- To dive deeper into the research and technical details, the ZeroDepth paper provides a thorough explanation of the technology and its applications.
To begin working with ZeroDepth, you should first set up your environment and acquire the necessary pre-trained weights by following the instructions in the ZeroDepth Readme.md.
- Navigate to the `Analysis/notebooks/vidar/vidar-inference.ipynb` notebook within our repository.
- Update the dataset path variables:
  - ARKitScenes dataset path: ensure the path points to where you've stored the ARKitScenes dataset.
  - NYUv2 dataset path: similarly, update this to the location of your NYUv2 dataset.
- Execute the notebook, following the provided instructions, to initiate the evaluation of the ZeroDepth model on the ARKitScenes dataset.
- Upon completion, the notebook will present the evaluation results and automatically save them to a `.csv` file. This file can be utilized for further analysis or visualization purposes.
These steps are designed to facilitate a smooth experience in assessing the performance of ZeroDepth with the ARKitScenes dataset, enabling users to effectively leverage this model in their depth estimation projects.
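All three evaluations report the usual monocular depth-error metrics. For reference, the self-contained sketch below shows the standard definitions of AbsRel, RMSE, and the δ < 1.25 accuracy applied to prediction and ground-truth arrays in meters; it mirrors the common formulation in the literature rather than the exact code in the notebooks.

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard monocular depth metrics over valid ground-truth pixels (meters)."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]

    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    delta1 = np.mean(np.maximum(pred / gt, gt / pred) < 1.25)

    return {"AbsRel": float(abs_rel), "RMSE": float(rmse), "delta1": float(delta1)}

# Example with synthetic placeholder data:
# gt = np.random.uniform(0.5, 5.0, size=(192, 256))
# pred = gt * np.random.normal(1.0, 0.1, size=gt.shape)
# print(depth_metrics(pred, gt))
```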
If our work assists you in your research, please cite it as follows:
@inproceedings{10.1145/3638550.3641122,
author = {Ganj, Ashkan and Zhao, Yiqin and Su, Hang and Guo, Tian},
title = {Mobile AR Depth Estimation: Challenges \& Prospects},
year = {2024},
isbn = {9798400704970},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3638550.3641122},
doi = {10.1145/3638550.3641122},
abstract = {Accurate metric depth can help achieve more realistic user interactions such as object placement and occlusion detection in mobile augmented reality (AR). However, it can be challenging to obtain metricly accurate depth estimation in practice. We tested four different state-of-the-art (SOTA) monocular depth estimation models on a newly introduced dataset (ARKitScenes) and observed obvious performance gaps on this real-world mobile dataset. We categorize the challenges to hardware, data, and model-related challenges and propose promising future directions, including (i) using more hardware-related information from the mobile device's camera and other available sensors, (ii) capturing high-quality data to reflect real-world AR scenarios, and (iii) designing a model architecture to utilize the new information.},
booktitle = {Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications},
pages = {21–26},
numpages = {6},
location = {San Diego, CA, USA},
series = {HOTMOBILE '24}
}
This work was supported in part by NSF Grants #2105564 and #2236987, a VMware grant, and the Worcester Polytechnic Institute's Computer Science Department. Most results presented in this work were obtained using CloudBank, which is supported by the National Science Foundation under award #1925001.