Yuedong Chen · Haofei Xu · Chuanxia Zheng · Bohan Zhuang · Marc Pollefeys · Andreas Geiger · Tat-Jen Cham · Jianfei Cai
- 08/11/24 Update: Explore our MVSplat360 [NeurIPS '24], an upgraded MVSplat that combines video diffusion to achieve 360° NVS for large-scale scenes from just 5 input views!
- 21/10/24 Update: Check out Haofei's DepthSplat if you are interested in feed-forward 3DGS on more complex scenes (DL3DV-10K) and more input views (up to 12 views)!
To get started, clone this project, create a conda virtual environment using Python 3.10+, and install the requirements:
git clone https://github.com/donydchen/mvsplat.git
cd mvsplat
conda create -n mvsplat python=3.10
conda activate mvsplat
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
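After installation, it can help to confirm that the pinned versions are the ones actually present in the active environment. The snippet below is a minimal sketch of such a check; `check_environment` and `EXPECTED` are hypothetical names used for illustration and are not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip install command above.
EXPECTED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means no mismatch."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ expected, found {sys.version.split()[0]}")
    for name, wanted in expected.items():
        try:
            found = version(name)  # reads package metadata, does not import the package
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # Wheels from the cu118 index report versions such as "2.1.2+cu118".
        if found.split("+")[0] != wanted:
            problems.append(f"{name} {wanted} expected, found {found}")
    return problems
```

Calling `check_environment()` inside the `mvsplat` conda environment should return an empty list if the install step succeeded.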
Our MVSplat uses the same training datasets as pixelSplat. Below we quote pixelSplat's detailed instructions on getting datasets.
pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.
If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
- Download the preprocessed DTU data dtu_training.rar.
- Convert DTU to chunks by running
python src/scripts/convert_dtu.py --input_dir PATH_TO_DTU --output_dir datasets/dtu
- [Optional] Generate the evaluation index by running
python src/scripts/generate_dtu_evaluation_index.py --n_contexts=N
where N is the number of context views. (For N=2 and N=3, we have already provided our tested version under /assets.)
To render novel views and compute evaluation metrics from a pretrained model,
- get the pretrained models, and save them to /checkpoints
- run the following:
# re10k
python -m src.main +experiment=re10k \
checkpointing.load=checkpoints/re10k.ckpt \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true
# acid
python -m src.main +experiment=acid \
checkpointing.load=checkpoints/acid.ckpt \
mode=test \
dataset/view_sampler=evaluation \
dataset.view_sampler.index_path=assets/evaluation_index_acid.json \
test.compute_scores=true
- the rendered novel views will be stored under outputs/test
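The evaluation commands above differ only in the experiment name, the checkpoint, and an optional evaluation index. When scripting several evaluations, they can be assembled programmatically. The following is a rough sketch; `build_eval_command` is a hypothetical helper, not something shipped with this project, and it only reproduces the overrides already shown above.

```python
import shlex


def build_eval_command(experiment, checkpoint, index_path=None, compute_scores=True):
    """Assemble the evaluation command shown above as an argv list."""
    argv = [
        "python", "-m", "src.main",
        f"+experiment={experiment}",
        f"checkpointing.load={checkpoint}",
        "mode=test",
        "dataset/view_sampler=evaluation",
    ]
    # The re10k command above does not pass an index path, so it is optional here.
    if index_path is not None:
        argv.append(f"dataset.view_sampler.index_path={index_path}")
    # The commands above spell booleans in lowercase ("true" / "false").
    argv.append(f"test.compute_scores={str(compute_scores).lower()}")
    return argv


acid = build_eval_command(
    "acid", "checkpoints/acid.ckpt", "assets/evaluation_index_acid.json"
)
print(shlex.join(acid))
```

The returned list can be passed directly to `subprocess.run` from the project root.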
To render videos from a pretrained model, run the following:
# re10k
python -m src.main +experiment=re10k \
checkpointing.load=checkpoints/re10k.ckpt \
mode=test \
dataset/view_sampler=evaluation \
dataset.view_sampler.index_path=assets/evaluation_index_re10k_video.json \
test.save_video=true \
test.save_image=false \
test.compute_scores=false
Run the following:
# download the backbone pretrained weight from unimatch and save to 'checkpoints/'
wget 'https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth' -P checkpoints
# train mvsplat
python -m src.main +experiment=re10k data_loader.train.batch_size=14
Our models are trained on a single A100 (80GB) GPU. They can also be trained on multiple GPUs with less memory by setting a smaller data_loader.train.batch_size per GPU.
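Because data_loader.train.batch_size is set per GPU, the total number of samples per optimization step grows with the number of GPUs when training in a distributed data-parallel fashion. The arithmetic can be sketched as below. Both function names are hypothetical, and keeping the total close to the single-GPU value of 14 is an assumption made for illustration, not a recommendation from this project.

```python
import math


def effective_batch_size(per_gpu_batch, gpus_per_node, num_nodes=1):
    """Total samples per step when every GPU processes its own batch."""
    return per_gpu_batch * gpus_per_node * num_nodes


def per_gpu_batch_for(target_total, gpus_per_node, num_nodes=1):
    """Smallest per-GPU batch whose total reaches at least target_total."""
    world_size = gpus_per_node * num_nodes
    return math.ceil(target_total / world_size)


# Single-GPU setting from the training command above.
print(effective_batch_size(14, 1))
# Per-GPU batch needed to reach a total of at least 14 on four GPUs.
print(per_gpu_batch_for(14, 4))
```

Note that the rounded-up value can slightly exceed the target total, so results may not match the single-GPU setting exactly.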
Training on multiple nodes (#32)
Since this project is built on top of pytorch_lightning, it can be trained on multiple nodes hosted on a SLURM cluster. For example, to train on 2 nodes (with 2 GPUs on each node), add the following lines to the SLURM job script:
#SBATCH --nodes=2 # should match with trainer.num_nodes
#SBATCH --gres=gpu:2 # gpu per node
#SBATCH --ntasks-per-node=2
# optional, for debugging
export NCCL_DEBUG=INFO
export HYDRA_FULL_ERROR=1
# optional, set network interface, obtained from ifconfig
export NCCL_SOCKET_IFNAME=[YOUR NETWORK INTERFACE]
# optional, set IB GID index
export NCCL_IB_GID_INDEX=3
# run the command with 'srun'
srun python -m src.main +experiment=re10k \
data_loader.train.batch_size=4 \
trainer.num_nodes=2
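A mismatch between the #SBATCH directives and the training command is an easy mistake to make. The sketch below parses a job script and checks two things: that --nodes equals trainer.num_nodes (as the comment in the script requires), and that --ntasks-per-node equals the GPU count in --gres (an assumption inferred from the example, where both are 2). `parse_sbatch` and `check_job_script` are hypothetical helpers, not part of this project.

```python
import re


def parse_sbatch(script_text):
    """Collect '#SBATCH --key=value' directives into a dict of strings."""
    pattern = re.compile(r"^#SBATCH\s+--([\w-]+)=(\S+)", re.MULTILINE)
    return dict(pattern.findall(script_text))


def check_job_script(script_text):
    """Return a list of inconsistencies; an empty list means none were found."""
    directives = parse_sbatch(script_text)
    problems = []
    match = re.search(r"trainer\.num_nodes=(\d+)", script_text)
    if match is None:
        problems.append("trainer.num_nodes is not set on the command line")
    elif directives.get("nodes") != match.group(1):
        problems.append("--nodes does not match trainer.num_nodes")
    # "--gres=gpu:2" -> "2"; also handles typed requests such as "gpu:a100:2".
    gpus = directives.get("gres", "").rpartition(":")[2]
    if gpus != directives.get("ntasks-per-node"):
        problems.append("--ntasks-per-node does not match the GPU count in --gres")
    return problems


JOB_SCRIPT = """\
#SBATCH --nodes=2 # should match with trainer.num_nodes
#SBATCH --gres=gpu:2 # gpu per node
#SBATCH --ntasks-per-node=2
srun python -m src.main +experiment=re10k \\
    data_loader.train.batch_size=4 \\
    trainer.num_nodes=2
"""

print(check_job_script(JOB_SCRIPT))
```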
Fine-tune from the released weights (#45)
To fine-tune from the released weights without loading the optimizer states, run the following:
python -m src.main +experiment=re10k data_loader.train.batch_size=14 \
checkpointing.load=checkpoints/re10k.ckpt \
checkpointing.resume=false
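To illustrate the difference between resuming a run and fine-tuning from weights only: a pytorch_lightning checkpoint is a dictionary that stores the model weights under "state_dict" alongside training state such as "optimizer_states" and "lr_schedulers". The toy sketch below shows what dropping the latter means. It is not how this project implements checkpointing.resume=false; `strip_training_state` is a hypothetical function, and the checkpoint contents are made-up placeholder values.

```python
def strip_training_state(checkpoint):
    """Keep the model weights and drop optimizer and scheduler state."""
    dropped = ("optimizer_states", "lr_schedulers")
    return {key: value for key, value in checkpoint.items() if key not in dropped}


# Placeholder values standing in for a real checkpoint loaded from disk.
toy_checkpoint = {
    "state_dict": {"encoder.weight": [0.1, 0.2]},
    "optimizer_states": [{"state": {}, "param_groups": []}],
    "lr_schedulers": [{"last_epoch": 10}],
    "global_step": 1000,
}

weights_only = strip_training_state(toy_checkpoint)
print(sorted(weights_only))
```

With the optimizer state removed, training starts with freshly initialized optimizer statistics while the network itself begins from the released weights.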
We also provide a collection of our ablation models (under the folder 'ablations'). To evaluate them, e.g., the 'base' model, run the following command:
# Table 3: base
python -m src.main +experiment=re10k \
checkpointing.load=checkpoints/ablations/re10k_worefine.ckpt \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true \
wandb.name=abl/re10k_base \
model.encoder.wo_depth_refine=true
We use the default model trained on RealEstate10K to conduct cross-dataset evaluations. To evaluate it on another dataset, e.g., DTU, run the following command:
# Table 2: RealEstate10K -> DTU
python -m src.main +experiment=dtu \
checkpointing.load=checkpoints/re10k.ckpt \
mode=test \
dataset/view_sampler=evaluation \
dataset.view_sampler.index_path=assets/evaluation_index_dtu_nctx2.json \
test.compute_scores=true
More running commands can be found at more_commands.sh.
@article{chen2024mvsplat,
title = {MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images},
author = {Chen, Yuedong and Xu, Haofei and Zheng, Chuanxia and Zhuang, Bohan and Pollefeys, Marc and Geiger, Andreas and Cham, Tat-Jen and Cai, Jianfei},
journal = {arXiv preprint arXiv:2403.14627},
year = {2024},
}
The project is largely based on pixelSplat and has incorporated numerous code snippets from UniMatch. Many thanks to these two projects for their excellent contributions!