Code for the papers:
G. Wang, H. Wang, Y. Liu, and W. Chen, Unsupervised Learning of Monocular Depth and Ego-Motion Using Multiple Masks, in International Conference on Robotics and Automation, pp. 4724-4730, 2019. [Poster]
G. Wang, C. Zhang, H. Wang, J. Wang, Y. Wang, and X. Wang, Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry, in IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2020.3010418.
Python 3 and PyTorch are required. The remaining dependencies can be installed by running:
pip3 install -r requirements.txt
For KITTI, first download the dataset using this script provided on the official website, and then run the following command.
python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 256 --num-threads 1 --static-frames data/static_frames.txt --with-gt
To evaluate optical flow against the ground truth on KITTI, download the KITTI 2015 dataset. You need 1) the stereo 2015/flow 2015/scene flow 2015 data set (2 GB), 2) the multi-view extension (14 GB), and 3) the calibration files (1 MB). You should have the following directory structure:
kitti2015
| data_scene_flow
| data_scene_flow_calib
| data_scene_flow_multiview
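Before running the flow evaluation, it can help to confirm that the extracted archives match this layout. The following is a minimal sketch and not part of the repository: the function name `missing_kitti2015_dirs` is hypothetical, and it only checks that the three subdirectories listed above exist, not that their contents are complete.

```python
from pathlib import Path

# Subdirectory names taken from the directory structure above.
EXPECTED_SUBDIRS = (
    "data_scene_flow",
    "data_scene_flow_calib",
    "data_scene_flow_multiview",
)


def missing_kitti2015_dirs(root):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Example: report what is missing under ./kitti2015 (an empty list means none).
print(missing_kitti2015_dirs("kitti2015"))
```

An empty list means all three folders are present; any names returned point to an archive that was not downloaded or was extracted elsewhere.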
For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. You will probably need to contact the administrators to get access to them. Then run the following command.
python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 342 --num-threads 1
Note that for Cityscapes the image height (--height) is set to 342 because the bottom part of the image, which contains the car logo, is cropped out; the resulting image has a height of 256.
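The relationship between 342 and 256 can be illustrated with a small calculation. This is only a sketch under an assumption: the crop is taken to keep the top 75% of the rows, a fraction chosen here because it maps 342 to 256; the actual cropping logic lives in the data preparation code and may differ. The helper names `cropped_height` and `crop_bottom` are hypothetical.

```python
def cropped_height(resized_height, keep_fraction=0.75):
    """Number of rows left after dropping the bottom part of a resized frame.

    keep_fraction=0.75 is an assumption made for illustration only.
    """
    return int(resized_height * keep_fraction)


def crop_bottom(rows, keep_fraction=0.75):
    """Drop the bottom rows of an image given as a sequence of rows."""
    return rows[:cropped_height(len(rows), keep_fraction)]


print(cropped_height(342))  # 256
```

Under this assumption, a frame resized to 832 x 342 ends up as 832 x 256, matching the resolution used for KITTI.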
python3 train.py /path/to/prepared/data \
--dispnet DispResNetS6 --posenet PoseNetB6 --flownet Back2Future \
-b 4 -pc 1.0 -pf 0.0 -m 0.0 -c 0.0 -s 0.2 \
--epoch-size 100 --log-output -f 30 --nlevels 6 --lr 1e-4 -wssim 0.85 --epochs 4000 \
--smoothness-type edgeaware --fix-masknet --fix-flownet --with-depth-gt --log-terminal \
--spatial-normalize-max --workers 8 --kitti-dir /data/to/kitti --add-less-than-mean-mask \
--add-maskp01 --using-none-mask --name demo \
--pretrained-disp /path/to/disp/model \
--pretrained-pose /path/to/pose/model
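When the same run has to be launched with different paths, the command above can be assembled programmatically. The sketch below is not part of the repository: the helper `build_train_command` is hypothetical, and it simply reproduces the flags and values shown above while leaving the paths and the run name as parameters. `shlex.join` requires Python 3.8 or newer.

```python
import shlex


def build_train_command(data_dir, disp_weights, pose_weights, kitti_dir, name="demo"):
    """Assemble the train.py invocation shown above as an argument list."""
    return [
        "python3", "train.py", data_dir,
        "--dispnet", "DispResNetS6", "--posenet", "PoseNetB6",
        "--flownet", "Back2Future",
        "-b", "4", "-pc", "1.0", "-pf", "0.0", "-m", "0.0", "-c", "0.0", "-s", "0.2",
        "--epoch-size", "100", "--log-output", "-f", "30", "--nlevels", "6",
        "--lr", "1e-4", "-wssim", "0.85", "--epochs", "4000",
        "--smoothness-type", "edgeaware", "--fix-masknet", "--fix-flownet",
        "--with-depth-gt", "--log-terminal", "--spatial-normalize-max",
        "--workers", "8", "--kitti-dir", kitti_dir,
        "--add-less-than-mean-mask", "--add-maskp01", "--using-none-mask",
        "--name", name,
        "--pretrained-disp", disp_weights,
        "--pretrained-pose", pose_weights,
    ]


cmd = build_train_command(
    "/path/to/prepared/data",
    "/path/to/disp/model",
    "/path/to/pose/model",
    "/data/to/kitti",
)
# Print the command for inspection; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```

Keeping the arguments in a list avoids shell quoting problems when a path contains spaces.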
TensorBoard can be opened with the command:
tensorboard --logdir=./
The training progress can then be visualized by opening http://localhost:6006 in your browser.
python3 test_disp.py --dispnet DispResNetS6 --pretrained-dispnet /path/to/dispnet --pretrained-posent /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list
python3 test_pose.py pretrained/pose_model_best.pth.tar --img-width 832 --img-height 256 --dataset-dir /path/to/kitti/odometry/ --sequences 09 --posenet PoseNetB6
python3 test_flow.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset
- DispNet, PoseNet, and FlowNet in joint unsupervised learning of depth, pose and optical flow.
We are grateful to Anurag Ranjan for his GitHub repository, on which our code is based.