This repository contains PyTorch code for the ICPR2020 paper "Future Urban Scene Generation Through Vehicle Synthesis" [arXiv]
Our framework is composed of two stages:
- Interpretable information extraction: high-level interpretable information (bounding boxes, trajectories, keypoints) is gathered from raw RGB frames.
- Novel view completion: a reprojected 3D model is conditioned on the original 2D appearance.
In this work we propose a deep learning pipeline to predict the visual future appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two-stage approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm, i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in 3D space. Our model can be easily conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user.
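To give a concrete sense of the geometric step (reprojecting a 3D model under a roto-translation), here is a minimal, self-contained NumPy sketch; the keypoints, rotation, translation and camera intrinsics below are made up for illustration and are not taken from this repository's code.

```python
import numpy as np

# Toy illustration: apply a 3D roto-translation to a few vehicle keypoints and
# project them with a pinhole camera. All values are made up for the example.
keypoints_3d = np.array([[ 1.0,  0.5, 0.0],
                         [-1.0,  0.5, 0.0],
                         [ 1.0, -0.5, 0.0],
                         [-1.0, -0.5, 0.0]])

theta = np.deg2rad(30.0)                      # rotate 30 degrees around the vertical axis
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.0, 0.0, 10.0])                # place the object 10 m in front of the camera

K = np.array([[800.0,   0.0, 320.0],          # toy pinhole intrinsics
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

points_cam = keypoints_3d @ R.T + t           # roto-translation into camera coordinates
proj = points_cam @ K.T                       # pinhole projection
keypoints_2d = proj[:, :2] / proj[:, 2:3]     # normalise by depth

print(keypoints_2d)                           # 2D positions of the reprojected keypoints
```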
The code was tested in an Anaconda environment (Python 3.6) on both Linux- and Windows-based systems.
Run the following commands to install all requirements in a new virtual environment:
conda create -n <env_name> python=3.6
conda activate <env_name>
pip install -r requirements.txt
Install the PyTorch package (version 1.3 or above).
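As a quick sanity check, you can verify from Python that the installed version meets the requirement and that a GPU is visible (useful before choosing `--device cuda` below):

```python
import torch

# The pipeline expects PyTorch 1.3 or above.
print(torch.__version__)

# True if a CUDA device is available for the --device cuda option.
print(torch.cuda.is_available())
```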
To run the demo of our project, first download all the required data at this link and save them in a <data_dir> of your choice. We tested our pipeline on the CityFlow dataset, which already provides annotated bounding boxes and vehicle trajectories.
The test script is run_test.py, which expects three mandatory arguments: the video, 3D keypoints, and checkpoints directories.
python run_test.py <data_dir>/<video_dir> <data_dir>/pascal_cads <data_dir>/checkpoints --det_mode ssd512|yolo3|mask_rcnn --track_mode tc|deepsort|moana --bbox_scale 1.15 --device cpu|cuda
Add the --inpaint parameter to use inpainting on the vehicle instead of the static background.
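For reference, a rough argparse sketch of this command-line interface is shown below; it is a hypothetical reconstruction, and the actual argument definitions in run_test.py may differ (names, defaults and help strings are assumptions).

```python
import argparse

# Hypothetical sketch of run_test.py's CLI; the real script may define these differently.
parser = argparse.ArgumentParser(description="Future urban scene generation demo")
parser.add_argument("video_dir", help="directory with the input video frames")
parser.add_argument("keypoints_dir", help="directory with the 3D keypoints (pascal_cads)")
parser.add_argument("checkpoints_dir", help="directory with the pretrained checkpoints")
parser.add_argument("--det_mode", choices=["ssd512", "yolo3", "mask_rcnn"], default="ssd512")
parser.add_argument("--track_mode", choices=["tc", "deepsort", "moana"], default="tc")
parser.add_argument("--bbox_scale", type=float, default=1.15)
parser.add_argument("--inpaint", action="store_true",
                    help="inpaint the vehicle region instead of using the static background")
parser.add_argument("--device", choices=["cpu", "cuda"], default="cpu")

args = parser.parse_args()
print(args)
```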
If everything went well, you should see the main GUI, in which you can select any vehicle detected in the current video frame or change the video frame.
The commands available in this window are:
- RIGHT ARROW = go to next frame
- LEFT ARROW = go to previous frame
- SINGLE MOUSE LEFT BUTTON CLICK = visualize car trajectory
- BACKSPACE = delete the drawn trajectories
- DOUBLE MOUSE LEFT BUTTON CLICK = select one of the vehicle bounding boxes
Once you have selected the vehicles of your choice by double-clicking inside their bounding boxes, you can push the RUN button to start the inference. The resulting frames will be saved in the ./results directory.
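If you want to turn the saved frames into a short clip, a minimal sketch with OpenCV is shown below; it assumes the results are written as PNG images in ./results, so adjust the glob pattern to whatever naming scheme and format run_test.py actually produces.

```python
import glob

import cv2

# Hypothetical post-processing: stitch the frames saved by run_test.py into a video.
# Assumes PNG frames in ./results; adjust the pattern/extension if needed.
frame_paths = sorted(glob.glob("./results/*.png"))
if frame_paths:
    height, width = cv2.imread(frame_paths[0]).shape[:2]
    writer = cv2.VideoWriter("results.avi",
                             cv2.VideoWriter_fourcc(*"MJPG"),
                             10,                    # frames per second (arbitrary choice)
                             (width, height))
    for path in frame_paths:
        writer.write(cv2.imread(path))
    writer.release()
```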
If you find this repository useful for your research, please cite the following paper:
@inproceedings{simoni2021future,
title={Future urban scenes generation through vehicles synthesis},
author={Simoni, Alessandro and Bergamini, Luca and Palazzi, Andrea and Calderara, Simone and Cucchiara, Rita},
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
pages={4552--4559},
year={2021},
organization={IEEE}
}