This project was developed as part of the Vision and Cognitive Systems course as an exam in the Master's Degree in Computer Engineering in Unimore.
Name | |
---|---|
Luca Denti | 211805@studenti.unimore.it |
Cristian Mercadante | 213808@studenti.unimore.it |
Alberto Vitto | albertovitto@outlook.com |
Given a dataset of videos taken in "Gallerie Estensi" in Modena together with pictures of its paintings, it was required to implement a software in Python capable of detecting paintings in videos and retrieve the original image from the dataset.
In particular:
- Painting detection
- Given an input video, the code should output a list of bounding boxes
(x, y, w, h)
, being(x, y)
the upper left corner, each containing one painting. - Create an interface to visualize given an image the ROI of a painting.
- Select painting and discard other artifacts.
- Optional: segment precisely paintings with frames and also statues.
- Given an input video, the code should output a list of bounding boxes
- Painting rectification
- Given an input video and detections (from the previous point), the code should output a new image for each painting, containing the rectified version of the painting.
- Pay attention to not-squared paintings.
- Painting retrieval
- Given one rectified painting (from the previous point), the code should return a ranked list of all the images in the painting DB, sorted by descending similarity with the detected painting. Ideally, the first retrieved item should be the picture of the detected painting.
- People detection
- Given an input video, the code should output a list of bounding boxes
(x, y, w, h)
, being(x, y)
the upper left corner, each containing one person.
- Given an input video, the code should output a list of bounding boxes
- People localization
- Given an input video and people bounding boxes (from the previous point), the code should assign each person to one of the rooms of the Gallery. To do that, exploit the painting retrieval procedure (third point), and the mapping between paintings and rooms (in
data.csv
). Also, a map of the Gallery is available (map.png
) for a better visualization.
- Given an input video and people bounding boxes (from the previous point), the code should assign each person to one of the rooms of the Gallery. To do that, exploit the painting retrieval procedure (third point), and the mapping between paintings and rooms (in
Optional tasks:
- Given an input video, people and paintings' detections, determine whether each person is facing a painting or not.
- Given a view taken from the 3D model, detect each painting and replace it with its corresponding picture in the paintings DB, appropriately deformed to match the 3D view.
- Determine the distance of a person to the closest door: find the door, find the walls and the floor, try to compensate and predict distance.
.
├── dataset
│ ├── data.csv
│ ├── features_db.npy
│ ├── ground_truth
│ │ ├── 000_0.json
│ │ ├── ...
│ │ └── 014_16.json
│ ├── img_features_db.npy
│ ├── map.png
│ ├── paintings_db
│ │ ├── 000.png
│ │ ├── ...
│ │ └── 094.png
│ ├── test_set
│ │ ├── 000_0.png
│ │ ├── ...
│ │ └── 014_16.png
│ └── videos
│ ├── 000
│ │ ├── VIRB0391.MP4
│ │ └── ...
│ ├── ...
│ │ └── ...
│ └── 014
│ ├── VID_20180529_112517.mp4
│ └── ...
├── venv
├── estensi
│ ├── painting_detection
│ │ ├── constants.py
│ │ ├── detection.py
│ │ ├── evaluation.py
│ │ └── utils.py
│ ├── painting_rectification
│ │ ├── rectification.py
│ │ └── utils.py
│ ├── painting_retrieval
│ │ ├── evaluation.py
│ │ ├── retrieval.py
│ │ └── utils.py
│ ├── people_detection
│ │ ├── cfg
│ │ │ └── yolov3.cfg
│ │ ├── darknet.py
│ │ ├── data
│ │ │ └── coco.names
│ │ ├── detection.py
│ │ ├── preprocess.py
│ │ ├── utils.py
│ │ └── yolov3.weights
│ ├── people_localization
│ │ ├── localization.py
│ │ └── utils.py
│ └── utils.py
├── estensi.py
├── painting_detection_evaluation.py
├── painting_retrieval_evaluation.py
├── README.md
├── requirements.txt
├── torch_cpu_requirements.txt
└── torch_requirements.txt
- Make sure to have installed all requirements (see
requirements.txt
). - Make sure to have installed also the PyTorch requirements, depending from the system (see
torch_requirements.txt
for CUDA version ortorch_cpu_requirements.txt
for CPU version). - Place the
dataset
folder at the same level asestensi.py
and theestensi
package (make sure to havepaintings_db
,videos
,data.csv
, andmap.png
inside as shown in the project structure). - Download
yolov3.weights
and place it intoestensi/people_detection
.
estensi.py --video <path/to/video> --folder <path/to/folder/> --skip_frames <int_number> [--include_steps]
where:
--video
targets the video to analyze.--folder
targets the folder containing different videos to analyze.--skip_frames
number of frames to skip during analysis, default is 1 (don't skip any frame).--include_steps
tells the script to show useful debug information.
- Press
R
to start the painting retrieval, rectification and localization tasks. You will see the outputs in new windows and more details in the command line. Press any key to resume. - Press
P
to pause the video. Press any key to resume. - Press
Q
to quit the video. If--folder
is specified, goes to the next video.
Following videos were used for the evaluation phase:
Folder | Video |
---|---|
000 | VIRB0393.MP4 |
001 | GOPR5825.MP4 |
002 | 20180206_114720.mp4 |
003 | GOPR1929.MP4 |
004 | IMG_3803.MOV |
005 | GOPR2051.MP4 |
006 | IMG_9629.MOV |
007 | IMG_7852.MOV |
008 | VIRB0420.MP4 |
009 | IMG_2659.MOV |
010 | VID_20180529_112706.mp4 |
012 | IMG_4087.MOV |
013 | 20180529_112417_ok.mp4 |
014 | VID_20180529_113001.mp4 |
To get the same results as in the report, download this test set.
- The script will create a
test_set
folder under thedataset
folder, containing the frames captured from the videos listed above. - Place the
ground_truth
folder under thedataset
folder. - If no argument is passed to the script, the test set will be evaluated with the system hyperparameters configuration. Otherwise, it will be evaluated with the passed configuration.
- Run:
where:
painting_detection_evaluation.py [--param <param_grid_file_path>]
-
--param
is the path of a JSON file containing the parameters grid for grid search evaluation.Example of JSON file:
{ "MIN_ROTATED_BOX_AREA_PERCENT": [0.5, 0.8, 0.9], "MIN_ROTATED_ELLIPSE_AREA_PERCENT": [0.4, 0.6], "MAX_GRAY_80_PERCENTILE": [170, 200], "MIN_VARIANCE": [11, 18], "MIN_HULL_AREA_PERCENT_OF_MAX_HULL": [0.08, 0.15], "THRESHOLD_BLOCK_SIZE_FACTOR": [50, 80] }
They key values are taken from
estensi/painting_detection/constants.py
-
- The script will create a
test_set
folder under thedataset
folder, containing the frames captured from the videos listed above. - Place the
ground_truth
folder under thedataset
folder. - Run:
where:
painting_retrieval_evaluation.py --mode <mode_str> [--rank_scope <scope_int>]
--mode
is the mode (eitherclassification
orretrieval
) in which the evaluation is done,--rank_scope
is the scope of the ranking list where a relevant item can be found. Default value is 5. It will be ignored in classification mode.
This work has only been tested with PyCharm 2020.1.2 (Professional Edition) as IDE and Windows 10 as OS.