Learn how to start an object detection deep learning project using PyTorch and the Faster-RCNN architecture in this beginner-friendly tutorial. Based on the blog series Train your own object detector with Faster-RCNN & PyTorch by Johannes Schmidt.
You can train the model using the training script.
In addition, I provide Jupyter notebooks for various tasks such as creating & exploring datasets, running inference, and visualizing anchor boxes.
After cloning the repository, follow these steps to install the dependencies in a new environment and start a jupyter server:
- Set up & activate a new environment with an environment manager (recommended)
- Install the libraries with pip or poetry
- Start a jupyter server: `jupyter-notebook` (not `jupyter-lab`, because of a dependency issue with `neptune-client<1.0.0`)
Note: This will install the CPU version of torch. If you want to use a GPU or TPU, please refer to the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/). To check whether PyTorch uses the NVIDIA GPU, verify that `torch.cuda.is_available()` returns `True` in a Python shell.
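For example:

```python
import torch

# Prints True if a CUDA-capable GPU and a matching torch build are available
print(torch.cuda.is_available())
```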
Windows users: If you cannot start `jupyter-lab` or `jupyter-notebook` on Windows because of `ImportError: DLL load failed while importing win32api`, try running `conda install pywin32` with the conda package manager.
These are the libraries that are used in this project:
- High-level deep learning library for PyTorch: PyTorch Lightning
- Visualization software: Custom code with the image-viewer Napari
- [OPTIONAL] Experiment tracking software/logging module: Neptune
If you want to use Neptune for your own experiments, add your API key to the `NEPTUNE` variable in the `.env` file.
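As a rough sketch of how such a key can be read at runtime (assuming the `python-dotenv` package is available; the variable name `NEPTUNE` comes from the `.env` file mentioned above):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment
api_key = os.environ["NEPTUNE"]  # the API key set in the .env file
```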
Please make sure that you meet these requirements:
The dataset consists of 20 selfie images randomly selected from the internet.
Most of the model's code is based on PyTorch's Faster-RCNN implementation. Metrics can be computed based on the PASCAL VOC (Visual Object Classes) evaluator in the metrics section.
Anchor sizes/aspect ratios are really important for training a Faster-RCNN model (but also for similar models like SSD and YOLO). These "default" boxes are compared to those output by the network, so choosing adequate sizes/ratios can be critical for the success of a project. The PyTorch implementation of the AnchorGenerator (and also the helper classes here) generally expects the following format:

- anchor_size: `Tuple[Tuple[int, ...], ...]`
- aspect_ratios: `Tuple[Tuple[float, ...], ...]`
The ResNet backbone without the FPN always returns a single feature map that is used to create anchor boxes. Because of that, we must create a `Tuple` that contains a single `Tuple`: e.g. `((32, 64, 128, 256, 512),)` or `((32, 64),)`.
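A minimal sketch of this format with torchvision's `AnchorGenerator` (the import path assumes a recent torchvision; the aspect ratios are illustrative):

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# ResNet backbone without FPN: one feature map -> one inner tuple
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),  # one tuple of ratios for the single map
)
```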
With FPN we can use 4 feature maps (the output of a ResNet + FPN) and map our anchor sizes to the feature maps. Because of that, we must create a `Tuple` that contains exactly 4 `Tuple`s: e.g. `((32,), (64,), (128,), (256,))` or `((8, 16, 32), (32, 64), (32, 64, 128, 256, 512), (200, 300))`.
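And the corresponding sketch for the FPN case, with one inner tuple per feature map:

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# ResNet + FPN: four feature maps -> exactly four inner tuples
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 4,  # repeat the ratios per feature map
)
```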
Examples of how to create a Faster-RCNN model with a pretrained ResNet backbone (ImageNet) are provided in the tests section. Pay special attention to the test function `test_get_faster_rcnn_resnet` in `test_faster_RCNN.py`. Recommendation: Run the test in debugger mode.
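For orientation, here is a hedged sketch using plain torchvision rather than the repository's own `get_faster_rcnn_resnet` helper (the project's actual API is shown in the test above; `pretrained_backbone` is the older torchvision keyword, newer versions use `weights` arguments instead):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster-RCNN with a ResNet-50 + FPN backbone pretrained on ImageNet
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=True
)

# Swap the box predictor head for a custom number of classes
# (num_classes includes the background class)
num_classes = 2  # example: one object class + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
```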
- Sliders in the inference script do not work right now due to dependency updates.
- Please note that the library `neptune-client` is deprecated, but the migration to `neptune` has not been finished yet. Therefore, `neptune-client` is still used in this project.