Mask-YOLO: A Multi-task Learning Architecture for Object Detection and Instance Segmentation

1. Architecture and Results

This work combines the one-stage detection pipeline of YOLOv2 with the two-branch architecture of Mask R-CNN. Because of hardware limitations, I only implemented it on a small CNN backbone (MobileNet) built from depthwise separable blocks, though it could also be implemented with a deeper network, e.g. ResNet-50 or ResNet-101 with FPN (Feature Pyramid Networks).
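To give a rough sense of why a depthwise separable backbone suits limited hardware, the sketch below compares the weight count of a standard convolution with that of a depthwise separable one. It is only an illustration: the function names and the 3 x 3, 256-channel layer size are assumptions chosen for the example, not values taken from this repository, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer size (an assumption, not a layer from this repo).
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For a 3 x 3 kernel the saving approaches a factor of 9 as the number of output channels grows, which is the main reason a MobileNet-style backbone is cheaper than a standard convolutional one of the same depth.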
The overall architecture can be visualized like this:

myolo - the main implementation of Mask-YOLO. model.py instantiates the model.

example - three training examples with inference. The Shapes dataset is randomly generated by dataset_shapes.py. Rice and Food are small datasets that I hand-annotated with VGG Image Annotator (VIA); they can be downloaded from https://drive.google.com/file/d/1druK4Kgx5AhfchClU2aq5kf7UVoDtkvu/view.
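As a rough idea of what a randomly generated training sample for detection plus instance segmentation can look like, here is a minimal sketch that draws one filled rectangle and returns its binary mask together with the matching bounding box. It is not the code in dataset_shapes.py: the function name, the image size, and the restriction to rectangles are all assumptions made for illustration.

```python
import random


def random_rectangle_sample(height=64, width=64, rng=None):
    """Return (mask, box) for one random filled rectangle.

    Hypothetical helper, not part of this repository.
    mask is a height x width list of 0/1 values; box is (x1, y1, x2, y2)
    with x2 and y2 exclusive.
    """
    rng = rng or random.Random()
    box_h = rng.randint(8, height // 2)
    box_w = rng.randint(8, width // 2)
    y1 = rng.randint(0, height - box_h)
    x1 = rng.randint(0, width - box_w)
    y2, x2 = y1 + box_h, x1 + box_w

    # The instance mask is 1 inside the rectangle and 0 elsewhere.
    mask = [
        [1 if (y1 <= y < y2 and x1 <= x < x2) else 0 for x in range(width)]
        for y in range(height)
    ]
    return mask, (x1, y1, x2, y2)


if __name__ == "__main__":
    mask, box = random_rectangle_sample(rng=random.Random(0))
    print("box:", box, "mask pixels:", sum(map(sum, mask)))
```

Because the mask and the box come from the same coordinates, the detection target and the segmentation target are consistent by construction, which is what makes a synthetic dataset convenient for checking a two-branch model.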
