Mask-YOLO: A Multi-task Learning Architecture for Object Detection and Instance Segmentation

1. Architecture and Results

  • This work combines the one-stage detection pipeline of YOLOv2 with the two-branch architecture idea from Mask R-CNN. Due to hardware limitations, I implemented it only on a small CNN backbone (MobileNet) built from depthwise separable blocks, though it could also be implemented with a deeper network, e.g. ResNet-50 or ResNet-101 with FPN (Feature Pyramid Networks).
  • The overall architecture can be visualized like this:

  • Training results on Shapes dataset:
  • Training results on Rice and Generic Food:
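To give a rough sense of why a backbone built from depthwise separable blocks suits limited hardware, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The function names and layer sizes are assumptions chosen for illustration; they are not taken from this repository's code.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (hypothetical, not read from model.py).
k, c_in, c_out = 3, 256, 256
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {standard}, separable: {separable}, ratio: {standard / separable:.2f}")
```

For a 3x3 layer with 256 input and 256 output channels, the separable form needs roughly one ninth of the weights, which is the main reason such blocks fit on small GPUs.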

2. How to use it

myolo - the main implementation of Mask-YOLO. model.py instantiates the model.

example - three training examples with inference. The Shapes dataset is randomly generated by dataset_shapes.py. Rice and Food are small datasets that I hand-annotated with VGG Image Annotator (VIA); they can be downloaded from https://drive.google.com/file/d/1druK4Kgx5AhfchClU2aq5kf7UVoDtkvu/view.
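Since the Rice and Food datasets are annotated with VIA, a loader has to turn VIA polygon regions into the bounding boxes that the detection branch needs. The following is a minimal sketch of that step, assuming the annotations are in VIA's JSON annotation export with polygon regions. The function name via_regions_to_boxes and the sample file name are hypothetical, and the loader actually used in this repository may work differently.

```python
import json


def via_regions_to_boxes(via_json_text):
    """Map each image filename to a list of (x_min, y_min, x_max, y_max) boxes."""
    annotations = json.loads(via_json_text)
    boxes = {}
    for entry in annotations.values():
        regions = entry.get("regions", [])
        # VIA 1.x stores regions as a dict, VIA 2.x as a list.
        if isinstance(regions, dict):
            regions = list(regions.values())
        image_boxes = []
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue  # this sketch only handles polygon regions
            xs, ys = shape["all_points_x"], shape["all_points_y"]
            # The tightest axis-aligned box around the polygon vertices.
            image_boxes.append((min(xs), min(ys), max(xs), max(ys)))
        boxes[entry["filename"]] = image_boxes
    return boxes


# Hypothetical single-image annotation, written in the VIA JSON layout.
sample = json.dumps({
    "rice_001.jpg12345": {
        "filename": "rice_001.jpg",
        "size": 12345,
        "regions": [
            {
                "shape_attributes": {
                    "name": "polygon",
                    "all_points_x": [10, 40, 30],
                    "all_points_y": [20, 25, 60],
                },
                "region_attributes": {},
            }
        ],
        "file_attributes": {},
    }
})
print(via_regions_to_boxes(sample))
```

The same polygon vertices can later be rasterized into a per-instance mask for the segmentation branch, so one annotation serves both tasks.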

3. Reference