Labeling images for object detection is a common prerequisite for getting started with a Computer Vision project. The good news is that you do not have to label all images (draw bounding boxes) from scratch: the goal of this project is to add (semi-)automation to the process. Please refer to this blog post that describes Active Learning and the semi-automated flow: Active Learning for Object Detection in Partnership with Conservation Metrics.

We use Transfer Learning and Active Learning as the core Machine Learning components of the pipeline.

- Transfer Learning: use a powerful model pre-trained on a big dataset (COCO) as the starting point for fine-tuning on the classes you need.
- Active Learning: a human annotator labels a small set of images (set1), an Object Detection model (model1) is trained on set1, and model1 is then used to predict bounding boxes on the remaining images (thus pre-labeling them). The human annotator reviews model1's predictions where the model was less confident, producing a new set of images, set2. The next phase trains a more powerful model2 on the bigger training set (set1 plus set2) and uses model2's predictions as a draft of labeled set3, and so on.

The plan is to have 2 versions of the pipeline set up.
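The "review where the model was less confident" step can be sketched in a few lines of Python. This is an illustrative toy, not code from this repo; the function and variable names are made up, and real pipelines may score uncertainty differently:

```python
# Illustrative sketch of one Active Learning selection step: rank images by
# the model's least confident detection, so a human reviews the shakiest
# predictions first. Not part of this repo's actual code.
def select_for_review(predictions, batch_size):
    """predictions: dict mapping image name -> list of detection confidences."""
    def image_confidence(dets):
        # Images with no detections get 0.0 so they are reviewed first.
        return min(dets) if dets else 0.0
    ranked = sorted(predictions, key=lambda name: image_confidence(predictions[name]))
    return ranked[:batch_size]

preds = {
    "board_01.png": [0.98, 0.95],   # model is confident
    "board_02.png": [0.40],         # model is unsure
    "board_03.png": [],             # model found nothing
}
print(select_for_review(preds, 2))  # board_03.png and board_02.png come first
```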
This version (ideally) requires minimal setup. The core components here are:
- Azure Blob Storage with images to be labeled. It will also be used to save "progress" logs of labeling activities
- "Tagger" machine(s)
This is the computer(s) that human annotator(s) use as the environment for labeling a portion of the images -- for example with VOTT.
Here is an example of a labeling flow in VOTT: I've labeled wood "knots" (round shapes) and "defects" (pretty much any non-round type of defect):
- Model re-training machine (or service). This is the environment where the Object Detection model is retrained on the growing training set and where bounding boxes are predicted on unlabeled images. There is a config.ini that needs to be updated with details such as the blob storage connection and the model retraining configuration.
More details TBD.
Basically, the idea is to kick off an Active Learning cycle with model retraining as soon as a human annotator finishes revising a new set of images.
- The steps below refer to updating config.ini. You can find a detailed description of the config here.
- Have several thousand images (or many more) and not sure whether random sampling will help you get rolling with labeling data? Take a look at the Guide to "initialization" predictions.
The flow below assumes the following:
- We use the Tensorflow Object Detection API (Faster R-CNN with ResNet 50 as the default option) to fine-tune object detection.
- The Tensorflow Object Detection API is set up on a Linux box (an Azure DSVM is an option) that you can ssh to. See the Tensorflow Object Detection API docs regarding its general configuration.
- Data (images) is in Azure Blob Storage.
- Human annotators use VOTT to label/revise images. To support another tagging tool, its output (bounding boxes) needs to be converted to CSV form -- pull requests are welcome!
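As a sketch of what such a conversion might look like, the snippet below maps a hypothetical JSON annotation format to CSV rows. Both the JSON shape and the CSV column order here are assumptions for illustration; check the CSV files this pipeline actually produces before writing a real converter:

```python
# Hypothetical converter from a tagging tool's JSON output to CSV rows.
# The JSON structure and CSV header below are illustrative assumptions,
# not this project's actual format.
import csv
import io
import json

annotations_json = json.loads("""
{"frames": {"board_01.png": [
    {"tags": ["knot"],   "x1": 10, "y1": 20, "x2": 50, "y2": 60},
    {"tags": ["defect"], "x1": 70, "y1": 15, "x2": 90, "y2": 40}
]}}
""")

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["filename", "class", "xmin", "ymin", "xmax", "ymax"])  # assumed header
for filename, boxes in annotations_json["frames"].items():
    for box in boxes:
        # One CSV row per bounding box.
        writer.writerow([filename, box["tags"][0], box["x1"], box["y1"], box["x2"], box["y2"]])
print(out.getvalue())
```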
The general flow has 2 steps:
- Environment setup
- Active Learning cycle: labeling data and running scripts to update the model and feed results back for human annotators to review.
The whole flow is currently automated with 4 scripts the user needs to run.
- Provision Azure Blob Storage. Create 2 containers: "activelearningimages" and "activelearninglabels".
- Upload the unzipped folder with images to the "activelearningimages" container.
Set up the Tensorflow Object Detection API if you have not already.
This will include cloning https://github.com/tensorflow/models (on my machine it is cloned to /home/olgali/repos/models). Run research/object_detection/object_detection_tutorial.ipynb to make sure the Tensorflow Object Detection API is functioning.
- Clone this repo to the machine (for example to /home/olgali/repos/models/research/active-learning-detect/).
- Update config.ini:
- set values for AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY
- set (or update if needed) values for the # Data Information section
- set values for the # Training Machine and # Tensorflow sections of the config.ini
The "python_file_directory" config value should point to the "train" scripts from this project. Example:
python_file_directory=/home/olgali/repos/models/research/active-learning-detect/train
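For orientation, a minimal fragment of these settings might look like the following. Only AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_KEY, and python_file_directory are key names mentioned in this document; the values are placeholders, and the full set of keys is in the config description linked earlier:

```ini
# Illustrative config.ini fragment -- placeholder values only.
AZURE_STORAGE_ACCOUNT=mystorageaccount
AZURE_STORAGE_KEY=<your-storage-key>
python_file_directory=/home/olgali/repos/models/research/active-learning-detect/train
```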
- pip install the Azure Blob Storage package: azure.storage.blob
- Have Python 3.6 up and running.
- Pip install the Azure Blob Storage package: azure.storage.blob
- Clone this repo, then copy the updated config.ini from the model re-training box (as it already contains the Azure Blob Storage connection and other shared settings).
- Update config.ini values for the # Tagger Machine section:
tagging_location=D:\temp\NewTag
Overview: you will run 4 scripts in total:
- two scripts on the machine where model (re)training happens, and
- two scripts on the machine(s) where human annotators label images (or review images pre-labeled by the model).
Run the bash script to initialize the pipeline:
~/repos/models/research/active-learning-detect/train$ . ./active_learning_initialize.sh ../config.ini
This step will:
- Download all images to the box.
- Create totag_xyz.csv in blob storage (the "activelearninglabels" container by default).
This is a snapshot of the image file names that need tagging (labeling). As human annotators make progress on labeling, the list will get smaller and smaller.
- Make sure that the tagging_location is empty.
- Start each "phase" by downloading images to label (or pre-labeled images to review).
The sample command below requests 40 images for tagging:
D:\repo\active-learning-detect\tag>python download_vott_json.py 40 ..\config.ini
This step creates a new version of totag_xyz.csv in blob storage with those 40 images excluded from the list.
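Conceptually, this bookkeeping is a simple list checkout. The sketch below is illustrative Python, not the repo's actual download_vott_json.py logic:

```python
# Illustrative sketch of checking out a batch of images for tagging:
# move N file names from the "to tag" list into a "being tagged" list.
def check_out_batch(totag, batch_size):
    batch = totag[:batch_size]       # images handed to the annotator (tagging_abc.csv)
    remaining = totag[batch_size:]   # new, smaller totag snapshot
    return batch, remaining

totag = ["board_%02d.png" % i for i in range(1, 101)]  # 100 images awaiting labels
tagging, totag = check_out_batch(totag, 40)
print(len(tagging), len(totag))  # 40 60
```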
The file tagging_abc.csv will hold the list of the 40 images being tagged.
- Start VOTT and load the folder for labeling/review (in my case D:\temp\NewTag\images).
- Once done with labeling, push the results back to central storage:
D:\repo\active-learning-detect\tag>python upload_vott_json.py ..\config.ini
This step pushes tagged_123.csv to blob storage: this file contains the actual bounding box coordinates for every image.
tagging_abc.csv will contain the list of files that are "work in progress" -- the ones to be tagged soon.
Now the model can be trained.
Before running the model for the first time, and at any later time if you would like to repartition the test set, run:
~/repos/models/research/active-learning-detect/train$ . ./repartition_test_set_script.sh ../config.ini
This script takes all the tagged data and splits part of it into a test set, which will not be trained or validated on and will later be used by the evaluation code to compute mAP values.
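A minimal sketch of such a repartition, assuming a random per-image split (the actual script's ratio and mechanics may differ):

```python
# Illustrative test-set split: shuffle the tagged images reproducibly and
# hold out a fraction for evaluation. Not the repo's actual script.
import random

def split_test_set(tagged_images, test_fraction=0.2, seed=42):
    rng = random.Random(seed)        # fixed seed keeps the split reproducible
    shuffled = list(tagged_images)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[:n_test], shuffled[n_test:]  # (test, train/validation)

images = ["board_%02d.png" % i for i in range(1, 51)]
test, train = split_test_set(images)
print(len(test), len(train))  # 10 40
```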
Run the bash script:
~/repos/models/research/active-learning-detect/train$ . ./active_learning_train.sh ../config.ini
This script kicks off training based on the available labeled data.
The model will be evaluated on the test set and performance numbers will be saved in blob storage (performance.csv).
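For reference, mAP evaluation matches predicted boxes to ground-truth boxes by intersection-over-union (IoU). A minimal IoU helper for (xmin, ymin, xmax, ymax) boxes, shown for illustration only:

```python
# Intersection-over-union of two axis-aligned boxes given as
# (xmin, ymin, xmax, ymax). Returns a value in [0, 1].
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.1429
```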
The latest totag.csv will have predictions for all available images made by the newly trained model -- bounding box locations that the human annotator can use as a starting point.
The human annotator(s) deletes any leftovers from previous predictions (csv files in active-learning-detect\tag, image dirs) and again runs the sequence of:
- Downloading the next batch of pre-labeled images for review (active-learning-detect\tag\download_vott_json.py).
- Going through the pre-labeled images with VOTT and fixing bounding boxes where needed.
- Pushing the new set of labeled images back to storage (active-learning-detect\tag\upload_vott_json.py).
The training cycle can now be repeated on a bigger training set, yielding a dataset with higher-quality pre-labeled bounding boxes.
The Custom Vision service can be used instead of Tensorflow if you do not have access to an Azure Data Science VM or another GPU-enabled machine. The steps for Custom Vision are very similar to those for Tensorflow, although the training step differs slightly:
If you would like to repartition the test set, run:
~/repos/models/research/active-learning-detect/train$ . ./repartition_test_set_script.sh ../config.ini
This script takes all the tagged data and splits part of it into a test set, which will not be trained or validated on and will later be used by the evaluation code to compute mAP values.
To train the model:
python cv_train.py ../config.ini
This Python script trains a Custom Vision model based on the available labeled data.
The model will be evaluated on the test set and performance numbers will be saved in blob storage (performance.csv).
The latest totag.csv will have predictions for all available images made by the newly trained model -- bounding box locations that the human annotator can use as a starting point.
I'm using the wood knots dataset mentioned in this blog. Here is a link to the dataset: a zip file with 800+ board png images.
The current Custom Vision SDK is in preview mode, and one of its limitations is that an error during training does not return a descriptive message, just a generic 'Bad Request' response. Common reasons for this error include:
- Having a tag with fewer than 15 images. Custom Vision requires a minimum of 15 images per tag and will throw an error if it finds any tag with fewer.
- Having a tag out of bounds. If for some reason you add a tag through the API that is out of bounds, the request will be accepted but training will throw an error.
- No new images since the last training session. If you try to train without adding additional images, Custom Vision will return a Bad Request exception.
The best way to debug these errors is to go to the Custom Vision website (customvision.ai) and click the Train button, which should then tell you what the error was.
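Since the SDK's error message is unhelpful, a cheap local pre-flight check can catch the most common cause (too few images per tag) before you call the training API. The function below is a hypothetical helper, not part of the Custom Vision SDK:

```python
# Hypothetical pre-flight check: find tags that would make Custom Vision
# training fail because they have fewer than the required 15 images.
def undertrained_tags(images_per_tag, minimum=15):
    """Return, sorted, the tags with fewer than `minimum` images."""
    return sorted(tag for tag, count in images_per_tag.items() if count < minimum)

counts = {"knot": 120, "defect": 9}
print(undertrained_tags(counts))  # ['defect']
```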