The proposed challenge is a street view house numbers detection, which contains two parts:
- Do bounding box regression to find top, left, width and height of bounding boxes which contain digits in a given image
- classify the digits of bounding boxes into 10 classes (0-9)
The giving SVHN dataset contains 33402 images for training and 13068 images for testing. This project uses the YOLOv5 pre-trained model to fix this challenge.
- Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
- NVIDIA GeForce RTX 2080 Ti
- Microsoft win10
- Python 3.7.3
- Pytorch 1.7.0
- CUDA 10.2
To reproduct my submission without retrainig, do the following steps:
- Installation
- Data Preparation
- Set Configuration
- Download Pretrained Model
- Training
- Testing
- Reference
- install pytorch from https://pytorch.org/get-started/locally/
- install openCV
sudo apt-get install python-opencv
- install dependencies
pip install -r requirements.txt
Download the given dataset from Google Drive or SVHN Dataset.
data / svhn
+- train
| +- xxx.jpg
| +- digitStruct.mat
+- test
| +- yyy.jpg
+- mat_to_yolo.py
+- shvn.yaml
And run command python mat_to_yolo.py
to create labels for yolo and reorganize the train data structure as below:
- train/
├── 1.png
├── 1.txt
├── 2.png
├── 2.txt
│ .
│ .
│ .
├── 33402.png
└── 33402.txt
- create
svhn.yaml
in./data
# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: data/svhn/train # 33402 images
val: data/svhn/valid # 3000 images
# number of classes
nc: 10
# class names
names: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
- train model with pretrained model
python train.py --img 320 --batch 16 --epochs 50 --data svhn.yaml --weights yolov5m.pt
- Using the following script to get more information
python train.py --help
- detect test data
python detect.py --source data/svhn/test/ --weights runs/train/exp18/weights/best.pt --conf 0.25 --save-txt --save-conf
- Using the following script to get more information
python detect.py --help
- Make Submission: output json format
python combine.py
[{
"bbox": [[top, left, buttom, right]],
"score": [confidence],
"label": [predict_label]
},
{
"bbox": [[7, 112, 28, 121], [9, 122, 27, 134],
"score": [0.674805, 0.713867],
"label": [1, 0]
}
]