Authors: Khanh Vu, Thimo Blom, Hannah Stone.
Institute: Vrije Universiteit Amsterdam.
- Capture raw images from the webcam.
- Convert RGB to HSV color space.
- Apply a median blur to reduce noise.
- Apply image thresholding & background subtraction to capture existing coins.
- Draw bounding boxes and crop out the individual coins.
From https://neurohive.io/en/popular-networks/vgg16/
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. It was one of the famous models submitted to ILSVRC-2014. It improves on AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another.
We made a few adjustments to the existing VGG16 model from Keras:
- We kept only the convolutional layers, removing the last 4 layers (flatten + 2 × 4096 fully-connected + 1000-class prediction layer).
- Added fully-connected layers together with a batch normalization layer and a dropout layer.
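These adjustments can be sketched in Keras as follows. `include_top=False` drops the flatten, the two 4096-unit fully-connected layers, and the 1000-class softmax, keeping only the convolutional base. The head layer sizes (256 units) and dropout rate (0.5) are illustrative assumptions, not the exact values we used; `weights=None` is set here only to keep the snippet self-contained (the project started from pretrained ImageNet weights):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import (Flatten, Dense,
                                     BatchNormalization, Dropout)
from tensorflow.keras.models import Model

# Convolutional base only; the original classifier head is removed.
# weights=None avoids a download here; use weights='imagenet' in practice.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

# New head: sizes and dropout rate are illustrative assumptions.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
out = Dense(6, activation='softmax')(x)  # 6 coin classes

model = Model(base.input, out)
```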
Dataset:
- Total images: 2923 (~500 images per class).
- 6 classes: '10cent', '1euro', '20cent', '2euro', '50cent', '5cent'.
- Train/Validation ratio: 7:3.
- Data augmentation: Rotation and Flip.
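A minimal sketch of the augmentation and split setup using Keras' `ImageDataGenerator`; the rotation range, the choice of flip axes, and the `dataset/` directory name are assumptions, while `validation_split=0.3` reproduces the 7:3 train/validation ratio:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rotation + flip augmentation; exact ranges are illustrative assumptions.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=30,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.3,  # 7:3 train/validation split
)

# Hypothetical directory layout: dataset/<class_name>/*.jpg
# train_gen = datagen.flow_from_directory('dataset/', target_size=(224, 224),
#                                         class_mode='categorical',
#                                         subset='training')
# val_gen = datagen.flow_from_directory('dataset/', target_size=(224, 224),
#                                       class_mode='categorical',
#                                       subset='validation')
```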
Model:
- Optimizer: Adam - Learning rate = 0.0001.
- Loss function: 'categorical_crossentropy'.
- Transfer learning: We froze all the layers except for the last 7 hidden layers to accelerate the training process.
- Metrics tracker: Tensorboard.
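The training configuration above can be sketched as follows. The snippet freezes everything except the last 7 layers and compiles with the listed optimizer and loss; it is built on the stock VGG16 (with `weights=None`, to avoid a download) only so it runs standalone, whereas the project applied this to its modified model with pretrained weights:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import TensorBoard

model = VGG16(weights=None)  # stand-in for the modified model

# Transfer learning: freeze all layers except the last 7.
for layer in model.layers[:-7]:
    layer.trainable = False

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# TensorBoard callback for metric tracking; pass it to
# model.fit(..., callbacks=[tensorboard]).
tensorboard = TensorBoard(log_dir='logs')
```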
We downloaded the model parameters (model.h5) and ran the model on our machine - source code.
See our project report for more details.