An end-to-end image classification project covering data collection, cleaning, model training, deployment, and API integration.
The model can classify 28 different types of charts and diagrams in raster image formats (png, jpg, gif, etc.).
The types are as follows:
- arc diagram
- area chart
- bar chart
- block diagram
- boxplot
- bubble chart
- cartogram
- control chart
- dendrogram
- flowchart
- funnel chart
- gantt chart
- heatmap
- histogram
- line graph
- matrix diagram
- mind map
- network graph
- neural network diagram
- organogram
- phase diagram
- pie chart
- radar chart
- scatter plot
- sankey chart
- surface plot
- timeline chart
- venn diagram
Data Collection: The image dataset was downloaded through the DuckDuckGo search engine API, using the 28 class names as search keywords.
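The collection step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `collect_images` and its parameters are hypothetical names, and `search_fn` stands in for whichever DuckDuckGo client was used (it is assumed to take a keyword and return an iterable of image URLs). Each class gets its own folder so that labels can later be read from the folder name.

```python
from itertools import islice
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def collect_images(class_names, root, search_fn, download_fn=urlretrieve, max_images=100):
    """Download up to max_images per class into root/<class name>/.

    search_fn is a hypothetical callable: keyword -> iterable of image URLs.
    download_fn takes (url, destination_path); urlretrieve is the default.
    Returns a dict mapping each class name to the number of files saved.
    """
    root = Path(root)
    counts = {}
    for name in class_names:
        dest = root / name
        dest.mkdir(parents=True, exist_ok=True)
        saved = 0
        for i, url in enumerate(islice(search_fn(name), max_images)):
            # Keep the extension from the URL when there is one; fall back to .jpg.
            suffix = Path(urlparse(url).path).suffix or ".jpg"
            try:
                download_fn(url, str(dest / f"{i:04d}{suffix}"))
                saved += 1
            except OSError:
                # Dead links and timeouts are common in search results; skip them.
                continue
        counts[name] = saved
    return counts
```

Passing the search function in as a parameter keeps the sketch independent of any particular client library and makes it easy to try with a stub before hitting the network.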
DataLoaders: The fastai DataBlock API was used to set up the DataLoaders.
Data Augmentation: fastai's default data augmentation, which runs on the GPU, was used.
Details can be found in notebooks/data_collection_and_augmentation.ipynb of the GitHub repo.
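A DataBlock setup along these lines might look like the sketch below. The folder-per-class layout, the 80/20 split, the 224-pixel crop, and the batch size are assumptions for illustration; the notebook may use different values. `images_per_class` and `build_dataloaders` are hypothetical helper names. The fastai import is kept inside the function so that the counting helper works without fastai installed.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def images_per_class(data_dir):
    """Count image files in each class sub-folder (a quick class-balance check)."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def build_dataloaders(data_dir, valid_pct=0.2, batch_size=32, seed=42):
    """Build fastai DataLoaders from a folder-per-class image directory."""
    # Imported lazily: fastai is only needed when this function is called.
    from fastai.vision.all import (
        CategoryBlock, DataBlock, ImageBlock, RandomResizedCrop,
        RandomSplitter, aug_transforms, get_image_files, parent_label,
    )

    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),              # image in, single label out
        get_items=get_image_files,                       # find images recursively
        splitter=RandomSplitter(valid_pct=valid_pct, seed=seed),
        get_y=parent_label,                              # label = parent folder name
        item_tfms=RandomResizedCrop(224, min_scale=0.5), # per-item resize on the CPU
        batch_tfms=aug_transforms(),                     # default augmentations, applied per batch
    )
    return dblock.dataloaders(Path(data_dir), bs=batch_size)
```

With data in place, `build_dataloaders("data").show_batch(max_n=9)` would display a sample of augmented training images, where `"data"` is a placeholder path.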
Example images from the training dataset
Example images from the validation dataset
Training: A pre-trained resnet34 model was fine-tuned for 6 epochs, reaching an accuracy of up to ~85%.
Data Cleaning: Because the data was collected through the DuckDuckGo search engine API, the dataset contained considerable noise and many inconsistencies. The data was therefore cleaned and relabeled using the fastai ImageClassifierCleaner. Cleaning was repeated after each round of training or fine-tuning until the final iteration of the model.
Details can be found in notebooks/model_training_and_cleaning.ipynb of the GitHub repo.
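The train-then-clean loop could be sketched as below. `train` and `apply_cleaner_decisions` are hypothetical helper names. The file handling follows the usual pattern for acting on ImageClassifierCleaner selections (delete the files marked for removal, move relabeled files into the folder of their new class), but the notebook's exact code may differ. `vision_learner` is the name in recent fastai releases; older releases call it `cnn_learner`.

```python
import shutil
from pathlib import Path


def train(dls, epochs=6):
    """Fine-tune a pre-trained resnet34 on the given DataLoaders."""
    # Imported lazily: fastai is only needed when this function is called.
    from fastai.vision.all import accuracy, resnet34, vision_learner

    learn = vision_learner(dls, resnet34, metrics=accuracy)
    learn.fine_tune(epochs)
    return learn


def apply_cleaner_decisions(fns, to_delete, to_change, data_dir):
    """Apply cleaning decisions to a folder-per-class dataset.

    fns:       file paths in the order shown by the cleaner (cleaner.fns)
    to_delete: indices marked for deletion (cleaner.delete())
    to_change: (index, new_label) pairs (cleaner.change())
    """
    data_dir = Path(data_dir)
    deleted = set(to_delete)
    for idx in deleted:
        Path(fns[idx]).unlink()
    for idx, label in to_change:
        if idx in deleted:
            continue  # nothing left to move
        target = data_dir / label
        target.mkdir(parents=True, exist_ok=True)
        # Moving into an existing directory keeps the original file name.
        shutil.move(str(fns[idx]), str(target))
```

In a notebook this would typically be used as `cleaner = ImageClassifierCleaner(learn)` (from `fastai.vision.widgets`), followed by `apply_cleaner_decisions(cleaner.fns, cleaner.delete(), cleaner.change(), data_dir)` and another call to `train`. Note that `shutil.move` raises an error if a file with the same name already exists in the target folder.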
The model was exported as a .pkl file and used for inference.
Details can be found in notebooks/model_inference.ipynb of the GitHub repo.
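Export and inference with fastai generally follow the pattern sketched here. The file name `model.pkl` and the helper names `top_k` and `classify` are placeholders, not taken from the notebook. Exporting is a single call on the trained learner, `learn.export("model.pkl")`; loading and predicting might look like this:

```python
def top_k(vocab, probs, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    pairs = sorted(
        zip(vocab, (float(p) for p in probs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return pairs[:k]


def classify(model_path, image_path, k=3):
    """Load an exported learner and classify a single image file."""
    # Imported lazily: fastai is only needed when this function is called.
    from fastai.vision.all import PILImage, load_learner

    learn = load_learner(model_path)  # loads on the CPU by default
    # predict returns (decoded label, label index, per-class probabilities)
    _, _, probs = learn.predict(PILImage.create(image_path))
    return top_k(learn.dls.vocab, probs, k)
```

Returning the top few classes rather than a single label is a design choice for this sketch; with 28 visually similar chart types, the runner-up predictions are often informative.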
The model was deployed to Hugging Face Spaces as a Gradio app. The implementation can be found here.
Model deployed in Hugging Face Spaces
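A Gradio app wrapping an exported fastai learner is usually only a few lines. The sketch below is an assumption about the general shape of such an app, not the deployed implementation: the file name `model.pkl`, the helper names, and the choice of showing five classes are all placeholders.

```python
def to_label_scores(vocab, probs):
    """Map each class name to its probability, the dict format gr.Label accepts."""
    return {label: float(p) for label, p in zip(vocab, probs)}


def build_demo(model_path="model.pkl"):
    """Create a Gradio interface around an exported fastai learner."""
    # Imported lazily: gradio and fastai are only needed when the app is built.
    import gradio as gr
    from fastai.vision.all import PILImage, load_learner

    learn = load_learner(model_path)

    def classify_image(img):
        # gr.Image() passes the upload as a NumPy array by default.
        _, _, probs = learn.predict(PILImage.create(img))
        return to_label_scores(learn.dls.vocab, probs)

    return gr.Interface(
        fn=classify_image,
        inputs=gr.Image(),
        outputs=gr.Label(num_top_classes=5),
    )
```

On Hugging Face Spaces the entry point (conventionally `app.py`) would end with `build_demo().launch()`.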
The deployed model's API is integrated here, in a GitHub Pages website. Implementation and other details can be found in the docs folder.
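For illustration, a request to a Gradio app's JSON endpoint can be sketched in Python as below; the website itself would issue the equivalent request from the browser. This assumes the older Gradio 3.x-style REST format, in which an image is sent as a base64 data URI inside a `data` list and a Label output comes back with a `confidences` list. Newer Gradio versions use a different protocol, so treat the payload and response shapes as assumptions. `api_url` is a placeholder for the Space's prediction endpoint, and all function names are hypothetical.

```python
import base64
import json
import mimetypes
from urllib.request import Request, urlopen


def build_payload(image_bytes, mime_type="image/png"):
    """Wrap raw image bytes as a base64 data URI in a Gradio-style request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"data": [f"data:{mime_type};base64,{encoded}"]}


def parse_label_response(response):
    """Extract (label, confidence) pairs from a Label-style response body."""
    confidences = response["data"][0]["confidences"]
    return [(c["label"], c["confidence"]) for c in confidences]


def classify_remote(api_url, image_path):
    """POST an image file to the endpoint and return its label confidences."""
    mime_type = mimetypes.guess_type(str(image_path))[0] or "image/png"
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), mime_type)
    request = Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as resp:
        return parse_label_response(json.load(resp))
```

Keeping payload construction and response parsing separate from the network call makes both easy to check against a saved response before pointing the code at the live Space.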