This internship project aims to classify images using the CLIP model developed by OpenAI, while implementing an Active Learning strategy based on different uncertainty measures to iteratively select new samples to annotate and add to the training set.
The code in this repository is structured as follows:
- `main.py`: Contains the main logic of the program, including functions for calculating uncertainty, annotating images, training the model, making predictions, updating dataframes, and writing results. This is also where the Active Learning strategy is implemented and where the training-validation loop resides.
- `main_noAL.py`: Contains the traditional machine learning pipeline, without Active Learning.
- `CLIPImageClassifier.py`: Contains the `CLIPImageClassifier` class, which is used for training and saving the CLIP model, as well as for making predictions on unseen images.
- `CLIPImageCLassifierAPI.py`: Contains the API connector for Label Studio's Active Learning loop. This class interacts with Label Studio's machine learning backend and integrates the `CLIPImageClassifier` class for image classification tasks.
The required package versions are listed in `requirements.txt`.
- Python 3.x
- PyTorch
- torchvision
- tqdm
- pandas
- scikit-learn
- OpenAI's CLIP
To set up the project:

- Clone this repository.
- Create a virtual environment: `python -m venv venv`
- Activate the virtual environment (on Windows): `.\venv\Scripts\activate`
- Install the requirements with pip: `pip install -r requirements.txt`
To use the pipeline:

- Prepare your image dataset using `createDataset.py`. The dataset should be in CSV format with columns for image file paths and labels (an illustrative example follows this list).
- Adjust the hyperparameters in `main.py`.
- Run the `main.py` script: `python main.py`
- The results will be saved to a CSV file that includes performance metrics such as accuracy for each iteration of the Active Learning strategy.
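For illustration, a dataset CSV might look like the following. The column names here are assumptions, not necessarily the ones `createDataset.py` produces; check that script for the exact format:

```python
import pandas as pd

# Hypothetical example of the expected layout: one column of image
# file paths and one column of labels. Column names are illustrative.
df = pd.DataFrame({
    "image_path": ["images/shirt_001.jpg", "images/dress_002.jpg"],
    "label": ["shirt", "dress"],
})
df.to_csv("dataset.csv", index=False)
```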
You can tune the following hyperparameters in the `main.py` file:

- `UNCERTAINTY_MEASURE`: The uncertainty measure used for selecting samples to annotate. Options are `margin`, `entropy`, `least`, and `random` (see the sketch after this list).
- `N_PER_CLASS`: Number of samples per class in the first iteration.
- `N_SCORE_PER_ITERATION`: Number of samples to score per iteration.
- `N_ANNOTATE_PER_ITERATION`: Number of samples to rank and annotate per iteration.
- `N_VAL`: Number of validation samples.
- `BATCH_SIZE`: Number of samples propagated through the network at once.
- `NUM_EPOCHS`: Number of complete passes over the training set during training.
- `LEARNING_RATE`: The step size the optimizer takes towards the minimum of the loss function.
- `RESULTS_FILE`: The path where the results will be saved.
- `MODEL_PATH`: The path where the trained model will be saved.
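For reference, here is a minimal sketch of how these four measures are commonly computed from per-class probabilities. The function name and score conventions are illustrative, not the exact code in `main.py`:

```python
import numpy as np

def uncertainty_scores(probs: np.ndarray, measure: str) -> np.ndarray:
    """Return one uncertainty score per row of class probabilities.

    Higher score means more uncertain, i.e. higher annotation priority.
    """
    if measure == "margin":
        # Small gap between the two highest probabilities = uncertain.
        top2 = np.sort(probs, axis=1)[:, -2:]
        return 1.0 - (top2[:, 1] - top2[:, 0])
    if measure == "entropy":
        # Shannon entropy of the predicted distribution.
        return -(probs * np.log(probs + 1e-12)).sum(axis=1)
    if measure == "least":
        # Least confidence: 1 minus the top class probability.
        return 1.0 - probs.max(axis=1)
    if measure == "random":
        # Baseline: ignore the model and sample at random.
        return np.random.default_rng().random(len(probs))
    raise ValueError(f"unknown uncertainty measure: {measure}")
```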
The training-validation loop is executed for a number of runs and iterations, which can be adjusted via the `RUNS` and `ITERATIONS` variables, respectively.
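To make that structure concrete, here is a self-contained toy version of such a loop, using random features and scikit-learn's logistic regression in place of CLIP. Everything here is illustrative; the real loop in `main.py` trains the CLIP classifier on images:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the real pipeline: random features instead of CLIP
# embeddings, logistic regression instead of the fine-tuned model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
y = (X[:, 0] > 0).astype(int)

RUNS, ITERATIONS = 2, 3
N_PER_CLASS, N_ANNOTATE_PER_ITERATION, N_VAL = 5, 10, 100

for run in range(RUNS):
    val_idx = rng.choice(len(X), size=N_VAL, replace=False)
    pool = np.setdiff1d(np.arange(len(X)), val_idx)
    # Seed the labeled set with N_PER_CLASS samples per class.
    labeled = np.concatenate([
        rng.choice(pool[y[pool] == c], size=N_PER_CLASS, replace=False)
        for c in np.unique(y)
    ])
    pool = np.setdiff1d(pool, labeled)
    for iteration in range(ITERATIONS):
        model = LogisticRegression().fit(X[labeled], y[labeled])
        acc = model.score(X[val_idx], y[val_idx])
        # Margin-based selection: move the most uncertain pool samples
        # into the labeled set for the next iteration.
        probs = np.sort(model.predict_proba(X[pool]), axis=1)
        margins = probs[:, -1] - probs[:, -2]
        picked = pool[np.argsort(margins)[:N_ANNOTATE_PER_ITERATION]]
        labeled = np.concatenate([labeled, picked])
        pool = np.setdiff1d(pool, picked)
        print(f"run {run} iteration {iteration}: accuracy = {acc:.3f}")
```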
Integrate the Active Learning pipeline with your data labeling workflow by connecting a machine learning backend to Label Studio. You can use `CLIPImageCLassifierAPI.py` as the custom backend server; follow the Label Studio machine learning backend guide to set up the server connection.
With this backend server you can perform two tasks:
- Dynamically pre-annotate data based on model inference results
- Retrain or fine-tune a model based on recently annotated data
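These two tasks map onto the `predict` and `fit` hooks of the `label-studio-ml` SDK. Below is a minimal skeleton of such a backend; it is not the contents of `CLIPImageCLassifierAPI.py`, and the `from_name`/`to_name` values must match your own labeling configuration:

```python
from label_studio_ml.model import LabelStudioMLBase

class CLIPBackend(LabelStudioMLBase):
    """Skeleton backend; swap the placeholder logic for CLIPImageClassifier."""

    def predict(self, tasks, **kwargs):
        # Called by Label Studio to pre-annotate tasks. A real backend
        # would run CLIP on each task's image; here we return a fixed
        # placeholder choice.
        return [{
            "result": [{
                "from_name": "choice",  # must match your labeling config
                "to_name": "image",
                "type": "choices",
                "value": {"choices": ["shirt"]},
            }],
            "score": 0.5,
        } for _ in tasks]

    def fit(self, *args, **kwargs):
        # Called when new annotations arrive (the exact signature varies
        # across SDK versions): retrain or fine-tune the model here.
        pass
```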
You can use `commands.txt` to start your custom backend.
The fashion challenge dataset can be found on SharePoint.
This project is licensed under the terms of the MIT license.
© Roel Duijsings