An approach to classify mixed patterns of proteins in microscope images using transfer learning.
Data Source: Kaggle
Link: Dataset
The dataset comprises 10 different protein localization patterns (labels), each with a different morphology. Each image can have more than one label associated with it.
Labels: {0: 'Mitochondria', 1: 'Nuclear bodies', 2: 'Nucleoli', 3: 'Golgi apparatus', 4: 'Nucleoplasm', 5: 'Nucleoli fibrillar center', 6: 'Cytosol', 7: 'Plasma membrane', 8: 'Centrosome', 9: 'Nuclear speckles'}
Pytorch
Pandas
Numpy
Matplotlib, Seaborn (for plotting images and visualisation)
- The dataset is found, through visualisation, to have class imbalance. To handle this, the F1 score is used as the evaluation metric instead of plain accuracy.
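As a minimal sketch, a micro-averaged multi-label F1 score can be computed in PyTorch as below (the 0.5 threshold is an assumption):

```python
import torch

def f1_score(probs, targets, threshold=0.5, eps=1e-8):
    """Micro-averaged F1 for multi-label predictions.

    probs:   sigmoid probabilities, shape (N, num_labels)
    targets: multi-hot ground-truth matrix, same shape
    """
    preds = (probs > threshold).float()          # threshold probabilities to 0/1
    tp = (preds * targets).sum()                 # true positives over all labels
    precision = tp / (preds.sum() + eps)
    recall = tp / (targets.sum() + eps)
    return (2 * precision * recall / (precision + recall + eps)).item()
```

Unlike plain accuracy, this score stays informative when one class dominates, since it balances precision and recall over all labels jointly.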
- Two different approaches were tried for splitting the data into training and validation sets: one using masking and the other using the conventional train_test_split, and the scores were recorded for each. The F1 score for the masking approach was observed to be better, owing to the slightly larger amount of training data.
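One way to read the masking split (this is an assumption about the exact implementation; the fraction and dataset size are placeholders) is a random boolean mask over the rows, in contrast to sklearn's train_test_split:

```python
import numpy as np

np.random.seed(42)      # for reproducibility
num_samples = 1000      # placeholder dataset size

# A random boolean mask sends ~10% of rows to validation, the rest to training.
# Unlike train_test_split, the validation fraction is only approximate, which
# can leave slightly more data available for training.
mask = np.random.rand(num_samples) < 0.10
val_idx = np.flatnonzero(mask)
train_idx = np.flatnonzero(~mask)
```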
- Channel-wise normalisation: each channel is normalised by subtracting the channel mean and dividing by the channel standard deviation. This helps prevent any one channel from disproportionately affecting the gradients.
The mean and standard deviation values are taken from the ImageNet dataset, as suggested for models pre-trained on ImageNet.
Reference: Pre-Trained models_Pytorch
2. Different transformations (augmentations) applied to the images:
- Random Cropping
- Random Resized Cropping
- Normalize
- ToTensor (mandatory, to convert the image into a PyTorch tensor)
- RandomHorizontalFlip
- RandomRotation
A pre-trained ResNet18 model, trained on ImageNet, is used for this purpose. Initially the ResNet layers are frozen and only the final layer is trained; the layers are then unfrozen for further training.
The plot above shows score vs. epochs for two different cases:
- using the standard train_test_split (1st hump)
- using masking (2nd hump)
- To speed up training, the available GPU is used.
- Learning-rate scheduling - the learning rate is changed after each batch of training.
- Weight decay - a regularisation technique that prevents the weights from becoming too large.
- Gradient clipping - clips gradients to smaller values during training.
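The techniques above can be sketched in one training loop; the hyperparameter values and the choice of a one-cycle schedule (which steps after every batch) are assumptions:

```python
import torch

def fit(model, train_loader, loss_fn, epochs=5, max_lr=0.01,
        weight_decay=1e-4, grad_clip=0.1, device="cpu"):
    # Weight decay is applied through the optimizer (L2-style regularisation)
    optimizer = torch.optim.Adam(model.parameters(), max_lr,
                                 weight_decay=weight_decay)
    # One-cycle schedule: the learning rate is updated after every batch
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs,
        steps_per_epoch=len(train_loader))
    model.to(device)
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            loss = loss_fn(model(xb), yb)
            loss.backward()
            # Gradient clipping keeps individual updates small and stable
            torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            sched.step()   # learning rate changes after each batch
```

Passing `device="cuda"` when `torch.cuda.is_available()` moves both the model and each batch to the GPU.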
Binary cross-entropy loss is used, as it is suitable for the multi-label setting and aligns with the F1 metric (small decreases in the binary loss do not change the F1 score much).
Adam optimiser - helps the model converge faster.
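A tiny example of the multi-label binary cross-entropy loss (the logits and targets below are made up for illustration):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.5, 0.3]])   # raw model outputs for 3 labels
targets = torch.tensor([[1.0, 0.0, 1.0]])   # multi-hot ground truth

# With-logits BCE treats every label as an independent binary decision,
# which matches the multi-label nature of the dataset (softmax cross-entropy
# would instead force exactly one label per image)
loss = F.binary_cross_entropy_with_logits(logits, targets)
```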
- Using masking: a validation F1 score of around 60% is achieved.
- Using train_test_split: a validation F1 score of around 50% is obtained.