Medical imaging is a critical component of modern healthcare that can help medical professionals make more informed diagnostic decisions. Chest X-rays are the most commonly used medical imaging modality, yet their interpretation is a time-consuming, challenging, and error-prone process, even for expert radiologists. The advancement of AI has triggered a significant transformation in healthcare, so our project aims to utilize state-of-the-art AI models to assist in the diagnosis of respiratory and cardiovascular conditions. We train advanced Convolutional Neural Networks (CNNs) (AlexNet, ResNet18, and InceptionV3) on chest X-ray images and compare the performance metrics of these models to identify underlying trends in the data and to diagnose abnormalities accurately.
The goal is to evaluate the models based on their metrics and draw insights into their performance on a particular dataset. Since ResNet18 and InceptionV3 have deeper architectures, we expect them to learn richer features and outperform AlexNet; this can be verified by comparing the Accuracy, Precision, Recall, and F-Score of the models. However, the shallow architecture of AlexNet and the smaller number of layers in ResNet18 let them train faster (lower training time per epoch) than InceptionV3, making them computationally less expensive.
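To make the comparison concrete, these four metrics can be computed from a model's validation predictions. The following is a minimal sketch using scikit-learn, where `y_true` and `y_pred` are placeholder label arrays standing in for the real validation outputs:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical ground-truth and predicted class labels collected during validation
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# Macro-averaging weights every class equally, which suits imbalanced X-ray classes
precision, recall, fscore, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro', zero_division=0
)
print(f"Accuracy={accuracy:.4f} Precision={precision:.4f} "
      f"Recall={recall:.4f} F-Score={fscore:.4f}")
```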
To perform disease classification from chest X-rays, we chose three publicly available datasets: the Chest X-Ray Images dataset (Kaggle), the Chest X-Ray Dataset for Respiratory Disease Classification (Harvard Dataverse), and NIH Chest X-ray14 (Kaggle).
This dataset consists of 5,856 chest X-ray images of patients with and without pneumonia, stored as .jpeg files. It has two classes, Normal (1,342) and Pneumonia (4,514), with images of size 1024×1024. Source: Chest X-ray images to diagnose pneumonia — kaggle.com. https://www.kaggle.com/datasets/paulti/chest-xray-images
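Since the images are plain .jpeg files, they can be loaded directly with torchvision. A minimal sketch, assuming the Kaggle folder layout with class-named subdirectories (the path below is hypothetical):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# Resize the 1024x1024 X-rays down to the CNN input size
transform = T.Compose([
    T.Grayscale(num_output_channels=3),  # X-rays are single-channel; the CNNs expect 3
    T.Resize((224, 224)),
    T.ToTensor(),
])

# 'chest_xray/train' is a hypothetical path containing NORMAL/ and PNEUMONIA/ subfolders
dataset = ImageFolder('chest_xray/train', transform=transform)
print(dataset.classes)  # expected: ['NORMAL', 'PNEUMONIA']
```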
This dataset contains a total of 32,687 images across 5 classes of respiratory disease: COVID-19 (4,189), Lung Opacity (6,012), Normal (10,192), Viral Pneumonia (7,397), and Tuberculosis (4,897). The data is distributed as a .npz file, a dictionary containing the image (224×224), image name, and image label. We sampled 12,000 images from this dataset.
Source: Arkaprabha Basu, Sourav Das, Susmita Ghosh, Sankha Mullick, Avisek Gupta, and Swagatam Das. Chest X-Ray Dataset for Respiratory Disease Classification, 2021 - https://dataverse.harvard.edu/file.xhtml?fileId=5194823&version=5.1
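A minimal sketch of loading and subsampling the .npz archive. The key names below are assumptions based on the description above, so inspect `data.files` for the actual keys:

```python
import numpy as np

# Key names ('image', 'image_label') are assumptions; check data.files for the real ones
data = np.load('respiratory_disease.npz', allow_pickle=True)
print(data.files)

images = data['image']        # assumed shape: (N, 224, 224)
labels = data['image_label']  # assumed per-image class labels

# Sample 12,000 images uniformly at random, as done in our experiments
rng = np.random.default_rng(seed=42)
idx = rng.choice(len(images), size=12000, replace=False)
images, labels = images[idx], labels[idx]
```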
The NIH Chest X-rays dataset is a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients, of which we sample 20,000 images. The metadata is provided in a .csv file, and the dataset covers 15 different classes: Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, No Finding, Pleural Thickening, Cardiomegaly, Nodule, Mass, and Hernia.
Source: National Institutes of Health Chest X-ray Dataset — kaggle.com. https://www.kaggle.com/datasets/nih-chest-xrays/data
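The per-image labels can be pulled from the metadata .csv with pandas. A sketch assuming the dataset's standard `Data_Entry_2017.csv` file and its documented column names:

```python
import pandas as pd

# 'Data_Entry_2017.csv' ships with the NIH dataset; 'Image Index' and
# 'Finding Labels' are its documented column names
meta = pd.read_csv('Data_Entry_2017.csv')

# Multi-label findings are pipe-separated, e.g. 'Cardiomegaly|Effusion'
meta['Finding Labels'] = meta['Finding Labels'].str.split('|')

# Sample 20,000 images, as done in our experiments
sample = meta.sample(n=20000, random_state=42)
print(sample[['Image Index', 'Finding Labels']].head())
```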
To run Python with PyTorch and CUDA, you'll need to have the following requirements installed on your system:
You'll need to have Python 3.6 or later installed on your system. You can download the latest version of Python from the official website: https://www.python.org/downloads/
You can install PyTorch using pip. Run the following command in your terminal (see https://pytorch.org/get-started/locally/ for platform-specific instructions):
```bash
pip install torch torchvision
```
To use PyTorch with CUDA, you'll need to have a compatible NVIDIA GPU and the CUDA toolkit installed on your system. You can download the CUDA toolkit from the official NVIDIA website: https://developer.nvidia.com/cuda-downloads
Make sure to select the correct version of the CUDA toolkit that's compatible with your GPU and operating system.
Once you have the CUDA toolkit installed, you'll need to set the following environment variables:
```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
You can add these environment variables to your .bashrc file to make them persistent.
You'll also need to install the cuDNN library, which is a CUDA-accelerated library for deep neural networks. You can download the cuDNN library from the NVIDIA website: https://developer.nvidia.com/cudnn
Make sure to select the version of the cuDNN library that's compatible with your CUDA toolkit and operating system.
Once you have all of these requirements installed, you should be able to run Python with PyTorch and CUDA on your system.
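A quick sanity check that the whole stack is wired up correctly:

```python
import torch

print(torch.__version__)
print(torch.cuda.is_available())       # True if the CUDA setup is working
print(torch.backends.cudnn.version())  # cuDNN version PyTorch was built against
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```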
To use Python with PyTorch and CUDA on Google Colab, you can follow these steps:
- Open a new notebook in Google Colab.
- Go to "Runtime" and select "Change runtime type".
- Under "Hardware accelerator", select "GPU" and click "SAVE".
- Install PyTorch using pip. Run the following command in a code cell:
```
!pip install torch torchvision
```
- Verify that PyTorch is installed by importing it and printing the version number:
```python
import torch
print(torch.__version__)
```
This should output the version number of PyTorch.
- To use CUDA, you don't need to install anything additional on Google Colab since it already has the CUDA toolkit and drivers installed on its virtual machines.
- You can now start using PyTorch with CUDA in your Google Colab notebook.
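A short snippet to confirm the notebook is actually using the GPU runtime:

```python
import torch

# On Colab this should report a GPU (e.g. a Tesla T4) once the runtime is set
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device, torch.cuda.get_device_name(0) if device.type == 'cuda' else '')
```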
To train and validate our PyTorch models in Google Colab, you can follow these steps:
- Load the dataset into a format that PyTorch can use, such as a `torch.utils.data.Dataset`.
- Split the dataset into training and validation sets, for example using `torch.utils.data.random_split()`.
- Define a loss function, for example `torch.nn.CrossEntropyLoss()`.
- Define an optimizer, for example `torch.optim.Adam()`.
- Train the model for a specified number of epochs or until convergence.
- Validate the model on the validation split.
- Save the trained model to disk using `torch.save()` (see the sketch after this list).
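A minimal sketch of these steps, assuming `dataset` and `model` are already defined as above (note that InceptionV3 additionally returns auxiliary logits in training mode, which would need separate handling):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split

# `dataset` and `model` are assumed to be defined as in the steps above
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation pass: gradients are not needed
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: val accuracy = {correct / total:.4f}")

torch.save(model.state_dict(), 'model.pth')
```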
COMP6721_Winter2023_GroupN/Simple Chest XRay/
COMP6721_Winter2023_GroupN/Harvard Chest XRay/
COMP6721_Winter2023_GroupN/Hyperparameter Tuning/
COMP6721_Winter2023_GroupN/Transfer Learning/
COMP6721_Winter2023_GroupN/NIH Chest XRay/
All the utility functions used can be found in COMP6721_Winter2023_GroupN/utils.ipynb.
Links to the original full datasets can be found in the Introduction section above.
Here is a link to a sample dataset of 100 images: https://drive.google.com/drive/folders/10e-Yf_PxUTCDvh97mU1C5dp5tz7flEKk?usp=sharing
To run a pre-trained PyTorch model on a provided sample test dataset, you can follow these steps:
- Obtain the pre-trained model file as a `.pth` file from the folders mentioned below.
- Download or obtain the sample test dataset from here - https://drive.google.com/drive/folders/10e-Yf_PxUTCDvh97mU1C5dp5tz7flEKk?usp=sharing.
- Create a PyTorch model using the same architecture and input format as the pre-trained model, and load the pre-trained parameters using `torch.load()`.
- Apply the model to the sample test dataset using a data loader, for example `torch.utils.data.DataLoader`.
- Compute the model's output on each sample in the test dataset by calling `model()` on each batch, as in the snippet below.
```python
import torch
from torch.utils.data import DataLoader

# Recreate the ResNet18 architecture, then overwrite its parameters with our
# fine-tuned weights (if the final layer was modified during training,
# recreate that modification before loading the state dict)
model = torch.hub.load('pytorch/vision:v0.9.0', 'resnet18', pretrained=True)
model.load_state_dict(torch.load('your_model_weights.pth'))
model.eval()  # switch to inference mode before testing
```
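The loaded model can then be evaluated on the sample set. A sketch, assuming `test_set` is a `Dataset` built from the downloaded folder:

```python
# test_set is assumed to be a Dataset built from the sample folder above
test_loader = DataLoader(test_set, batch_size=32)

correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"test accuracy = {correct / total:.4f}")
```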
ResNet18 - Dataset 1 : https://drive.google.com/file/d/1-AJMU30_2ppw-yORpCqfM-zvI_JgaarB/view?usp=sharing.
AlexNet - Dataset 2 : https://drive.google.com/file/d/1FEsD9htWVHe7BxBNtFT2aWeT1GhzAYyv/view?usp=share_link.
Inceptionv3 - Dataset 2 : https://drive.google.com/file/d/1-9kRtWVIPHfzm5V3vXFN4H-qaBO9tpul/view?usp=sharing.
ResNet18 - Dataset 3 : https://drive.google.com/file/d/176zEWSmk3QoX_bdImW8MeeHhYFn-O_Xm/view?usp=sharing.
COMP6721_Winter2023_GroupN/Simple Chest XRay/
COMP6721_Winter2023_GroupN/Harvard Chest XRay/
COMP6721_Winter2023_GroupN/Hyperparameter Tuning/
COMP6721_Winter2023_GroupN/Transfer Learning/
COMP6721_Winter2023_GroupN/NIH Chest XRay/
Follow the drive link - https://drive.google.com/drive/folders/10e-Yf_PxUTCDvh97mU1C5dp5tz7flEKk?usp=sharing - download the whole folder and place it in the same directory where you run the training and testing code.
Hyperparameter tuning on Dataset 2:

| Optimizer | Learning rate | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|---|
| Adam | 0.005 | 0.3164 | 0.0633 | 0.2000 | 0.0961 |
| Adam | 0.0001 | 0.8539 | 0.8600 | 0.8359 | 0.8462 |
| Adam | 0.0005 | 0.8076 | 0.7991 | 0.7972 | 0.7960 |
| Adam | 0.00001 | 0.7112 | 0.6976 | 0.6884 | 0.6915 |
| SGD | 0.05 | 0.1335 | 0.0267 | 0.2000 | 0.0471 |
| SGD | 0.01 | 0.8034 | 0.8052 | 0.7843 | 0.7925 |
| SGD | 0.005 | 0.7640 | 0.7983 | 0.7945 | 0.7937 |
| SGD | 0.001 | 0.6477 | 0.6610 | 0.6011 | 0.6095 |
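The grid above can be reproduced with a simple nested loop. The sketch below assumes two hypothetical helpers: `build_model()`, which returns a fresh model instance, and `train_and_evaluate()`, which runs the training loop shown earlier and returns the validation metrics:

```python
import torch

# build_model() and train_and_evaluate() are hypothetical helpers wrapping
# the model construction and training/validation loop shown earlier
grid = {
    'Adam': [0.005, 0.0001, 0.0005, 0.00001],
    'SGD':  [0.05, 0.01, 0.005, 0.001],
}
results = {}
for opt_name, lrs in grid.items():
    opt_cls = torch.optim.Adam if opt_name == 'Adam' else torch.optim.SGD
    for lr in lrs:
        model = build_model()  # fresh model per run so results are comparable
        optimizer = opt_cls(model.parameters(), lr=lr)
        results[(opt_name, lr)] = train_and_evaluate(model, optimizer)
```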