Xray Xplorer · CNN, XGBoost & Grad-CAM · COVID-19 Detection in Chest X-Ray Images Using Explainable Boosting Algorithms
Welcome to the GitHub repository for "Xray Xplorer", a powerful diagnostic tool birthed from a dissertation project titled "COVID-19 Detection in Chest X-Ray Images using Explainable Boosting Algorithms". The dissertation addresses the urgent demand for transparency in AI-powered diagnostic models. Uniting the robustness of Convolutional Neural Networks (CNNs), the potency of eXtreme Gradient Boosting (XGBoost), and the transparency of Gradient-weighted Class Activation Mapping (Grad-CAM), Xray Xplorer provides precise and interpretable COVID-19 predictions using chest X-ray images. Proven to perform admirably with an accuracy of 94.05% and an F1 score of 94.08%, this tool is ready to contribute to the fight against the pandemic.
Xray Xplorer presents a robust interface for healthcare professionals to upload chest X-ray images, returning a diagnostic prediction of whether the image indicates a normal condition, pneumonia (viral or bacterial), or COVID-19. The web application offers a Grad-CAM visualization alongside the prediction, fostering explainability and interpretability - key requirements for trust in AI-driven diagnostic tools.
Download Repository · GitHub
This repository contains the source code for the Flask-based web application, designed to facilitate the deployment and use of the diagnostic AI model trained in the project's Jupyter notebook on Google Colab.
View Jupyter Notebook · Google Colab
The complete dissertation report, outlining the comprehensive research, methodology, and broader implications, is available for deeper insights into the project.
View Dissertation · Google Drive
All model files, including individual CNN models and the final hybrid CNN-XGBoost model, are available in the Google Drive linked below.
View Model Files · Google Drive
To get this project running in your local environment, you will need Python 3 installed.
Download Python 3 · Python.org
Then, clone or download the repository, open the project folder in your terminal, and follow these steps:
- Create a Python virtual environment, for isolation:
python3 -m venv venv
- Activate the virtual environment:
# Windows:
venv\Scripts\activate
# Linux and macOS:
source venv/bin/activate
- Install the necessary Python packages:
pip install -r requirements.txt
- Set the Flask app environment variable:
# Windows:
set FLASK_APP=app.py
# Linux and macOS:
export FLASK_APP=app.py
- Run the application:
flask run
The application should now be running at localhost:5000
Dive into the world of hybrid modeling that combines the image processing finesse of CNNs, the robust ensemble learning mechanism of XGBoost, and the interpretability aspect of Grad-CAM. This unique integration forms a comprehensive, efficient, and transparent tool for COVID-19 detection.
Convolutional Neural Networks (CNNs): CNNs excel in processing grid-like data, such as image data. A CNN uses filters and different layers to automatically extract and learn features from an image which is fed into the network in the raw form.
XGBoost: XGBoost is an optimized distributed gradient boosting library, designed to be highly efficient, flexible, and portable. Gradient boosting is a machine learning technique where the model learns from its mistakes in each iteration. XGBoost enhances the model's performance by merging the outputs of many weak learners.
Grad-CAM: Grad-CAM, short for Gradient-weighted Class Activation Mapping, provides visual explanations for model predictions. It generates a heatmap highlighting the important regions in the input image that led to the model's decision.
Embark on the journey through the methodical steps involved in creating the model:
- Data Preprocessing: Normalizes and standardizes input images to enhance model learning.
- Data Augmentation: Enlarges the training set through techniques like rotation, flipping, brightness variation, and zooming to boost model's generalization ability.
- Dataset Preparation: Uses ImageDataGenerator instances to divide and stream data during model training, validation, and testing.
- CNN Model Implementation: Adapts pre-trained CNN models (VGG16, ResNet50, InceptionV3) for feature extraction and serves these features as input to the XGBoost model for classification.
- XGBoost Hyperparameter Tuning and Feature Selection: Applies Bayesian optimization with cross-validation for optimal hyperparameter selection, concurrently performing feature selection to hone in on the most predictive features.
- CNN-XGBoost Hybrid Model Evaluation and Testing: Evaluates the performance of the hybrid model using metrics like accuracy, precision, recall, F1 score, specificity, and AUC-ROC.
- Grad-CAM Implementation: Offers insights into the decision-making process of the CNN model through heatmap visualizations.
- Chest X-Ray Image Verification: Uses a specially-trained VGG16 model to ensure that the uploaded image is a chest X-ray before proceeding with the COVID-19 detection pipeline, thereby mitigating errors from incorrect image uploads.
Together, these methods construct a model that is not only efficient and accurate but also interpretable, lending insight into its diagnostic process.
This project utilizes the "COVID-QU-Ex Dataset" (Coronavirus Disease Qatar University Extended Dataset) from Kaggle, encompassing 33,900 chest X-ray images, categorized as COVID-19 positive cases, normal cases, and non-COVID infection cases.
The distribution of the images in the dataset is as follows:
- 11,950 images are classified as COVID-19 positive cases,
- 10,695 images are classified as normal cases,
- 11,255 images are classified as positive cases for non-COVID infections.
The dataset is divided into training, validation, and testing sets containing 21,705 images, 5,410 images, and 6,785 images respectively.
View Dataset · Kaggle
- Python 3.10.11
- TensorFlow 2.12.0
- Keras 2.12.0
- XGBoost 1.7.6
- OpenCV 4.7.0
- Flask 2.3.2
- jQuery 3.7.0
- Visual Studio Code 1.80
- Google Colab
- Abhijeet Pitumbur