Improving Covid-19 Prediction Through Variational Bayes For Learned Class Weighting

Current Covid-19 datasets are rife with class imbalance. This study identifies the consequences of that imbalance for Covid-19 diagnosis with a CNN (ResNet architecture) and presents both existing and novel solutions. The project demonstrates the advantages and limitations of each approach and suggests further work (please read the paper for details).

Dataset Distribution

The CovidX dataset was used for training all networks. At the time of producing this project, the dataset was heavily imbalanced, with Covid-19 samples forming the minority class.

As a result, network performance on the minority class of Covid-19 samples was poor.

Existing Methods

The first strategies explored were data augmentation (perturbing samples to increase variance within the dataset), upsampling (sampling the minority class more often than its share of the dataset) and loss-function weighting (applying greater weight to the loss on minority-class samples). Combining all three strategies improved the network's performance on the minority class. However, performance on the majority classes worsened: the network simply predicted Covid-19 more frequently instead of actually learning the features of Covid-19.
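As a rough illustration of upsampling and loss-function weighting, the sketch below derives inverse-frequency class weights and uses them both to weight a cross-entropy loss and to set per-sample sampling probabilities. It is a minimal standard-library sketch, not the project's code: the function names, the inverse-frequency formula and the toy labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: inverse-frequency weighting; the project may use another scheme.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights.

    `probs` is a list of dicts mapping class name to predicted probability.
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


def sampling_probabilities(labels, class_weights):
    """Per-sample draw probabilities so every class is drawn equally often."""
    raw = [class_weights[y] for y in labels]
    norm = sum(raw)
    return [r / norm for r in raw]


if __name__ == "__main__":
    # Hypothetical toy labels, not the real CovidX class counts.
    labels = ["normal"] * 8 + ["covid"] * 2
    weights = inverse_frequency_weights(labels)
    print(weights)
    print(sampling_probabilities(labels, weights))
```

With these weights the two minority samples together receive the same total sampling probability as the eight majority samples. That also shows the failure mode described above: the weighting rebalances how often the minority class is seen, but does not by itself make its features easier to learn.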

Variational Bayes

Using a variational autoencoder and Bayesian statistics, it is possible to estimate how difficult each sample is to classify [1]. This project takes that idea and presents a novel application of the methodology: dynamic, real-time, per-sample loss-function weighting and upsampling. The result was a more robust network that improved on minority-class samples without sacrificing majority-class performance.
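To make the idea of per-sample weighting concrete, the sketch below turns a list of difficulty scores into loss weights and sampling probabilities. It assumes the scores have already been produced elsewhere (for example by a variational autoencoder) and does not reproduce how the project computes them. The softmax mapping, the `temperature` parameter and the function names are hypothetical choices for illustration only.

```python
import math


def difficulty_to_weights(difficulty, temperature=1.0):
    """Map per-sample difficulty scores to loss weights that average 1.

    Assumption: a temperature-scaled softmax; harder samples get larger weights.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(difficulty)  # subtract the max for numerical stability
    exps = [math.exp((d - peak) / temperature) for d in difficulty]
    total = sum(exps)
    n = len(difficulty)
    return [n * e / total for e in exps]


def weights_to_sampling_probabilities(weights):
    """Normalise weights into a distribution usable for upsampling."""
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical difficulty scores for one mini-batch of four samples.
    scores = [0.1, 2.0, 0.5, 0.1]
    weights = difficulty_to_weights(scores)
    print(weights)
    print(weights_to_sampling_probabilities(weights))
```

Because the weights are recomputed from the current scores on each call, the same function can be applied per batch during training, which is one way to read "dynamic, real-time" weighting. Unlike fixed class weights, an easy minority-class sample and a hard majority-class sample can each receive a weight that reflects the sample rather than its class.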

MoCo and Pretraining

Finally, the efficacy of pretraining the CNN on large existing databases was explored (specifically, traditional pretraining on ImageNet and Momentum Contrast (MoCo) learning on the Instagram 1bn dataset). This resulted in significantly stronger performance on minority-class samples but led to an overall reduction in performance. The intuition is that the generic feature extractor is not biased by the dataset imbalance, but the lack of training on chest X-ray images results in worse performance overall.
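For readers unfamiliar with Momentum Contrast, its distinguishing step is that the key encoder is not trained by backpropagation but tracks the query encoder as an exponential moving average. The sketch below shows only that update on plain lists of floats. It is an illustration of the general MoCo update rule, not code from this project, and the function name and flat-list parameter layout are assumptions.

```python
def momentum_update(key_params, query_params, m=0.999):
    """Return key parameters moved slightly toward the query parameters.

    Implements k <- m * k + (1 - m) * q element-wise.
    Assumption: parameters are flattened into equal-length lists of floats.
    """
    if len(key_params) != len(query_params):
        raise ValueError("parameter lists must have the same length")
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum m must lie in [0, 1]")
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


if __name__ == "__main__":
    # Hypothetical parameter values for illustration.
    key = [1.0, -2.0]
    query = [0.0, 0.0]
    for _ in range(3):
        key = momentum_update(key, query, m=0.9)
    print(key)
```

A momentum close to 1 makes the key encoder change slowly, which keeps the representations it produces consistent across training steps.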
