by Tamas Spisak
The manuscript describes and validates the package `mlconfound`. Read the docs.
The lack of non-parametric statistical tests for confounding bias significantly hampers the development of robust, valid and generalizable predictive models in many fields of research. Here I propose the partial and full confounder tests, which, for a given confounder variable, probe the null hypotheses of unconfounded and fully confounded models, respectively.
The tests provide strict control of Type I errors and high statistical power, even for the non-normally and non-linearly dependent predictions often seen in machine learning. Applying the proposed tests to models trained on functional brain connectivity data from the Human Connectome Project and the Autism Brain Imaging Data Exchange datasets reveals confounders that were previously unreported or found to be hard to correct for with state-of-the-art confound-mitigation approaches.
The tests (implemented in the package `mlconfound`) can aid the assessment and improvement of the generalizability and neurobiological validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers.
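As a quick illustration of the intended workflow, below is a minimal sketch based on the package's quickstart. The function names `partial_confound_test` and `full_confound_test` (in `mlconfound.stats`), their argument order, and the toy data are assumptions to be checked against the package documentation:

```python
# Minimal sketch, assuming the quickstart API of mlconfound
# (partial_confound_test and full_confound_test in mlconfound.stats);
# check the docs for exact signatures and options.
import numpy as np
from mlconfound.stats import partial_confound_test, full_confound_test

rng = np.random.default_rng(42)
n = 500

# Toy data: the confounder c drives the target y, and the (hypothetical)
# model predictions yhat depend mostly on y but also leak a little of c.
c = rng.normal(size=n)
y = c + rng.normal(size=n)
yhat = 0.8 * y + 0.2 * c + 0.5 * rng.normal(size=n)

# Partial confounder test: H0 = the model is unconfounded
# (yhat is conditionally independent of c, given y).
print(partial_confound_test(y, yhat, c))

# Full confounder test: H0 = the model is fully confounded
# (yhat is conditionally independent of y, given c).
print(full_confound_test(y, yhat, c))
```

With the toy data above, the partial test would be expected to reject its null hypothesis (the predictions leak the confounder), while the full test would not (the predictions clearly carry information about the target beyond the confounder).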
This repository contains:

- All source code required to reproduce the results in the manuscript. See the directories `simulated` and `empirical`.
- All results. See the directory `simulated/results` and the analysis notebooks.
- All figures. See the directory `fig`.
To reproduce all results, run:

```bash
./reproduce.sh
```
T. Spisak, Statistical quantification of confounding bias in predictive modelling, arXiv preprint arXiv:2111.00814, 2021.
The manuscript builds on a simple, aesthetically pleasing LaTeX style suitable for preprint servers such as arXiv and bioRxiv. It is based on the nips_2018.sty style.