This project examined two Madelon datasets.
The first one contained 500 features and the second one contained over 6,000 features.
The goal was to find which features are correlated with the target, and I built a model to do so.
The first step was to build benchmark models and compare their performance.
I used these models for benchmarking: logistic regression, decision tree, k-nearest neighbors, and support vector classifier.
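A minimal sketch of what such a benchmark could look like with scikit-learn. It is not the project's actual code: the data here is a small synthetic stand-in produced by `make_classification` (whose generator is adapted from the one designed for Madelon), and the names `benchmarks` and `benchmark`, the scaling step, and the 3-fold cross-validation are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data; the real Madelon data would be loaded here.
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=5, n_redundant=5, random_state=0
)

# The four benchmark models named above, with mostly default settings.
benchmarks = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(),
    "support vector classifier": SVC(),
}


def benchmark(models, X, y, cv=3):
    """Return the mean cross-validated accuracy for each model."""
    scores = {}
    for name, model in models.items():
        # Scale inside the pipeline so each fold is scaled on its own training split.
        pipe = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv).mean()
    return scores


scores = benchmark(benchmarks, X, y)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Running every model through the same pipeline and the same folds keeps the comparison fair, so the top scorer can be carried forward to the next step.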
The second step was to take the best-performing model and use it to identify the important features. I then adjusted the pipelines to try to improve the model further.
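The second step might be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: a decision tree is assumed to be the best-performing model purely for the example, the data is a synthetic stand-in, and the step names (`select`, `clf`) and the parameter grid are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data; the real Madelon data would be loaded here.
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=5, n_redundant=5, random_state=0
)

# Rank features by the importance the fitted model assigns to them.
# Assumption: a decision tree stands in for whichever model performed best.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
ranked = np.argsort(tree.feature_importances_)[::-1]
top_features = ranked[:10]
print("Top feature indices:", top_features)

# Tune the pipeline: how many features to keep, and how deep the tree may grow.
pipe = Pipeline([
    # threshold=-np.inf makes max_features the only selection criterion.
    ("select", SelectFromModel(
        DecisionTreeClassifier(random_state=0), threshold=-np.inf, max_features=10
    )),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
param_grid = {
    "select__max_features": [5, 10, 20],
    "clf__max_depth": [3, 5, None],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Placing feature selection inside the pipeline means it is refit on each training fold, which avoids leaking information from the validation folds into the selected feature set.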