This project was a collaboration between Hays Kronke, Emily Neaville, Bennett Northcutt, and Stephen Mims
Can we predict customer churn for the bank's credit card customers in order to reduce the rate of attrition?
Our group will use the dataset to build machine learning models that can accurately predict which bank customers are at risk of attrition. This dataset contains customer information ranging from demographics (age, gender, education) to financial data (income bracket, card history, credit limit). Using these features, we aim to build a model that bankers, sales managers, branch managers, and other decision makers can use to help the bank reduce customer churn.
- Data Loading and EDA
- Data preprocessing
- Fitting models and making predictions
- KNN, Logistic Regression, and Random Forest results
- Adjusted weights and oversampled Logistic Regression models
- Feature selection
- Optimized models using feature importance
Libraries used: pandas, sqlite3, seaborn, matplotlib, numpy, scipy, scikit-learn
After reading in and cleaning the data, we conducted some exploratory analysis. The main takeaway for building our machine learning models was the imbalanced classes: there were many more instances of existing customers than attrited customers, as shown in this visualization.
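As a minimal sketch of this check (the file name `BankChurners.csv` and DataFrame name `df` are hypothetical; the data could equally be loaded via sqlite3), the class balance can be inspected with `value_counts` and a seaborn count plot:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical file name; the data could also be loaded from a SQLite database
df = pd.read_csv("BankChurners.csv")

# Proportion of existing vs. attrited customers
print(df["Attrition_Flag"].value_counts(normalize=True))

# Visualize the class imbalance
sns.countplot(x="Attrition_Flag", data=df)
plt.title("Existing vs. Attrited Customers")
plt.show()
```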
- Utilized scipy to remove outliers using z-scores
- Encoded both the target variable (Attrition_Flag) and the categorical features
- Split testing and training data
- Scaled the data (a sketch of these preprocessing steps follows below)
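A minimal sketch of these preprocessing steps, continuing from the hypothetical `df` above. The 3-standard-deviation cutoff, the label value `"Attrited Customer"`, the 75/25 split, and the use of one-hot encoding with `StandardScaler` are assumptions for illustration, not necessarily the project's exact choices:

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Drop rows whose numeric features lie more than 3 standard deviations from the mean
numeric_cols = df.select_dtypes(include=np.number).columns
df = df[(np.abs(stats.zscore(df[numeric_cols])) < 3).all(axis=1)]

# Encode the target (1 = attrited, 0 = existing) and one-hot encode categorical features
df["Attrition_Flag"] = (df["Attrition_Flag"] == "Attrited Customer").astype(int)
df = pd.get_dummies(df, drop_first=True)

# Split into features/target, then into training and testing sets
X = df.drop(columns="Attrition_Flag")
y = df["Attrition_Flag"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale the features, fitting the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```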
We built a KNN model, a random forest model, and a logistic regression model. The confusion matrices for all three models can be seen below.
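Sketched below, assuming the scaled training and test splits from the preprocessing sketch above and mostly default hyperparameters, the three baseline models can be fit and their confusion matrices printed:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Baseline models with mostly default hyperparameters (an assumption)
models = {
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model and print its confusion matrix on the test set
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    preds = model.predict(X_test_scaled)
    print(f"{name} confusion matrix:\n{confusion_matrix(y_test, preds)}\n")
```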
Creating the random forest model allowed us to identify the most important features. After visualizing the feature importances, we created a new DataFrame that dropped some of the less important features and retrained our models to improve performance.
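A minimal sketch of the feature-importance step, reusing the fitted random forest from the sketch above (the 0.01 importance threshold is an assumption used only for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rank features by the random forest's impurity-based importances
rf = models["Random Forest"]
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values()

# Visualize which features contribute most to the model
importances.plot(kind="barh", figsize=(8, 10))
plt.title("Random Forest Feature Importances")
plt.tight_layout()
plt.show()

# Keep only the more informative features (threshold chosen for illustration)
selected_features = importances[importances > 0.01].index
X_selected = X[selected_features]
```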
After retraining the K-nearest neighbors, logistic regression, and random forest models with the selected features, the random forest model still performed best. Feature selection slightly increased accuracy (by about 1%) and improved recall from 84% to 87%. There was a slight decrease in precision, but that was a trade-off we were willing to accept for the improved performance. The confusion matrix of the optimized model of choice can be seen below.
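As a sketch of how the retrained model's accuracy, precision, and recall can be compared (reusing the hypothetical `X_selected` and `y` from the sketches above; the split and hyperparameters are assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Re-split and re-scale using only the selected features
X_train_sel, X_test_sel, y_train_sel, y_test_sel = train_test_split(
    X_selected, y, test_size=0.25, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_sel = scaler.fit_transform(X_train_sel)
X_test_sel = scaler.transform(X_test_sel)

# Retrain the random forest on the reduced feature set and report precision/recall
rf_optimized = RandomForestClassifier(random_state=42)
rf_optimized.fit(X_train_sel, y_train_sel)
print(classification_report(y_test_sel, rf_optimized.predict(X_test_sel)))
```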
Feature Selection Techniques in Machine Learning with Python
Baseline Models: Your Guide For Model Building
SciPy z-score documentation
Improve Model Performance using Feature Importance
Random Oversampling and Undersampling for Imbalanced Classification
How to improve logistic regression in imbalanced data with class weights