A peer-to-peer lending company wants to use machine learning to predict credit risk, with the goal of a quicker and more reliable loan experience. Because credit-risk data is heavily imbalanced, this project uses resampling, employing different techniques from the imbalanced-learn and scikit-learn libraries, to build learning models and evaluate them on:
- Balanced Accuracy: How often the classifier is correct, averaged across both classes to account for the class imbalance
- Precision: How reliable a positive (or negative) prediction is
- Recall/Sensitivity: The ability of a classifier to find all of the positive (or negative) samples
The goal is to determine the model best suited to accurately predicting and classifying risky credit applications; a minimal evaluation workflow is sketched after the resources list below.
Resources:
- Jupyter Notebook
- Python
- imbalanced-learn
- scikit-learn
- NumPy
- Pathlib
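The snippet below is a minimal sketch of the evaluation workflow described above. The dataset is a synthetic imbalanced stand-in generated with make_classification, and the LogisticRegression estimator is an assumption on my part; neither is taken from the project's actual notebook or loan data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from imblearn.metrics import classification_report_imbalanced

# Synthetic stand-in for the loan dataset: ~1% high-risk (class 1), ~99% low-risk (class 0).
X, y = make_classification(n_samples=50_000, n_features=10,
                           weights=[0.99, 0.01], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

# Baseline model with no resampling, for comparison with the resampled models below.
model = LogisticRegression(solver="lbfgs", random_state=1, max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(balanced_accuracy_score(y_test, y_pred))           # accuracy averaged across both classes
print(confusion_matrix(y_test, y_pred))                  # TP / FP / FN / TN counts
print(classification_report_imbalanced(y_test, y_pred))  # per-class precision, recall, specificity, F1
```

classification_report_imbalanced is where the per-class precision and recall figures quoted in the results below come from.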
Results:
Naive Random Oversampling (RandomOverSampler):
Balanced Accuracy: 63%
Precision:
- Risky Loans = 1%; Model recorded a large number of false positives
- Good Loans = 100%; Model's low-risk predictions were almost always correct (true negatives)
Recall:
- Risky Loans = 64%; Model captured most of the actual high-risk loans (true positives)
- Good Loans = 63%; Model flagged a large share of low-risk loans as high risk (false positives)
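A minimal sketch of how naive random oversampling could be applied, reusing the X_train/X_test split from the snippet above and, again, assuming a LogisticRegression estimator (the results above do not name the estimator):

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Randomly duplicate minority-class (high-risk) rows until both classes are the same size.
X_resampled, y_resampled = RandomOverSampler(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(solver="lbfgs", random_state=1, max_iter=200)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```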
SMOTE Oversampling:
Balanced Accuracy: 63%
Precision:
- Risky Loans = 1%; Model recorded a large number of false positives
- Good Loans = 100%; Model's low-risk predictions were almost always correct (true negatives)
Recall:
- Risky Loans = 60%; Model captured most of the actual high-risk loans (true positives)
- Good Loans = 67%; Model still flagged roughly a third of low-risk loans as high risk (false positives)
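The same workflow with SMOTE, which differs from naive oversampling only in how the new minority samples are created (a sketch under the same assumptions as above):

```python
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Create synthetic high-risk samples by interpolating between existing minority-class neighbours.
X_resampled, y_resampled = SMOTE(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(solver="lbfgs", random_state=1, max_iter=200)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```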
ClusterCentroids Undersampling:
Balanced Accuracy: 53%
Precision:
- Risky Loans = 1%; Model recorded a large number of false positives
- Good Loans = 100%; Model's low-risk predictions were almost always correct (true negatives)
Recall:
- Risky Loans = 66%; Model captured most of the actual high-risk loans (true positives)
- Good Loans = 40%; Model recorded a very large number of false positives, flagging most low-risk loans as high risk
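A sketch of the ClusterCentroids undersampling step, again assuming the earlier data split and a LogisticRegression estimator:

```python
from imblearn.under_sampling import ClusterCentroids
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Shrink the majority (low-risk) class by replacing it with k-means cluster centroids.
X_resampled, y_resampled = ClusterCentroids(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(solver="lbfgs", random_state=1, max_iter=200)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```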
SMOTEENN (Combination Over- and Undersampling):
Balanced Accuracy: 66%
Precision:
- Risky Loans = 1%; Model recorded a large number of false positives
- Good Loans = 100%; Model's low-risk predictions were almost always correct (true negatives)
Recall:
- Risky Loans = 75%; Model captured most of the actual high-risk loans (true positives)
- Good Loans = 58%; Model recorded a lower but still large number of false positives
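A sketch of the SMOTEENN combination step (SMOTE oversampling followed by Edited Nearest Neighbours cleaning), under the same assumptions as the earlier snippets:

```python
from imblearn.combine import SMOTEENN
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Oversample with SMOTE, then drop noisy/overlapping samples with Edited Nearest Neighbours.
X_resampled, y_resampled = SMOTEENN(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(solver="lbfgs", random_state=1, max_iter=200)
model.fit(X_resampled, y_resampled)
print(balanced_accuracy_score(y_test, model.predict(X_test)))
```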
BalancedRandomForestClassifier:
Balanced Accuracy: 77%
Precision:
- Risky Loans = 4%; Model recorded a large number of false positives
- Good Loans = 100%; Model's low-risk predictions were almost always correct (true negatives)
Recall:
- Risky Loans = 63%; Model captured most of the actual high-risk loans (true positives)
- Good Loans = 92%; Model correctly classified the large majority of low-risk loans (true negatives)
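A sketch of the BalancedRandomForestClassifier, which handles the class imbalance internally, so no separate resampling step is needed; the n_estimators value is an assumption, and the data split is reused from the first snippet:

```python
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import classification_report_imbalanced

# Each tree is fit on a bootstrap sample that is randomly undersampled to balance the classes.
brf = BalancedRandomForestClassifier(n_estimators=100, random_state=1)
brf.fit(X_train, y_train)
y_pred = brf.predict(X_test)
print(balanced_accuracy_score(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))
```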
EasyEnsembleClassifier:
Balanced Accuracy: 89%
Precision:
- Risky Loans = 7%; Model recorded a large number of false positives
- Good Loans = 100%; Model's low-risk predictions were almost always correct (true negatives)
Recall:
- Risky Loans = 84%; Model captured the large majority of actual high-risk loans (true positives)
- Good Loans = 95%; Model correctly classified nearly all low-risk loans (true negatives)
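A sketch of the EasyEnsembleClassifier under the same assumptions (n_estimators chosen arbitrarily, data split reused from the first snippet):

```python
from imblearn.ensemble import EasyEnsembleClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import classification_report_imbalanced

# An ensemble of AdaBoost learners, each trained on a balanced, randomly undersampled bootstrap sample.
eec = EasyEnsembleClassifier(n_estimators=100, random_state=1)
eec.fit(X_train, y_train)
y_pred = eec.predict(X_test)
print(balanced_accuracy_score(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))
```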
Summary:
Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans, so different techniques were employed to train and evaluate models with unbalanced classes. Given how costly an approved risky loan is, recall/sensitivity takes priority over precision when deciding which model to deploy: missing a high-risk loan does more damage than mistakenly flagging a good one. The ClusterCentroids (53%), RandomOverSampler (63%), SMOTE (63%), and SMOTEENN (66%) models all produced low balanced accuracy scores (how often the classifier is correct, averaged across both classes), so I would not recommend using them to determine credit risk. The BalancedRandomForestClassifier reached a balanced accuracy of 77%, but its recall/sensitivity score for risky loans is discouraging. I would recommend adopting the EasyEnsembleClassifier: it is 89% accurate at distinguishing high-risk from low-risk loans, and its recall scores of 84% for high-risk and 95% for low-risk loans show that it finds the large majority of samples in both classes.