The purpose of this challenge is to apply machine learning to predict credit risk. Using a data set from LendingClub, the data is oversampled with the RandomOverSampler and SMOTE algorithms and undersampled with the ClusterCentroids algorithm. The SMOTEENN algorithm is used for combined over- and undersampling. BalancedRandomForestClassifier and EasyEnsembleClassifier are then used to reduce bias when predicting credit risk.
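To make the resampling idea concrete, here is a minimal sketch of random oversampling written with only the Python standard library. It illustrates the general idea behind RandomOverSampler (duplicating minority-class rows until the classes are balanced) and is not the library's implementation. The function name `random_oversample` and the toy data are assumptions made up for this example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (illustrative sketch)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())

    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        # Rows that belong to this class in the original data.
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Add copies (sampled with replacement) until the class reaches the target size.
        for _ in range(target - n):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk loans and 2 high-risk loans.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y))      # class counts before resampling
print(Counter(y_res))  # class counts after resampling
```

After resampling, both classes in the toy data have 8 rows. Resampling of this kind would normally be applied to the training split only, so that the test set keeps the original class imbalance.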
- The balanced accuracy score for the Balanced Random Forest Classifier is ~79%. Although it was not the best performer, it falls within a realistic range of scores.
- High-risk has a precision score of 4% and an F1 score of 7%. Both are very low, reflecting a large number of false positives.
- Low-risk has a precision score of 100% and an F1 score of 95%. These are excellent; very few high-risk loans were mistaken for low-risk.
- The balanced accuracy score for the Easy Ensemble (AdaBoost) classifier is ~92%. This was the best result of all the tests.
- High-risk has a precision score of 7% and an F1 score of 14%. While slightly better, it will still produce a large number of false positives.
- Low-risk had much better results, with a precision score of 100% and an F1 score of 97%.
- A balanced accuracy of ~66% is below what is considered good.
- High-risk has a precision of 1% and an F1 of 2%. Once again these are very low, producing a high number of false positives.
- Low-risk has a precision of 100%, but an F1 of only 80%.
- The oversampling balanced accuracy score was about the same at ~66%, which is not considered a good score.
- High-risk precision and F1 scores are once again very low: 1% for precision and 2% for F1.
- Low-risk scores are better, at 100% precision and 80% F1.
- Undersampling produced the worst balanced accuracy score at just ~53%.
- High-risk precision and F1 scores are both very low at 1%.
- Low-risk performed better, with a precision score of 100% and an F1 of 62%.
- The combination sampling also underperformed, with a balanced accuracy score of ~62%.
- High-risk also did poorly, with a precision of 1% and an F1 of 2%.
- Low-risk was better, with a precision of 100% and an F1 of 72%.
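The pattern in the results above (near-perfect low-risk precision alongside very low high-risk precision) is typical of heavily imbalanced data. The sketch below shows how precision, recall, F1, and balanced accuracy follow from confusion-matrix counts. The counts are hypothetical, chosen only to illustrate the arithmetic; they are not the results of this analysis, and `class_metrics` is a helper name invented for this example.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts with high-risk treated as the positive class.
tp, fn = 90, 10         # 100 actual high-risk loans: 90 caught, 10 missed
fp, tn = 1000, 16000    # 17,000 actual low-risk loans: 1,000 wrongly flagged

high_precision, high_recall, high_f1 = class_metrics(tp, fp, fn)
# For the low-risk class the roles swap: its true positives are tn,
# its false positives are fn, and its false negatives are fp.
low_precision, low_recall, low_f1 = class_metrics(tn, fn, fp)

# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = (high_recall + low_recall) / 2

print(f"high-risk: precision={high_precision:.2f} recall={high_recall:.2f} f1={high_f1:.2f}")
print(f"low-risk:  precision={low_precision:.2f} recall={low_recall:.2f} f1={low_f1:.2f}")
print(f"balanced accuracy={balanced_accuracy:.2f}")
```

Because low-risk loans vastly outnumber high-risk loans in these counts, even a small fraction of misclassified low-risk loans swamps the true high-risk predictions, which pushes high-risk precision down while balanced accuracy stays high.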
Of all the tests performed, the AdaBoost classifier (EasyEnsembleClassifier) performed the best. Its balanced accuracy score was very high at ~92%. The low-risk precision was also very high, so few high-risk applicants were mistaken for low-risk. This is ideal when screening for credit risk, since approving many loans for high-risk applicants could result in many not being paid back. However, the low precision for high-risk means a large number of false positives, that is, low-risk applicants flagged as high-risk. This could cost the bank a lot in missed revenue. My suggestion would be to look for a better option.