GitHub - sid83/BankLoadData_MLmodel: Random forest classifier model applied to bank loan data to predict loan defaulters.

Machine learning models were applied to predict loan application quality. The dataset used for ML modeling initially contained around 57 features, which were trimmed down to 7 based on their weightage. A few ML models were tried to find the best model for this problem.
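The trimming step can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn, uses a synthetic 57-feature dataset from `make_classification` in place of the loan CSV, and reads "weightage" as the random forest's impurity-based importance. The names `top7` and `X_trimmed` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the loan data (assumption): 57 features, binary target.
X, y = make_classification(
    n_samples=500, n_features=57, n_informative=7, n_redundant=0, random_state=0
)

# Fit a forest on all features and read its impurity-based importances.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_

# Keep the column indices of the 7 highest-weighted features.
top7 = np.argsort(importances)[::-1][:7]
X_trimmed = X[:, top7]

print("kept feature indices:", top7.tolist())
```

With a pandas DataFrame the same indices could be used to select column names instead of array columns.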

Neural network (deep learning) - accuracy was roughly that of a random pick.
KNN (K nearest neighbours) - accuracy increased as the value of K increased, but the model was discarded because its accuracy was no better than randomly guessing whether a loan will default.
SVC (support vector classifier) - took too long to run and optimize coefficients, and the accuracy was not good enough.
Logistic regression - almost 93% test accuracy, fast processing.
Random forest (rf) - 93% test accuracy; the feature importance method was used to trim the features to only the most important ones.
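A comparison like the one above could be set up along these lines. This is a hedged sketch under stated assumptions: scikit-learn, a synthetic 7-feature dataset instead of the loan data, and default hyperparameters rather than the ones used in the notebook. The `baseline` entry (a `DummyClassifier` that always predicts the most frequent class) is an addition for illustration; it gives a reference point for judging whether a model beats a naive pick.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the trimmed loan data (assumption): 7 features.
X, y = make_classification(
    n_samples=1000, n_features=7, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate models; the baseline always predicts the majority class.
models = {
    "baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

On imbalanced data the majority-class baseline can itself score high, so test accuracy is best read relative to it.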

ROC Curve

The ROC curve was calculated for the rf model, and the area under the curve was found to be 0.87.

This shows that the rf model did a decent job of predicting loan quality on the test data.
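To make the area-under-the-curve figure concrete, here is a small dependency-free sketch of how an ROC AUC can be computed from labels and predicted scores with the trapezoidal rule. It is not the notebook's code, and the function name `roc_auc` and the toy inputs are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule.

    y_true: 0/1 labels (1 = positive class).
    scores: predicted scores, higher meaning "more likely positive".
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the decision threshold from high to low scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(order):
        # Consume all samples tied at the current score as one threshold step.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if y_true[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr, tpr = fp / neg, tp / pos
        # Add the trapezoid between the previous and current ROC points.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
        i = j
    return area


# Toy example: one negative is ranked above one positive.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.87 sits.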

Repository contents:
Images
2017_LoanData - Copy - Cat.csv
Model_LR_RF.ipynb
Readme.md
output_17_0.png
