Welcome to the 2024 Kaggle Playground Series! This project is part of the Kaggle competition where the goal is to predict whether an applicant is approved for a loan. The dataset provided is ideal for practicing machine learning skills and involves various preprocessing, training, and evaluation steps.
Data Source: Kaggle
Loan-Approval/
├── data/
│ ├── raw/
│ │ └── train.csv
│ ├── cleaned/
│ │ └── train.csv
├── models/ # Trained models (not committed; files are 100 MB-400 MB+)
├── src/
│ ├── clean_data.py
│ ├── evaluate.py
│ ├── preprocessing.py
│ └── trainer.py
├── RandomF.ipynb
├── notebook/
│ └── notebook.ipynb # EDA & DATA visualizations
├── XGB.ipynb
├── Stacking_clf.ipynb
├── catboost.ipynb
├── lightgb.ipynb
├── requirements.txt
├── app.py # FastAPI app to predict the loan status
├── app.log # Logs generated by the FastAPI app (app.py)
├── Dockerfile # Dockerizes the FastAPI app
└── README.md
- Python 3.12.3
- Jupyter Notebook
- Required Python packages (listed in `requirements.txt`)
- Clone the repository:
  `git clone https://github.com/Jatin-Mehra119/Loan-Approval.git`
  `cd Loan-Approval`
- Install the required packages:
  `pip install -r requirements.txt`
- Place the raw data file `train.csv` in the `data/raw/` directory.
- Run the data cleaning script to preprocess the data:
  `python src/clean_data.py`
  This will create a cleaned version of the data in the `data/cleaned/` directory.
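Conceptually, the cleaning step is a load → clean → save flow. The sketch below is illustrative only (the example age filter and function name are assumptions, not the repository's actual rules); the real logic lives in `src/clean_data.py`:

```python
# Illustrative sketch of the clean_data.py flow: read the raw CSV,
# apply basic cleaning, and write the cleaned copy. The real script
# uses the preprocessing class from src/preprocessing.py.
from pathlib import Path

import pandas as pd


def clean(raw_path: Path, out_path: Path) -> pd.DataFrame:
    df = pd.read_csv(raw_path)
    df = df.drop_duplicates()
    # Example cleaning rule (an assumption, not the repo's exact logic):
    # drop rows with implausible ages.
    if "person_age" in df.columns:
        df = df[df["person_age"] < 100]
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    return df
```

Pointing `clean(...)` at `data/raw/train.csv` and `data/cleaned/train.csv` reproduces the directory layout shown in the project tree.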
- RandomF.ipynb: Train, tune the hyperparameters, and evaluate a Random Forest classifier.
- XGB.ipynb: Train, tune the hyperparameters, and evaluate an XGBoost classifier.
- Stacking_clf.ipynb: Train, tune the hyperparameters, and evaluate a Stacking classifier.
- catboost.ipynb: Train, tune the hyperparameters, and evaluate a CatBoost classifier.
- lightgb.ipynb: Train, tune the hyperparameters, and evaluate a LightGBM classifier.
- src/preprocessing.py: Contains the preprocessing class that handles data preprocessing steps, including imputation, scaling, and encoding.
- src/trainer.py: Defines the Trainer class, which handles model training, hyperparameter tuning, evaluation, and saving.
- src/evaluate.py: Defines the Evaluator class, which evaluates the trained models and logs metrics.
- src/clean_data.py: Loads the raw data, applies preprocessing, and saves the cleaned data.
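The preprocessing pattern (imputation, scaling, encoding) can be sketched with scikit-learn's `ColumnTransformer`. Column names below mirror the API example later in this README, but the estimator choices are assumptions, not the exact contents of `src/preprocessing.py`:

```python
# Hypothetical sketch of a preprocessing pipeline in the style of
# src/preprocessing.py: median-impute and scale numeric columns,
# mode-impute and one-hot encode categorical columns.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["person_age", "person_income", "loan_amnt", "loan_int_rate"]
CATEGORICAL = ["person_home_ownership", "loan_intent", "loan_grade"]

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with the median
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ]), NUMERIC),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])

# Tiny synthetic sample (not competition data) to show the shapes.
df = pd.DataFrame({
    "person_age": [23, 31, None],
    "person_income": [55000, 72000, 48000],
    "loan_amnt": [6250, 10000, 4000],
    "loan_int_rate": [6.76, None, 11.2],
    "person_home_ownership": ["MORTGAGE", "RENT", "OWN"],
    "loan_intent": ["PERSONAL", "EDUCATION", "MEDICAL"],
    "loan_grade": ["A", "B", "A"],
})
X = preprocessor.fit_transform(df)
print(X.shape)  # (3, 12): 4 numeric columns + 8 one-hot columns
```

Fitting once on the training data and reusing the fitted transformer at prediction time keeps train and inference features consistent.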
This project also includes a FastAPI application that serves the trained loan approval model as an API. The API allows you to send loan applicant data in JSON format and receive a prediction indicating whether the loan is approved or not.
- POST /predict: This endpoint accepts a JSON body containing the applicant's loan data and returns the loan approval prediction.
You can test the FastAPI app using Postman or cURL. Here's an example of how to format the input data:
{
"id": 58652,
"person_age": 23,
"person_income": 55000,
"person_home_ownership": "MORTGAGE",
"person_emp_length": 6.0,
"loan_intent": "PERSONAL",
"loan_grade": "A",
"loan_amnt": 6250,
"loan_int_rate": 6.76,
"loan_percent_income": 0.12,
"cb_person_default_on_file": "N",
"cb_person_cred_hist_length": 2
}
Example response:

{
"prediction": 1
}
Where:
- 1 indicates the loan is approved.
- 0 indicates the loan is denied.
To run the FastAPI app, use the following command:
uvicorn app:app --reload
This will start the API server at `http://localhost:8000`. You can then access the `/predict` endpoint to test predictions.
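With the server running locally, the endpoint can also be called from Python's standard library. The payload mirrors the example above; if the server is not up, this sketch simply reports that instead of failing:

```python
# Sketch of POSTing the example payload to the /predict endpoint,
# using only the stdlib. Assumes the uvicorn server from the previous
# step is listening on localhost:8000.
import json
from urllib import error, request

payload = {
    "id": 58652,
    "person_age": 23,
    "person_income": 55000,
    "person_home_ownership": "MORTGAGE",
    "person_emp_length": 6.0,
    "loan_intent": "PERSONAL",
    "loan_grade": "A",
    "loan_amnt": 6250,
    "loan_int_rate": 6.76,
    "loan_percent_income": 0.12,
    "cb_person_default_on_file": "N",
    "cb_person_cred_hist_length": 2,
}

req = request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
try:
    with request.urlopen(req, timeout=5) as resp:
        print(json.load(resp))  # expected shape: {"prediction": 0 or 1}
except (error.URLError, OSError) as exc:
    print(f"Server not reachable: {exc}")
```

The same request works from Postman or cURL; only the transport differs.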
The FastAPI application logs each incoming request, and these logs are saved locally for reference. The logs include details such as the timestamp, the request data, and the corresponding prediction result.
The FastAPI application also handles errors gracefully, returning a message describing any issues encountered during the prediction process.
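The exact logging configuration lives in app.py; as a minimal stand-in, request logging to `app.log` (the file shown in the project tree) could be wired up with the standard `logging` module. The logger name and format here are assumptions:

```python
# Minimal request-logging sketch with the stdlib logging module.
# The real app.py configuration may differ; the target file (app.log)
# matches the project tree.
import logging

logger = logging.getLogger("loan_api")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("app.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)


def log_prediction(request_data: dict, prediction: int) -> None:
    """Record an incoming request and its prediction, as described above."""
    logger.info("request=%s prediction=%d", request_data, prediction)


log_prediction({"id": 58652, "loan_amnt": 6250}, 1)
```

Each call appends a timestamped line containing the request data and the prediction, which is the information the README says the logs capture.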
- Open the desired Jupyter notebook (e.g., `RandomF.ipynb`) and follow the steps to train and evaluate the model.
- The models will be saved in the `models/` directory.
- Evaluate the models using the provided evaluation scripts.
- Run the FastAPI app and interact with it using Postman or a similar tool.
For the final submission, the Stacking Classifier was selected to generate the predictions: it achieved the highest cross-validation score and produced the final `submission.csv`.
Here are the cross-validated ROC AUC scores for each model:
- XGBoost: ~0.95
- CatBoost: ~0.95
- LightGBM: ~0.95
- Random Forest: ~0.93
- Stacking Classifier: ~0.96 (best performing model; used for the final submission)
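The winning configuration is in `Stacking_clf.ipynb`; the general pattern, base learners combined by a meta-learner and scored with cross-validated ROC AUC, can be sketched with scikit-learn. The synthetic data and estimator choices below are illustrative only, not the competition setup:

```python
# Illustrative stacking sketch: base learners feed a logistic-regression
# meta-learner, evaluated with cross-validated ROC AUC on synthetic data
# (not the competition dataset or the repository's exact estimators).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=30, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # out-of-fold predictions train the meta-learner
)

scores = cross_val_score(stack, X, y, cv=3, scoring="roc_auc")
print(f"mean ROC AUC: {scores.mean():.3f}")
```

The same scoring call (`scoring="roc_auc"`) is how the per-model numbers above would typically be produced.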
Contributions are welcome! Feel free to open a pull request or create an issue if you find any bugs or have suggestions for improvements.
This project is licensed under the MIT License.
- FastAPI Overview: Added a new section that explains the FastAPI application for serving predictions.
- API Example: Added an example of the input JSON and the expected output JSON.
- Running FastAPI: Instructions on how to run the FastAPI app locally.
- Logs and Error Handling: Mentioned that incoming requests are logged and handled with error messages if necessary.