This project implements a breast cancer classification model using the Explainable Boosting Machine (EBM) algorithm. EBM is a type of interpretable machine learning model that provides both high accuracy and insights into feature importance. This model is trained on the Breast Cancer dataset, which is used to predict the presence of breast cancer based on various features.
- Model Training: Uses EBM for classification.
- Model Evaluation: Evaluates model accuracy on test data.
- Feature Importance: Visualizes feature importance to understand key factors influencing predictions.
The model uses the Breast Cancer dataset, which is included with the scikit-learn
library. This dataset consists of measurements of various features related to breast cancer tumors and is used for binary classification.
To run this project, you'll need Python and several libraries. It is recommended to use a virtual environment to manage dependencies.
-
Clone the Repository
git clone https://github.com/sahilmate/ebm-breast-cancer-classifier.git cd ebm-breast-cancer-classifier
-
Create and Activate a Virtual Environment
python -m venv env .\env\Scripts\Activate
-
Install Required Libraries
pip install -r requirements.txt
If
requirements.txt
is not present, manually install the required libraries:pip install pandas scikit-learn interpret matplotlib numpy
-
Run the Jupyter Notebook
Open the Jupyter Notebook file included in the repository (
ebm-breast-cancer-classifier.ipynb
) and run the cells to execute the project. -
Run the Python Script
Alternatively, you can run directly from the command line:
jupyter nbconvert --to notebook --execute ebm-breast-cancer-classifier.ipynb
This will load the dataset, train the EBM model, make predictions, and display the model's accuracy and feature importance plot.
The script will produce the following outputs:
-
Accuracy: The accuracy score of the model on the test dataset was : 0.9736842105263158
-
Feature Importance Plot: A horizontal bar plot showing the importance of each feature in the model.
Contributions to this project are welcome. To contribute:
-
Fork the Repository
Click the "Fork" button on the top right corner of this repository page.
-
Create a New Branch
git checkout -b your-branch-name
-
Make Your Changes
Edit or add files as necessary.
-
Commit Your Changes
git add . git commit -m "Your descriptive commit message"
-
Push to Your Fork
git push origin your-branch-name
-
Create a Pull Request
Go to the original repository and click on "New Pull Request." Select your branch and submit the pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- EBM: InterpretML for the Explainable Boosting Machine implementation.
- Dataset: Scikit-learn's Breast Cancer dataset.