
aitwehrrg/SmartMailGuard


SmartMailGuard

About the Project

Aim

The objective of this project is to develop an intelligent email classification system using machine learning and deep learning models.

Description

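SmartMailGuard is a system designed to categorize emails using classical machine learning models (Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest), recurrent neural networks (RNN, LSTM), and Transformer architectures (BERT).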

Using these different models and algorithms, we compare and grade their effectiveness on datasets of varying sizes and on two types of classification: binary (spam/not-spam) and multiclass.

Tech Stack

  1. Python
  2. NumPy
  3. PyTorch
  4. TensorFlow
  5. Pandas
  6. HuggingFace

Models and Accuracies

83k dataset (binary classification): Kaggle
3k dataset (multiclass classification): Kaggle
Dataset for the AutoLabeler: Kaggle

1. Naïve Bayes

1.1. Without N-gram Optimization

  • Train:
  • Test:
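
Naïve Bayes scores each class by adding its log-prior to the log-likelihoods of the words in the email and picks the highest score. Below is a minimal from-scratch sketch of a unigram (bag-of-words) multinomial Naïve Bayes spam filter. It assumes Laplace (add-one) smoothing and a simple lowercase regex tokenizer. The helpers `tokenize`, `train_nb`, and `predict_nb` and the toy emails are hypothetical illustrations, not code from the repository's notebooks.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_nb(docs, labels):
    """Estimate log-priors and add-one-smoothed log-likelihoods for each class."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in tokenize(doc)}
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in tokenize(d))
        total = sum(counts.values())
        # Laplace smoothing: words never seen in this class still get a small probability.
        log_lik[c] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return classes, vocab, log_prior, log_lik


def predict_nb(model, text):
    """Return the class with the highest posterior log-probability."""
    classes, vocab, log_prior, log_lik = model
    scores = {}
    for c in classes:
        # Tokens outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_lik[c][t] for t in tokenize(text) if t in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy training set.
docs = ["win a free prize now", "claim your free money",
        "meeting at noon tomorrow", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "free prize money"))  # spam
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.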

1.2. With N-gram Optimization

  • Train:
  • Test:
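
N-gram optimization extends the features from single words to short word sequences, so a phrase such as "free money" becomes a feature of its own. One way to generate unigram-plus-bigram features in plain Python is sketched below. The `ngrams` helper and the default range n = 1..2 are assumptions for illustration; the notebooks' actual n-gram settings are not stated here. The idea is similar to scikit-learn's `CountVectorizer(ngram_range=(1, 2))`, although that class uses a different default tokenizer.

```python
import re


def ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, joined by spaces."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    features = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n words across the email.
        for i in range(len(words) - n + 1):
            features.append(" ".join(words[i:i + n]))
    return features


print(ngrams("Claim your free money"))
# ['claim', 'your', 'free', 'money', 'claim your', 'your free', 'free money']
```

Counting these features instead of single tokens lets a multinomial Naïve Bayes model capture some word order, at the cost of a larger vocabulary.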
Toy Examples:

2. Recurrent Neural Network (RNN)

  • Train:
  • Test:
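
An RNN reads an email one token embedding at a time and carries a hidden state forward, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the final hidden state can then be mapped to a spam probability. The NumPy sketch below shows only this forward pass with random, untrained weights. The dimensions, the parameter names, and the sigmoid output head are illustrative assumptions; the repository's notebook may use a framework RNN layer instead.

```python
import numpy as np


def rnn_forward(x_seq, W_xh, W_hh, b_h, w_out, b_out):
    """Run an Elman RNN over x_seq (T x d_in) and return P(spam)."""
    h = np.zeros(W_hh.shape[0])
    for x_t in x_seq:
        # The new hidden state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    logit = w_out @ h + b_out
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid for a binary output


rng = np.random.default_rng(0)
d_in, d_h, T = 8, 16, 5  # embedding size, hidden size, sequence length
params = (
    rng.normal(scale=0.1, size=(d_h, d_in)),  # W_xh
    rng.normal(scale=0.1, size=(d_h, d_h)),   # W_hh
    np.zeros(d_h),                            # b_h
    rng.normal(scale=0.1, size=d_h),          # w_out
    0.0,                                      # b_out
)
p = rnn_forward(rng.normal(size=(T, d_in)), *params)
print(f"P(spam) = {p:.3f}")
```

Because the same W_hh is applied at every step, gradients through long emails can vanish or explode, which is the motivation for the LSTM in the next section.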

3. Long Short-Term Memory (LSTM)

  • Train:
  • Test:
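
An LSTM replaces the plain tanh update with gated memory: the forget, input, and output gates control what the cell state c_t keeps, adds, and exposes. A single-step NumPy sketch of a standard LSTM cell with a forget gate follows. The stacked weight layout (`W` of shape 4H x (D + H)), the gate order, and the random weights are assumptions for illustration, not the configuration of `lstmemailclassification.ipynb`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4H, D + H); b has shape (4H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:H])          # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])      # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])  # candidate cell content
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much memory to expose
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


rng = np.random.default_rng(0)
D, H = 8, 16
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):  # a 5-token "email" of random embeddings
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (16,)
```

The additive update `c_t = f * c_prev + i * g` is what lets information and gradients flow across many time steps.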

4. Multinomial Naïve Bayes

5. Bidirectional Encoder Representations from Transformers (BERT)

5.1. From Scratch

  • Train:
  • Test:
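
BERT is a stack of Transformer encoder layers, and the core operation inside each layer is the scaled dot-product attention from "Attention Is All You Need": softmax(Q K^T / sqrt(d_k)) V. A single-head NumPy sketch of that operation follows. Multi-head projections, padding masks, feed-forward sublayers, and training are omitted, and the shapes and random projection matrices are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Q: (T, d_k), K: (T, d_k), V: (T, d_v). Returns outputs and weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (T, T) token-to-token similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
T, d_model = 6, 32  # 6 tokens, 32-dimensional embeddings
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, attn.shape)  # (6, 32) (6, 6)
```

Dividing by sqrt(d_k) keeps the dot products from growing with the embedding size, which would otherwise push the softmax into saturated regions.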

5.2. Implementation from a Pre-Trained Model

  • Train:
  • Test:
Toy Examples:

6. Support Vector Machine (SVM)

  • Train:
  • Test:
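
A linear SVM looks for a hyperplane w·x + b that separates the classes with a maximum margin. It can be trained by minimizing the regularized hinge loss (lam/2)·||w||^2 + mean(max(0, 1 - y(w·x + b))). Below is a minimal full-batch subgradient-descent sketch for the binary case in NumPy. It assumes labels in {-1, +1}; the learning rate, `lam`, epoch count, and toy Gaussian data are illustrative choices. A multiclass classifier can be built from such binary models (for example one-vs-rest), which is not shown here.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    X: (n, d) features; y: (n,) labels in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, X):
    return np.where(X @ w + b >= 0, 1, -1)


rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs as toy "spam" (+1) and "ham" (-1) feature vectors.
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)), rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
print("training accuracy:", (predict_svm(w, b, X) == y).mean())
```

Only points with margin below 1 contribute to the gradient, which mirrors the role of support vectors in the SVM solution.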

7. Decision Tree
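
A decision tree grows by repeatedly choosing the feature threshold that makes the child nodes as pure as possible. The plain-Python sketch below computes Gini impurity, 1 - sum of p_k^2 over classes k, and searches one numeric feature for the best threshold. The helpers `gini` and `best_split` and the toy "free"-count feature are hypothetical; this does not reproduce the splitting criterion or hyperparameters used in the decision-tree notebooks.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes weighted Gini."""
    best_threshold, best_score = None, gini(labels)
    for threshold in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send samples to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy feature: number of times the word "free" appears in each email.
free_counts = [0, 0, 1, 3, 4, 5]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]
print(gini(labels))                     # 0.5
print(best_split(free_counts, labels))  # (1, 0.0)
```

A full tree applies `best_split` recursively to each child until the nodes are pure or a stopping rule such as a maximum depth is reached.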

8. Random Forest Classifier

  • Train:
  • Test:
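
A random forest trains many decision trees, each on a bootstrap sample of the training set (and typically a random subset of features), and predicts by majority vote. The plain-Python sketch below shows only the bootstrap-sampling and voting steps. `bootstrap_sample` and `majority_vote` are hypothetical helpers, and the per-tree predictions are made-up placeholders standing in for real trained trees.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the "bagging" step)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine per-tree predictions for one email into a single label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)
training_rows = list(range(10))  # stand-ins for 10 training emails
sample = bootstrap_sample(training_rows, rng)
print(len(sample))  # 10 (typically some rows repeat and others are left out)

# Placeholder outputs of five trees for one email.
tree_predictions = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(tree_predictions))  # spam
```

Because each tree sees a different sample, their errors are partly independent, and averaging them through the vote reduces variance compared with a single tree.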

File Structure

├── Binary Classification
│   ├── Naive_Bayes_Final.ipynb
│   ├── Naive_Bayes_enron_dataset.ipynb
│   ├── Naive_Bayes_sklearn.ipynb
│   ├── lstmemailclassification.ipynb
│   └── RNN_spam_not_spam.ipynb
├── Coursera Notes
│   ├── Course1
│   ├── Course2
│   └── Course5
├── Multi Intent Classification
│   ├── Decision Tree
│   │   ├── decision-tree-grid-search.ipynb
│   │   └── decision-tree.ipynb
│   ├── Random Forest Classifier
│   │   ├── RandomForestClassifier-grid_search.ipynb
│   │   └── RandomForestClassifier.ipynb
│   ├── Support Vector Machine
│   │   ├── SVM_grid_search.ipynb
│   │   └── SVM_multiclass_classifier.ipynb
│   ├── AutoLabeler.ipynb
│   ├── Multiclass.ipynb
│   ├── multiclass-bert-Finaldataset.ipynb
│   ├── multiclass-bert-Finaldataset-from-scratch.ipynb
│   └── multinomial_combined.ipynb
├── SmartMailGuard Report
│   └── SmartMailGuard Report.pdf
└── README.md

Requirements

  • Install Python 3.1.
  • Install pip and verify the installation with the following terminal command:
pip --version
  • Optional: Install JupyterLab using the following command:
pip install jupyterlab

Alternatively, the notebooks can be run on Google Colaboratory or Kaggle (with some RAM limitations).

  • Run the following command to install all the dependencies:
pip install numpy pandas torch scikit-learn tensorflow transformers
  • Clone the repository:
git clone https://github.com/aitwehrrg/SmartMailGuard.git
  • Open and run any of the model notebooks (.ipynb) in Jupyter.

Contributors

  1. Amal Verma
  2. Kevin Shah
  3. Rupak R. Gupta

Mentors

  1. Druhi Phutane
  2. Raya Chakravarty

Acknowledgements and Resources

  1. CoC and Project X for providing this opportunity.
  2. Deep Learning Specialization by DeepLearning.AI (Coursera)
  3. Long Short-Term Memory
  4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  5. Attention Is All You Need
  6. Kaggle datasets
  7. HuggingFace Transformer Models

About

SmartMailGuard for Project X 2024
