GitHub - Yakoob-Khan/Toxic-Comment-Classification-Challenge

Automatic Extraction, Classification and Neutralization of Toxic Comments

CS 89.21 Data Mining and Knowledge Discovery

Final Project Winter 2021

Yakoob Khan, Luca C. L. Lit, Louis Murerwa, Aadil Islam

About

This repository contains code used to scrape tweets related to anti-Asian rhetoric in March 2021. We used the Toxic Comment Classification Challenge data provided by the Jigsaw/Conversational AI team on Kaggle to train classification models that filter toxic comments. Finally, we explore a simple rule-based word-substitution technique to neutralize the offensive language in comments that our best classification model (BERT) predicts as toxic.
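The README does not include the substitution rules themselves. As a rough illustration only, a rule-based neutralizer can be sketched as a lookup table applied with a word-boundary regular expression. The lexicon, `neutralize`, and `neutralize_if_toxic` below are hypothetical names, and the `is_toxic` flag stands in for the BERT classifier's prediction.

```python
import re

# Hypothetical substitution lexicon (offensive term -> milder wording).
# Keys must be lowercase; the project's actual word list is not given here.
SUBSTITUTIONS = {
    "idiot": "person",
    "stupid": "unwise",
    "shut up": "please stop",
}


def build_pattern(lexicon):
    # Longest terms first so multi-word entries match before shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # \b keeps "idiot" from matching inside "idiotic".
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def neutralize(text, lexicon=SUBSTITUTIONS):
    pattern = build_pattern(lexicon)
    # Lowercase the match so "Stupid" and "STUPID" map to the same entry.
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], text)


def neutralize_if_toxic(text, is_toxic):
    # Only rewrite comments the classifier flagged; leave the rest unchanged.
    return neutralize(text) if is_toxic else text


print(neutralize("Shut up, you stupid idiot."))  # please stop, you unwise person.
```

Sorting terms longest-first lets a phrase such as "shut up" win over any shorter entry it contains. A realistic word list would be much larger and would also need to deal with inflections, misspellings, and preserving the original capitalization.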

Layout of Code Repository

├── Tweet-Scraping                  # Code to scrape tweets using the Tweepy library
├── checkpoints                     # Contains intermediate predictions
│   └── bert_model_cased_not_lowered
├── data                            # Training data
├── deep_learning_based_models      # BERT classification
├── style-transfer                  # Text style transfer
├── Project Report.pdf              # Project report
├── README.md
├── machinelearningmodles           # scikit-learn classical ML models
└── requirements.txt                # Dependencies for reproducibility

Dataset

Toxic Comment Classification Challenge (Jigsaw/Conversational AI, Kaggle)
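The Kaggle training file (`train.csv`) pairs each `comment_text` with six binary label columns. A minimal loading sketch using the standard `csv` module follows. Collapsing the six labels into one "toxic if any label is set" target is an assumption for illustration, not necessarily how this project framed the task, and `load_comments` is a hypothetical helper.

```python
import csv
import io

# The six label columns in the Kaggle Jigsaw train.csv.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_comments(fileobj):
    """Return (comment_text, is_toxic) pairs. Hypothetical helper."""
    rows = []
    for row in csv.DictReader(fileobj):
        # Each label column holds 0 or 1.
        flags = [int(row[name]) for name in LABELS]
        # Assumed binary target: toxic if any of the six labels is set.
        rows.append((row["comment_text"], int(any(flags))))
    return rows


# Tiny in-memory sample with the same column layout, for illustration only.
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,Thanks for the edit,0,0,0,0,0,0\n"
    '"b2","You are an idiot, go away",1,0,0,0,1,0\n'
)
data = load_comments(sample)
print(data)  # [('Thanks for the edit', 0), ('You are an idiot, go away', 1)]
```

On the real file, open it with `open("data/train.csv", newline="", encoding="utf-8")` (the path is assumed) so that quoted comments spanning several lines are parsed correctly.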

Frameworks

scikit-learn
PyTorch
Hugging Face Transformers library
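The README does not say which classical models live in `machinelearningmodles`. A common scikit-learn baseline for this kind of task, shown here only as an assumed example, is TF-IDF features fed into logistic regression. The toy texts and labels below are made up for illustration and are not project data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for comments (1 = toxic, 0 = non-toxic); not real project data.
texts = [
    "thanks for fixing the article",
    "you are a stupid idiot",
    "great work on this page",
    "shut up you idiot",
]
labels = [0, 1, 0, 1]

# TF-IDF over unigrams and bigrams, followed by a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

new_comments = ["what an idiot", "thanks for the help"]
preds = model.predict(new_comments)
# Column 1 of predict_proba is the probability of class 1 (toxic).
probs = model.predict_proba(new_comments)[:, 1]
print(preds, probs)
```

Wrapping both steps in one pipeline means the vectorizer is refit only on training text whenever the pipeline is fit, so the same object can be passed to cross-validation or grid search without leaking vocabulary from held-out folds.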

Acknowledgements

We thank Professor Soroush Vosoughi and the Teaching Assistant team for the wonderful course on Data Mining and Knowledge Discovery (Winter 2021).
