
authorship identification, UH course 'Machine Learning'


stephenkung/authorship


authorship identification


Dependencies

Python 3, LightGBM, NLTK, textstat, Jupyter Notebook, pandas, scikit-learn, PyMySQL

How to use

Dataset: download the Authorship Attribution on Reviews dataset (CICLing 2016) from http://ritual.uh.edu/resources/. Only the Amazon reviews are needed.
AA.sql: run this script to generate the train, validation, and test datasets. Each author has 200 reviews, for a total of 1000 authors.
data_processing.ipynb: the main feature-engineering code.
model.ipynb: the LightGBM model code.
Final_AA_Group1.pdf: final presentation.
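
AA.sql itself is not reproduced in this README. As a rough illustration of the sampling it describes (200 reviews per author, 1000 authors, split into train, validation, and test), here is a Python sketch under stated assumptions: reviews arrive as `(author_id, text)` pairs, each author's 200 reviews are split 140/30/30 so every author appears in all three sets, and `build_splits` is a hypothetical helper, not code from the repository.

```python
import random
from collections import defaultdict

# Per-author quota stated in the README; the split sizes are an assumption (70/15/15).
REVIEWS_PER_AUTHOR = 200
SPLIT_SIZES = {"train": 140, "validation": 30, "test": 30}


def build_splits(reviews, n_authors, seed=0):
    """Hypothetical helper: pick n_authors authors that have at least
    REVIEWS_PER_AUTHOR reviews, sample REVIEWS_PER_AUTHOR reviews from each,
    and divide every author's sample across train/validation/test.

    reviews: iterable of (author_id, text) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_author = defaultdict(list)
    for author, text in reviews:
        by_author[author].append(text)

    # Only authors with enough reviews can fill the per-author quota.
    eligible = sorted(a for a, texts in by_author.items()
                      if len(texts) >= REVIEWS_PER_AUTHOR)
    if len(eligible) < n_authors:
        raise ValueError(f"only {len(eligible)} authors have "
                         f"{REVIEWS_PER_AUTHOR}+ reviews")
    chosen = rng.sample(eligible, n_authors)

    splits = {name: [] for name in SPLIT_SIZES}
    for author in chosen:
        # Sampling without replacement, then slicing, keeps the splits disjoint.
        texts = rng.sample(by_author[author], REVIEWS_PER_AUTHOR)
        start = 0
        for name, size in SPLIT_SIZES.items():
            splits[name].extend((author, t) for t in texts[start:start + size])
            start += size
    return splits
```

Called as `build_splits(reviews, n_authors=1000)`, it would mirror the 200-per-author, 1000-author setup. Keeping every author in all three splits matters for closed-set attribution, since the classifier can only predict authors it saw during training.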

feature importance

[Figure: feature importance]
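
This README does not list the features built in data_processing.ipynb. For orientation only, the sketch below computes a few common stylometric features (word and sentence length, vocabulary richness, punctuation and capitalization rates) with the Python standard library. The regex tokenizer is a stand-in for NLTK tokenization, no textstat readability score is computed, and `stylometric_features` and its feature names are hypothetical.

```python
import re
import string
from collections import Counter

# Hypothetical stand-ins: a regex tokenizer instead of NLTK, and a simple
# sentence splitter on runs of '.', '!' or '?'.
WORD_RE = re.compile(r"[A-Za-z']+")
SENT_RE = re.compile(r"[.!?]+")


def stylometric_features(text):
    """Return a dict of simple per-review style features (hypothetical set)."""
    words = WORD_RE.findall(text)
    n_words = len(words)
    n_chars = len(text)
    sentences = [s for s in SENT_RE.split(text) if s.strip()]
    n_sents = max(len(sentences), 1)  # avoid division by zero on empty text
    vocab = Counter(w.lower() for w in words)
    return {
        "n_chars": n_chars,
        "n_words": n_words,
        # Mean word length in characters.
        "avg_word_len": sum(map(len, words)) / n_words if n_words else 0.0,
        # Mean sentence length in words.
        "avg_sent_len": n_words / n_sents,
        # Distinct words / total words: a rough vocabulary-richness measure.
        "type_token_ratio": len(vocab) / n_words if n_words else 0.0,
        # Share of characters that are ASCII punctuation.
        "punct_ratio": (sum(c in string.punctuation for c in text) / n_chars
                        if n_chars else 0.0),
        # Share of characters that are uppercase letters.
        "upper_ratio": (sum(c.isupper() for c in text) / n_chars
                        if n_chars else 0.0),
    }
```

Each review's dict can become one row of a feature table (for example, `pandas.DataFrame` accepts a list of such dicts) before training the classifier.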
