Shargus/Sentiment-Analysis-TripAdvisor

Sentiment analysis of TripAdvisor reviews

Final project for the Data Science Lab exam

This project implements a binary classification pipeline for sentiment analysis: it labels each TripAdvisor text review as either a "positive review" or a "negative review".

The development dataset consists of 28754 reviews. The data pipeline includes the following preprocessing steps:

  • Data cleaning and reformatting: removal of non-alphanumeric characters, tokenization, case normalization, stop-word removal
  • Stemming (with a stemming algorithm for Italian)
  • Bigrams extraction
  • Removal of words/bigrams which are too frequent or too infrequent
  • TF-IDF feature extraction
  • Oversampling through SMOTE
  • Feature selection through ANOVA F-test
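The cleaning, bigram extraction, and frequency filtering steps listed above can be sketched in plain Python. This is a minimal illustration, not the code in classification.py: the stop-word list, the thresholds `min_df` and `max_df_ratio`, and all function names are assumptions made for the example. Stemming is left out to keep the sketch dependency-free; NLTK's `SnowballStemmer("italian")` would be one possible choice, though the README does not say which stemmer the project uses.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the project's actual list).
STOP_WORDS = {"il", "la", "e", "di", "un", "una", "era", "molto"}


def tokenize(text):
    """Lowercase, replace non-alphanumeric runs with spaces, split, drop stop words."""
    cleaned = re.sub(r"[\W_]+", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def with_bigrams(tokens):
    """Return the unigrams followed by the bigrams of adjacent tokens."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def filter_by_document_frequency(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms that appear in at least min_df and at most max_df_ratio * n_docs documents."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    max_df = max_df_ratio * len(docs)
    keep = {term for term, df in doc_freq.items() if min_df <= df <= max_df}
    return [[term for term in doc if term in keep] for doc in docs]


# Hypothetical reviews, made up for the example.
reviews = [
    "Hotel ottimo, camera pulita!",
    "Hotel pessimo e camera sporca.",
    "Personale gentile, camera pulita.",
]
docs = [with_bigrams(tokenize(review)) for review in reviews]
filtered = filter_by_document_frequency(docs, min_df=2, max_df_ratio=0.9)
print(filtered)
```

In this toy corpus "camera" occurs in every review, so it is dropped as too frequent, while terms seen in only one review are dropped as too infrequent.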

Classification is performed by a simple Multinomial Naive Bayes classifier on the TF-IDF representation of each review.
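As a rough sketch of how TF-IDF features, ANOVA F-test selection, and Multinomial Naive Bayes fit together, the snippet below chains them with scikit-learn. It is an assumption that the project uses scikit-learn in this way; the toy reviews, the labels "pos"/"neg", and the value `k=10` are made up for the example. The SMOTE step is omitted here, since it needs the separate imbalanced-learn package.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, already-cleaned reviews and labels.
texts = [
    "hotel ottimo camera pulita",
    "personale gentile colazione ottima",
    "posizione perfetta personale cordiale",
    "hotel pessimo camera sporca",
    "personale scortese colazione scarsa",
    "posizione rumorosa camera piccola",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),      # unigrams and bigrams, TF-IDF weighted
    SelectKBest(score_func=f_classif, k=10),  # keep the 10 features with the highest ANOVA F-score
    MultinomialNB(),                          # works on the non-negative TF-IDF values
)
model.fit(texts, labels)

predictions = model.predict(["camera pulita personale gentile"])
print(predictions)
```

Wrapping the steps in a single pipeline means the vocabulary and the selected features are learned from the training data only and then reused unchanged at prediction time.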

Weighted F1 score achieved on the evaluation set: 0.967
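For reference, a weighted F1 score is the average of the per-class F1 scores, with each class weighted by its number of true instances. The sketch below computes it in plain Python under that standard definition; the function name and the toy labels are illustrative, and the README does not state which tool produced the reported figure.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical labels: three positive reviews and one negative review.
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "neg", "neg"]
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

Here the "pos" class has F1 = 0.8 with support 3 and the "neg" class has F1 = 2/3 with support 1, so the weighted score is 23/30.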

User guide

  • classification.py: project code
  • Report.pdf: final report describing the proposed solution (with results)
  • dataset: contains the two datasets (development and evaluation)
