riemanli/text_sentiment_classification

Text Sentiment Classification

Introduction

This repository contains two notebooks for sentiment classification of tweets: one uses a Bi-LSTM and the other uses a pre-trained BERT model.

The tweet dataset contains:

  • labeled training data: 200k
  • unlabeled training data: 1.1M
  • testing data: 200k

Models

Bi-LSTM:

For the Bi-LSTM, we embed each sentence with the pre-trained Word2Vec model in Gensim. In addition, KerasTuner is used to search for the best hyperparameters.
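
As a rough illustration of the embedding step, the sketch below turns a tokenized sentence into a fixed-length matrix of word vectors that a Bi-LSTM could consume. It is not taken from the notebook: the function name `embed_sentence`, the toy 3-dimensional table `toy_vectors` (a stand-in for the loaded Word2Vec vectors), the zero vector for out-of-vocabulary words, and the zero padding are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def embed_sentence(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
    max_len: int,
) -> List[List[float]]:
    """Look up one vector per token, then truncate or zero-pad to max_len rows."""
    zero = [0.0] * dim
    # Unknown words map to a zero vector (an assumption, not the notebook's rule).
    rows = [list(vectors.get(tok, zero)) for tok in tokens[:max_len]]
    # Pad short sentences so every sentence has the same shape.
    rows.extend(list(zero) for _ in range(max_len - len(rows)))
    return rows


# Hypothetical toy table standing in for pre-trained Word2Vec vectors.
toy_vectors = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.5, 0.5],
}

matrix = embed_sentence("good movie tonight".split(), toy_vectors, dim=3, max_len=4)
```

Here `matrix` has four rows of three values each: two looked-up vectors, one zero row for the unknown word "tonight", and one zero row of padding.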

BERT:

Here, we use the pre-trained BERT model from TensorFlow Hub.

Training

To make use of the unlabeled training data, we use semi-supervised learning to enlarge the labeled training set.
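
The text does not say which semi-supervised method is used. One common option is self-training (pseudo-labeling): a model trained on the labeled data predicts on the unlabeled tweets, and only the confident predictions are added to the training set. The sketch below shows the selection step under that assumption; the function name `select_pseudo_labels`, the binary 0/1 labels, and the 0.9 threshold are illustrative choices, not values from the notebooks.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    texts: Sequence[str],
    positive_probs: Sequence[float],
    threshold: float = 0.9,
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs for examples the model is confident about.

    positive_probs[i] is the predicted probability that texts[i] is positive.
    Examples with a probability between 1 - threshold and threshold are skipped.
    """
    selected = []
    for text, prob in zip(texts, positive_probs):
        if prob >= threshold:
            selected.append((text, 1))  # confident positive
        elif prob <= 1.0 - threshold:
            selected.append((text, 0))  # confident negative
    return selected


# Hypothetical predictions from a classifier trained on the labeled set.
unlabeled = ["love this", "not sure", "worst day ever"]
predicted = [0.97, 0.55, 0.03]

pseudo = select_pseudo_labels(unlabeled, predicted, threshold=0.9)
```

In this toy run the uncertain middle example is dropped and the other two are kept with labels 1 and 0. The selected pairs would then be merged with the labeled data before retraining, possibly over several rounds.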
