This project focuses on performing sentiment analysis on Twitter data using machine learning and deep learning models. The sentiment classes in the dataset are:
- Positive (1)
- Negative (2)
- Neutral (0)
The dataset consists of 3,534 rows with multiple columns, including:
- Text: the tweet content
- Sentiment: target labels (0: Neutral, 1: Positive, 2: Negative)

Other columns, such as user details and geographic information, are part of the dataset but are not used for model building.
Preprocessing included removing hashtags, URLs, mentions, and special characters from the tweets, and lowercasing all text for consistency.
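The cleaning steps above can be sketched with regular expressions (a minimal version; the exact patterns used in the notebook may differ, and this one drops the whole hashtag rather than just the `#`):

```python
import re

def clean_tweet(text):
    """Strip URLs, mentions, hashtags, and special characters, then lowercase."""
    text = re.sub(r"http\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"@\w+", "", text)              # mentions
    text = re.sub(r"#\w+", "", text)              # hashtags (tag and word)
    text = re.sub(r"[^A-Za-z\s]", "", text)       # special characters
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("Loved the crew! @airline #travel https://t.co/abc"))
# → loved the crew
```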
TF-IDF vectorization was applied to convert the text data into numerical form for model training.
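With scikit-learn, the TF-IDF step looks like this (the tweets, `max_features`, and `ngram_range` below are illustrative, not the project's exact settings):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the cleaned tweets
tweets = ["great flight today", "worst service ever", "the flight was on time"]

vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(tweets)
print(X.shape)  # one row per tweet, one column per unigram/bigram
```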
Multiple machine learning and deep learning models were trained to predict sentiment, including:
- Logistic Regression
- Random Forest
- Support Vector Machine (SVM)
- AdaBoost
- Transformers (BERT)
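For the classical models, TF-IDF features feed directly into a scikit-learn classifier. A minimal sketch with Logistic Regression on toy data (the other models slot in the same way; the texts and labels below are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the cleaned tweets and their labels
texts = ["love this airline", "terrible delay again", "flight was on time",
         "great crew today", "worst boarding ever", "seats were standard"]
labels = [1, 2, 0, 1, 2, 0]  # 0: Neutral, 1: Positive, 2: Negative

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["the crew was great"]))
```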
To improve model performance, various data augmentation techniques such as back translation and random word insertion were applied to the training set.
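Back translation relies on an external translation model, but random word insertion can be sketched directly (a minimal version: it re-inserts words drawn uniformly from the tweet itself at uniformly chosen positions, which may differ from the notebook's implementation):

```python
import random

def random_insertion(text, n=1, seed=None):
    """Augment a tweet by re-inserting n randomly chosen words at random positions."""
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n):
        # pick an existing word and a slot (0..len inclusive) to duplicate it into
        words.insert(rng.randrange(len(words) + 1), rng.choice(words))
    return " ".join(words)

print(random_insertion("the flight was delayed again", n=2, seed=42))
```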
Model performance was evaluated using accuracy, confusion matrix, precision, recall, F1-score, and AUC.
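These metrics are all available in scikit-learn; a small example with hypothetical labels:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels (0: Neutral, 1: Positive, 2: Negative)
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=["Neutral", "Positive", "Negative"]))
# AUC in the multiclass case needs predicted probabilities, e.g.
# roc_auc_score(y_true, y_proba, multi_class="ovr")
```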
Despite rigorous preprocessing, the models initially achieved accuracies in the range of 62-64%; applying back translation and hyperparameter tuning yielded further improvements.
Tools and libraries used:
- Python
- Scikit-learn for machine learning models
- Transformers (Hugging Face) for BERT-based models
- TF-IDF vectorization for text data
- Data augmentation to enhance training data
- GridSearchCV for hyperparameter tuning
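The GridSearchCV step can be sketched as follows (toy data and an illustrative parameter grid; the project's actual grid and model may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-ins for the cleaned tweets and their labels
texts = ["love this airline", "terrible delay again", "flight was on time",
         "great crew today", "worst boarding ever", "seats were standard"]
labels = [1, 2, 0, 1, 2, 0]  # 0: Neutral, 1: Positive, 2: Negative

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Search a small regularization grid with 2-fold cross-validation
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(texts, labels)
print(grid.best_params_)
```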
Install the required packages:

```
pip install -r requirements.txt
```
Run the notebook to preprocess data, train models, and evaluate performance.