The Fake News Detection project uses machine learning to classify news articles as either real or fake. It implements four models (Naive Bayes, Logistic Regression, Decision Tree, and an LSTM Neural Network) and trains and evaluates each on the same dataset, so the approaches can be compared when tackling misinformation.
The project is organized into three components:
- Data Preprocessing:
  - Removal of unnecessary columns and null values.
  - Text cleaning, including the removal of special characters, punctuation, and stopwords.
  - Visualization of frequent words through WordCloud.
- Model Implementation:
  - Implementation of Naive Bayes, Logistic Regression, Decision Tree, and LSTM Neural Network models.
  - Training and evaluation of each model using performance metrics.
  - Visualization of confusion matrices for result interpretation.
- Model Testing:
  - Testing the trained models on unseen input for real-time detection.
  - Providing a script for users to input their own news text and receive predictions.
- Python Version: 3.10.0
- CUDA Version: 11.2
- cuDNN Library Version: 8.1
- Download the Fake News Dataset and place it in the project directory.
- Set up a virtual environment:
  python -m venv [name]
- Activate the virtual environment:
  - On Windows:
    [name]\Scripts\activate
  - On Unix or macOS:
    source [name]/bin/activate
Install the required packages from the requirements.txt file:
pip install -r requirements.txt
The project begins with data preprocessing, which cleans and prepares the dataset for model training: unnecessary columns and null values are removed, and the text is stripped of special characters, punctuation, and stopwords.
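The text-cleaning step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name clean_text, the tiny STOPWORDS set, and the choice to lowercase the text are all assumptions made for the example.

```python
import re

# Tiny illustrative stopword list (assumption); a real run would likely use
# a fuller list, for example the English stopwords shipped with NLTK.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "that"}


def clean_text(text: str) -> str:
    """Lowercase the text, strip punctuation/special characters, drop stopwords."""
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace
    # with a space, so "word,word" does not merge into one token.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Split on whitespace and filter out stopwords.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("BREAKING: The moon is made of cheese!!!"))
# prints: breaking moon made cheese
```

Applied to a dataset, a function like this would be mapped over the text column after the unnecessary columns and null rows have been dropped.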
The project implements four models for fake news detection: Naive Bayes, Logistic Regression, Decision Tree, and LSTM Neural Network. Each model is trained on the dataset and evaluated on a held-out test split.
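The scikit-learn snippets in this README use the variables x_train_tfidf, x_test_tfidf, y_train, and y_test without showing where they come from. One plausible way to produce them is sketched below. The toy texts, the label encoding (1 = real, 0 = fake), the 75/25 split, and the random_state value are all assumptions for illustration, not the project's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-in for the cleaned dataset (assumption: 1 = real, 0 = fake).
texts = [
    "government publishes annual budget report",
    "aliens secretly replace city mayor",
    "central bank holds interest rates steady",
    "miracle fruit cures every known disease",
    "court rules on data privacy case",
    "celebrity clone spotted on the moon",
    "university opens new research lab",
    "secret potion makes people invisible",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out part of the data for evaluation; stratify keeps class balance.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Fit the vocabulary and IDF weights on the training split only, then
# reuse the fitted vectorizer on the test split to avoid leakage.
vectorizer = TfidfVectorizer()
x_train_tfidf = vectorizer.fit_transform(x_train)
x_test_tfidf = vectorizer.transform(x_test)

print(x_train_tfidf.shape, x_test_tfidf.shape)
```

Fitting the vectorizer on the training split alone means the test split is transformed with the same vocabulary, so both matrices have the same number of columns.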
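from sklearn.naive_bayes import MultinomialNB
# Fit Multinomial Naive Bayes on the TF-IDF training features
model_nb = MultinomialNB()
model_nb.fit(x_train_tfidf, y_train)
# Predict labels for the TF-IDF test features
y_pred_nb = model_nb.predict(x_test_tfidf)
# ... (Performance evaluation and confusion matrix visualization)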
from sklearn.linear_model import LogisticRegression
# max_iter=1000 raises the solver's iteration limit above the default of 100
model_lr = LogisticRegression(max_iter=1000)
model_lr.fit(x_train_tfidf, y_train)
y_pred_lr = model_lr.predict(x_test_tfidf)
# ... (Performance evaluation and confusion matrix visualization)
from sklearn.tree import DecisionTreeClassifier
model_dt = DecisionTreeClassifier()
model_dt.fit(x_train_tfidf, y_train)
y_pred_dt = model_dt.predict(x_test_tfidf)
# ... (Performance evaluation and confusion matrix visualization)
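The performance evaluation elided in the snippets above typically comes down to a confusion matrix and a few metrics derived from it. The dependency-free sketch below shows what those numbers mean. The function names and the example labels are hypothetical, and in practice sklearn.metrics (accuracy_score, confusion_matrix, classification_report) would normally be used instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions (1 = real, 0 = fake is an assumption).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
print(summarize(y_true, y_pred))
```

Here one real article is misclassified as fake and one fake article as real, so accuracy, precision, and recall all come out to 0.75.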
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
# ... (Tokenization and padding)
# ... (Splitting data into test and train set for LSTM)
# ... (Creating and training the LSTM model)
# ... (Model performance and accuracy evaluation)
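For readers unfamiliar with the tokenization and padding step, the sketch below illustrates the idea in plain Python: each word is mapped to an integer index (more frequent words get smaller indices, and 0 is reserved for padding), and every sequence is then brought to a fixed length. This is a conceptual illustration only, not the Keras API. The function names build_vocab and texts_to_padded are hypothetical, and the project itself relies on Tokenizer and pad_sequences for this step.

```python
from collections import Counter


def build_vocab(texts):
    """Map each word to an integer index, most frequent word first.

    Index 0 is left unused so it can serve as the padding value.
    """
    counts = Counter(word for text in texts for word in text.split())
    ranked = [word for word, _ in counts.most_common()]
    return {word: idx for idx, word in enumerate(ranked, start=1)}


def texts_to_padded(texts, vocab, maxlen):
    """Convert texts to fixed-length integer sequences (maxlen must be > 0).

    Longer sequences keep their last maxlen tokens and shorter ones are
    left-padded with zeros, which matches the "pre" defaults of pad_sequences.
    """
    padded = []
    for text in texts:
        ids = [vocab[word] for word in text.split() if word in vocab]
        ids = ids[-maxlen:]
        padded.append([0] * (maxlen - len(ids)) + ids)
    return padded


texts = ["fake news spreads fast", "real news"]
vocab = build_vocab(texts)
print(vocab)
print(texts_to_padded(texts, vocab, maxlen=3))
# prints: [[1, 3, 4], [0, 5, 1]]
```

The fixed-length integer sequences are what an embedding layer followed by an LSTM layer would consume.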
Users can test the trained models on their own input for real-time fake news detection. The provided script takes news text from the user and returns a prediction.
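A prediction helper for one of the TF-IDF based models might look like the sketch below. It is an assumption about how such a script could be structured, not the repository's actual script: the name predict_news, the LABEL_NAMES mapping (0 = fake, 1 = real), and the optional clean argument are all hypothetical. It only requires that the vectorizer has a transform method and the model has a predict method, as scikit-learn objects do.

```python
# Assumed label encoding; check how the dataset actually encodes its labels.
LABEL_NAMES = {0: "FAKE", 1: "REAL"}


def predict_news(text, vectorizer, model, clean=None):
    """Classify one piece of news text with a fitted vectorizer and model.

    vectorizer: fitted object with a transform(list_of_str) method.
    model: fitted classifier with a predict(features) method.
    clean: optional function applied to the text first; it should be the
           same cleaning that was applied to the training data.
    """
    if clean is not None:
        text = clean(text)
    # The vectorizer expects a collection of documents, so wrap the text.
    features = vectorizer.transform([text])
    label = model.predict(features)[0]
    # Fall back to the raw label if it is not in the assumed mapping.
    return LABEL_NAMES.get(label, str(label))
```

With a fitted TF-IDF vectorizer and, for instance, the model_lr classifier from above, a call such as predict_news(input("Enter news text: "), vectorizer, model_lr) would print-ready return "FAKE" or "REAL" under the assumed encoding.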
The Fake News Detection project provides four models, ranging from classical classifiers to an LSTM Neural Network, for detecting misinformation. Users are encouraged to explore the provided code for a deeper understanding and to contribute improvements to the detection algorithms.
Note: For detailed code implementation and analysis, refer to the corresponding notebooks and scripts within the repository.