This repository contains a project focused on classifying movie genres using the Multinomial Naive Bayes classifier. The goal is to improve genre classification accuracy, thus enhancing recommendation systems for streaming platforms like Netflix.
In the fast-changing multimedia entertainment industry, classifying movies into genres is a challenging yet essential task for effective user recommendations and content management. This project addresses the problem by applying the Multinomial Naive Bayes classifier to a diverse movie dataset.
The dataset used in this project is sourced from Kaggle. It contains a comprehensive collection of movie and TV show data from Netflix, including titles, directors, actors, country, year, and descriptions.
The Multinomial Naive Bayes classifier is well suited to this project because it is effective for text classification and works with discrete features such as word counts. The model is trained on movie descriptions, which are converted to numerical features with Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. Although the multinomial model is formally defined over counts, fractional TF-IDF weights also work in practice.
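The approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the descriptions and genre labels are invented toy data, and the variable names (`descriptions`, `genres`, `model`) and parameter choices (`stop_words="english"`, `alpha=1.0`) are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy descriptions and genre labels, invented for illustration only.
descriptions = [
    "A detective hunts a serial killer through the city at night",
    "Two strangers fall in love during a summer in Paris",
    "A stand-up comedian tells jokes about family life",
    "A police officer investigates a murder in a small town",
    "A couple rekindles their romance on a trip to Italy",
    "A comedian performs a hilarious live comedy special",
]
genres = ["Thriller", "Romance", "Comedy", "Thriller", "Romance", "Comedy"]

# TF-IDF turns each description into a weighted term vector;
# Multinomial Naive Bayes then learns per-genre term distributions.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # assumed preprocessing choice
    MultinomialNB(alpha=1.0),               # assumed smoothing value
)
model.fit(descriptions, genres)

# Predict the genre of an unseen description.
new_description = ["A detective investigates a murder"]
predicted_genre = model.predict(new_description)[0]
print(predicted_genre)
```

Wrapping the vectorizer and classifier in a single pipeline keeps the vocabulary learned during training tied to the model, so new descriptions are transformed the same way at prediction time.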
- Python: 3.8 or newer
- Libraries:
  - Pandas
  - NumPy
  - Scikit-learn
  - Matplotlib
  - Seaborn
  - Jupyter Notebook
The model was evaluated on accuracy, precision, recall, and F1-score. The Multinomial Naive Bayes classifier showed high precision in genre classification and outperformed other classifiers such as Decision Trees, K-Nearest Neighbors, and Support Vector Machines.
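A comparison of this kind could be set up along the following lines. This is a hedged sketch on invented toy data, so its scores say nothing about the project's reported results. The train/test texts, the `classifiers` dictionary, the use of `LinearSVC` as the SVM, and macro averaging are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy training and test data, invented for illustration only.
train_texts = [
    "A detective hunts a killer in the city",
    "A police officer investigates a murder",
    "A spy uncovers a deadly conspiracy",
    "Two strangers fall in love in Paris",
    "A couple rekindles their romance in Italy",
    "A wedding planner finds love at last",
    "A comedian tells jokes about family life",
    "A hilarious live comedy special",
    "Friends stumble through funny misadventures",
]
train_labels = ["Thriller"] * 3 + ["Romance"] * 3 + ["Comedy"] * 3
test_texts = [
    "A detective investigates a deadly murder",
    "A couple falls in love at a wedding",
    "A comedian tells hilarious jokes",
]
test_labels = ["Thriller", "Romance", "Comedy"]

# Candidate classifiers; hyperparameters here are assumed defaults.
classifiers = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Support Vector Machine": LinearSVC(),
}

scores = {}
for name, classifier in classifiers.items():
    # Each classifier gets the same TF-IDF features for a fair comparison.
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), classifier)
    pipeline.fit(train_texts, train_labels)
    predictions = pipeline.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predictions, average="macro", zero_division=0
    )
    scores[name] = {
        "accuracy": accuracy_score(test_labels, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, metrics in scores.items():
    print(name, {key: round(float(value), 2) for key, value in metrics.items()})
```

Macro averaging weights every genre equally, which is one reasonable choice when some genres are much rarer than others; weighted averaging would be an alternative.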
Visualizations such as confusion matrices, bar plots, and line graphs were used to illustrate the model's performance. These visualizations help in understanding the strengths and limitations of the classifier.
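To make the idea of a confusion matrix concrete, here is a small sketch using hand-written labels. The true and predicted genres are invented for illustration and are not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted genres, invented for illustration only.
labels = ["Comedy", "Romance", "Thriller"]
true_genres = ["Comedy", "Comedy", "Romance", "Romance", "Thriller", "Thriller"]
predicted_genres = ["Comedy", "Romance", "Romance", "Romance", "Thriller", "Comedy"]

# Rows are true genres, columns are predicted genres, in the order of `labels`.
matrix = confusion_matrix(true_genres, predicted_genres, labels=labels)
print(matrix)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Per-genre recall: correct predictions (diagonal) divided by true counts (row sums).
recall_per_genre = matrix.diagonal() / matrix.sum(axis=1)
for genre, recall in zip(labels, recall_per_genre):
    print(genre, recall)
```

Off-diagonal cells show which genres get confused with each other, which is where a classifier's limitations become visible. A matrix like this can be passed to `seaborn.heatmap` to produce a plot.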
Figure: Confusion Matrix
Figure: Accuracy Comparison between MNB and KNN
Figure: Correlation between Genre Frequency and Model Accuracy
Figure: MNB Accuracy
- Collaborative Filtering Recommender System Based on Memory Based in Twitter Using Decision Tree Learning Classification
- A comprehensive survey on support vector machine classification: Applications, challenges and trends
- A multimodal approach for multi-label movie genre classification
- A Movie Recommendation System Design Using Association Rules Mining and Classification Techniques
- Multinomial Naïve Bayes for Documents Classification and Natural Language Processing (NLP)
- Installing Jupyter
- Anaconda Software Distribution
- Pandas Library
- NumPy: The fundamental package for scientific computing with Python
- scikit-learn: Machine Learning in Python
- Matplotlib: Visualization with Python
- seaborn: statistical data visualization
- Multilabel Genre Prediction Using Deep-Learning Frameworks