This repository contains machine learning models and code for predicting the next word in a sequence using a variety of methods, such as TF-IDF, cosine similarity, and AdaBoost. The project demonstrates how to apply different techniques to text prediction tasks.
- Next word prediction using:
  - A TF-IDF Multinomial Naive Bayes model
  - Universal Sentence Encoder embeddings (TensorFlow) with an LSTM model
- Training and evaluating multiple models
- Interactive Web Application using Flask
- Support for different types of word embeddings and vectorizers
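As a rough illustration of the TF-IDF Multinomial Naive Bayes approach listed above, the sketch below treats next word prediction as text classification: the previous words form the input and the following word is the label. The toy corpus, the two-word context window, and the helper names `make_pairs` and `predict_next_word` are assumptions made for illustration; they are not taken from this repository's code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def make_pairs(sentences, context_size=2):
    """Turn each sentence into (context, next_word) training pairs."""
    contexts, targets = [], []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(context_size, len(tokens)):
            contexts.append(" ".join(tokens[i - context_size:i]))
            targets.append(tokens[i])
    return contexts, targets


# Toy corpus (assumption); the real project uses its own AI-related corpus.
corpus = [
    "neural networks learn from data",
    "neural networks learn useful features",
    "decision trees split the data",
]

contexts, targets = make_pairs(corpus)

# TF-IDF turns each context into a weighted bag of words,
# and Multinomial Naive Bayes picks the most likely next word.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(contexts, targets)


def predict_next_word(text, context_size=2):
    """Predict the word that follows the last `context_size` words of `text`."""
    context = " ".join(text.lower().split()[-context_size:])
    return model.predict([context])[0]
```

With this toy corpus, `predict_next_word("neural networks")` returns `learn`, because both training sentences that start with those words continue that way.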
Before running the code, ensure you have the following Python libraries installed:
Flask
pandas
numpy
scikit-learn
keras
tensorflow
nltk
gensim
You can install the dependencies using the following command:
pip install -r requirements.txt
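If you want to confirm that the installation worked, a small check like the following can report which libraries are still missing. It is only a sketch: the list of import names is an assumption derived from the library list above (scikit-learn is imported as `sklearn`), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (assumed mapping).
REQUIRED_MODULES = [
    "flask", "pandas", "numpy", "sklearn",
    "keras", "tensorflow", "nltk", "gensim",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names from `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```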
- In app.py, uncomment this line so that the model is downloaded the first time you run the app:
embed = (hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4"))
- Create a directory named universal_model_encoder_tf in your repository.
- Navigate to the location below, find the downloaded model directory, and copy it into universal_model_encoder_tf:
C:\Users\name\AppData\Local\tfhub_modules
063d866c06683311b44b4992fd46fsfdsfdsf/
│
├── saved_model.pb
├── variables/
│ ├── variables.data-00000-of-00001
│ └── variables.index
- Then comment the line out again to avoid loading the model each time you run the app.
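As an alternative to commenting and uncommenting the line by hand, the choice between the local copy and the remote URL could be made in code. The sketch below is an assumption about how that might look, not how app.py currently works: `resolve_model_handle` is a hypothetical helper, and it assumes `saved_model.pb` and `variables/` sit directly inside universal_model_encoder_tf.

```python
from pathlib import Path

# Names taken from the steps above; adjust them if your layout differs.
LOCAL_MODEL_DIR = Path("universal_model_encoder_tf")
REMOTE_MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"


def resolve_model_handle(local_dir=LOCAL_MODEL_DIR, remote_url=REMOTE_MODEL_URL):
    """Return the local model directory if it looks complete, else the remote URL."""
    local_dir = Path(local_dir)
    has_graph = (local_dir / "saved_model.pb").is_file()
    has_variables = (local_dir / "variables").is_dir()
    if has_graph and has_variables:
        return str(local_dir)
    return remote_url


# In app.py this could replace the hard-coded URL
# (assumes `hub` is the tensorflow_hub module):
# embed = hub.KerasLayer(resolve_model_handle())
```

`hub.KerasLayer` accepts a filesystem path to a SavedModel directory as well as a URL, so the same call covers both cases.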
Note: the corpus used focuses on AI concepts.