This project builds LSTM models that predict Hindi text at the character level.
The code requires the following dataset: https://github.com/ritikdhame/Hindi_Text_Prediction_using_LSTM/blob/main/Data/chankya%20neeti.txt
The dataset contains the first fourteen chapters of the book Chanakya Neeti in Hindi, with a corpus length of around 45,245.
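As an illustration of how the dataset might be loaded and inspected, here is a minimal sketch. The helper names `load_corpus` and `describe_corpus` are hypothetical and not taken from the project code, and the sketch assumes the text file has already been downloaded and is UTF-8 encoded.

```python
from collections import Counter
from pathlib import Path


def load_corpus(path):
    """Read the whole text file as a single UTF-8 string."""
    return Path(path).read_text(encoding="utf-8")


def describe_corpus(text, top_n=5):
    """Summarize the corpus length and its character inventory."""
    counts = Counter(text)
    return {
        "length": len(text),                       # number of characters
        "unique_chars": len(counts),               # size of the character vocabulary
        "most_common": counts.most_common(top_n),  # (character, count) pairs
    }
```

For example, `describe_corpus(load_corpus("chankya neeti.txt"))` would report the corpus length, assuming the file sits in the working directory under that name.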
To run the code, you need the following libraries and modules:
- matplotlib
- keras.layers
- keras.optimizers
- keras.models
The code builds two models:
- Single LSTM model
- Stacked LSTM model
The code consists of 3 main parts:
- Data loading and exploration: involves loading the Chanakya Neeti Hindi text file with the first 14 chapters as training data, followed by exploring the corpus.
- Preprocessing: involves removing English text and numbers to simplify the model training process.
- Model building and training: uses the Keras libraries to build a single LSTM model with 128 units and a softmax activation layer, followed by a stacked LSTM model with a hidden LSTM layer of 128 units and a softmax activation layer.
  - After setting a seed of 60 characters, the model predicts the next 400 characters.
  - The loss is tracked at every epoch, and the losses of the two LSTM models are then compared.
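The preprocessing and model-building parts described above could look roughly like the following sketch. It is not the project's actual code: the helper names, the regular expression (which also treats Devanagari digits as "numbers"), the RMSprop optimizer with a learning rate of 0.01, the one-hot input shape, and the reading of "stacked" as two LSTM layers of 128 units are all assumptions made for illustration.

```python
import re

# Assumption: "English and numbers" means ASCII letters plus ASCII and Devanagari digits.
_ENGLISH_AND_DIGITS = re.compile(r"[A-Za-z0-9०-९]")


def clean_text(text):
    """Remove English letters and digits, leaving the Hindi text in place."""
    return _ENGLISH_AND_DIGITS.sub("", text)


def build_single_lstm(seq_len, vocab_size, units=128):
    """One LSTM layer followed by a softmax over the character vocabulary."""
    # Keras is imported here so that clean_text stays usable without it.
    from keras.layers import LSTM, Dense, Input
    from keras.models import Sequential
    from keras.optimizers import RMSprop

    model = Sequential([
        Input(shape=(seq_len, vocab_size)),       # one-hot encoded characters
        LSTM(units),
        Dense(vocab_size, activation="softmax"),  # probability of each next character
    ])
    # Assumed training setup; the project may use a different optimizer.
    model.compile(loss="categorical_crossentropy",
                  optimizer=RMSprop(learning_rate=0.01))
    return model


def build_stacked_lstm(seq_len, vocab_size, units=128):
    """Two LSTM layers (assumed) followed by the same softmax output."""
    from keras.layers import LSTM, Dense, Input
    from keras.models import Sequential
    from keras.optimizers import RMSprop

    model = Sequential([
        Input(shape=(seq_len, vocab_size)),
        LSTM(units, return_sequences=True),       # hand the full sequence to the next LSTM
        LSTM(units),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer=RMSprop(learning_rate=0.01))
    return model
```

With a 60-character input window, `build_single_lstm(60, vocab_size)` and `build_stacked_lstm(60, vocab_size)` would give two models that can be trained on the same data so their losses are directly comparable.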
The required libraries listed above can be installed with pip or conda.
To run the code, execute the following steps:
- Import the required libraries.
- Load, view, and explore the text data.
- Define the layers of the LSTM model.
- Build and fit the model with the Keras package.
- Predict the text at the character level.
- Evaluate and visualize the model performance using metrics and plots.
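The prediction and evaluation steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `predict_fn` is a hypothetical callable standing in for the trained model (it receives the current 60-character window and returns one probability per entry of `chars`), greedy decoding is assumed, and the loss histories are assumed to be plain lists of per-epoch losses such as those Keras stores in `History.history["loss"]`.

```python
def generate_text(predict_fn, chars, seed, n_chars=400, window=60):
    """Extend `seed` by `n_chars` characters, one prediction at a time."""
    if len(seed) < window:
        raise ValueError(f"seed must contain at least {window} characters")
    generated = seed
    for _ in range(n_chars):
        context = generated[-window:]   # the most recent `window` characters
        probs = predict_fn(context)     # one probability per entry in `chars`
        best = max(range(len(chars)), key=lambda i: probs[i])  # greedy choice (assumed)
        generated += chars[best]
    return generated[len(seed):]        # only the newly predicted text


def compare_final_losses(loss_histories):
    """Return (model_name, final_loss) pairs sorted from lowest to highest loss."""
    finals = {name: losses[-1] for name, losses in loss_histories.items()}
    return sorted(finals.items(), key=lambda item: item[1])


def plot_losses(loss_histories):
    """Draw one loss curve per model on shared axes."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, losses in loss_histories.items():
        ax.plot(range(1, len(losses) + 1), losses, label=name)
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.legend()
    return fig
```

In practice, `predict_fn` would wrap the trained Keras model: encode the window the same way as the training data, call `model.predict`, and return the resulting probability vector. Sampling from the probabilities instead of always taking the most likely character is a common alternative that produces more varied text.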
The code is commented and documented for better understanding and readability.