
Quranic Search

Quranic Lexical/Semantic Search

A Machine-Learning-Based Web Application for Retrieving Information from the Holy Quran Using NLP

📜 Table of Contents

  • 🎉 About the Project
  • 📷 Screenshots
  • 🎤 What is Quranic Search
  • 🔍 Why Quranic Search
  • 🙈 Features
  • 🔍 How Quranic Search Works
  • 🛠️ Tech Stack and Tools
  • 🪜 Source Code Directory Structure
  • 🚴‍♂️ Getting Started
  • 🐣 Contributors
  • ⚠️ License

🎉 About the Project

Retrieving information from the Holy Quran is an important field for Quran scholars, Muslim researchers, and Arabic-language enthusiasts in general. There are two popular types of Quran search: lexical (keyword-based) and semantic (concept-based). Semantic search is a challenging task, especially in a complex corpus such as the Holy Quran. Quranic Search provides both lexical and semantic search in the Holy Quran.

📷 Screenshots

(Screenshots: the Home page and the Results page.)

🎤 What is Quranic Search

Quranic Search was developed to help everyone, especially Muslims, interact with the Holy Quran more easily and quickly, allowing them to search the Holy Quran for specific Verses by a keyword or a conceptual topic.

πŸ” Why Quranic Search

The Holy Quran is considered the primary reference to approximately 1.6 billion Muslims around the world and as the leading resource for classical Arabic language. Muslims, as well as non-Muslims, need to search for certain information from the Holy Quran or retrieve verses that discuss a specific topic, having various topics to discuss, for example; ethics, Islamic law, marital and family law, monetary transactions, morals, and the relationship between Islam/Muslims and other world religions.

👎 The Problem with Traditional Lexical Search

  • Incomplete results when searching by keywords
  • Lexical search does not consider the meaning of the search query

πŸ‘ The Solution with Semantic Search

  • Relevant verses based on meaning, improving the accuracy of search
  • Best ranking of the most similar verses, based on Word Embedding Representation

🙈 Features

  • Natural Language Processing based
  • Displays the top 50 results, ranked best first
  • Uses the best pre-trained Word2Vec models
  • Builds a sentence embedding model based on Word2Vec (CBOW architecture)
  • Uses several methods to represent the sentence vector
    • Max similarity score between two words (a word in a query and a word in a Verse)
    • Frequency of word pairs whose similarity exceeds a threshold (0.3)
    • Average similarity score between two words
    • Pooling: max pooling and average pooling
  • Queries are preprocessed exactly as the models' training data was, so their vectors are directly comparable
  • Works first on the single-word level
    • Then iterates over the whole query and each Verse, summing the method's score for every word pair, to finally rank the query against the whole text of the Holy Quran
  • Combines the results of all methods and all models to get the best results
  • Fast and low-cost, unlike Transformer-based approaches
  • Open-source
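The word-pair scoring methods above can be sketched as follows. This is an illustration with plain Python and toy vectors, not the project's actual `semantic_methods.py`; the function names are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def max_score(query_vecs, verse_vecs):
    """Sum, over query words, of the best-matching verse word."""
    return sum(max(cosine(q, w) for w in verse_vecs) for q in query_vecs)

def avg_score(query_vecs, verse_vecs):
    """Average similarity over all (query word, verse word) pairs."""
    pairs = [cosine(q, w) for q in query_vecs for w in verse_vecs]
    return sum(pairs) / len(pairs)

def freq_score(query_vecs, verse_vecs, threshold=0.3):
    """Count word pairs whose similarity exceeds the threshold (0.3 above)."""
    return sum(1 for q in query_vecs for w in verse_vecs
               if cosine(q, w) > threshold)
```

In the real system the vectors would come from the pre-trained Word2Vec models (e.g. Gensim `KeyedVectors`); here any lists of floats work, which is enough to see how the three scores differ on the same query/Verse pair.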

πŸ” How Quranic Search Works

When you make a lexical search:

  • Lexical Search Django API interacts with the React UI
  • Verses are retrieved based on the sequence of keywords using the Lexical Search API
  • Verses are displayed in the results page lexicographically by Surah number and the Verse number in the Surah
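The lexical flow reduces to keyword matching plus (Surah, Verse) ordering. A minimal sketch, assuming verses are dicts with `surah`, `verse`, and `text` keys (the real API uses Django models, not this hypothetical helper):

```python
def lexical_search(query, verses):
    """Return verses whose text contains the query keyword sequence,
    ordered by Surah number, then Verse number within the Surah."""
    hits = [v for v in verses if query in v["text"]]
    return sorted(hits, key=lambda v: (v["surah"], v["verse"]))

# Toy data with transliterated text, for illustration only.
verses = [
    {"surah": 112, "verse": 1, "text": "qul huwa allahu ahad"},
    {"surah": 2, "verse": 255, "text": "allahu la ilaha illa huwa"},
]
results = lexical_search("allahu", verses)  # Surah 2 first, then Surah 112
```

Sorting by the `(surah, verse)` tuple gives exactly the "Surah number, then Verse number" ordering described above.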

When you make a semantic search:

  • The UI calls the semantic search Flask API
  • Verse IDs are retrieved based on the meaning/topic of the query words
  • A set of pre-trained Word2Vec models provides the word vectors for the words of Verses and search queries
  • Sentence vectors are computed using the several methods listed under Features
    • The results of all methods, across all models, are combined
  • Distances are computed by cosine similarity, and the Verses most similar to the query are retrieved
  • All Verse properties are then fetched from the lexical search API
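The ranking step above can be sketched end to end: pool word vectors into a sentence vector, then sort Verses by cosine similarity to the query. This is a toy illustration with made-up vectors, not the project's `pooling.py`/`predict.py`; names and the `top_k=50` default mirror the "first 50 results" behaviour described earlier.

```python
from math import sqrt

def avg_pool(vectors):
    """Average pooling: element-wise mean of word vectors -> sentence vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_verses(query_vec, verse_vecs, top_k=50):
    """Rank Verse IDs by cosine similarity to the query sentence vector.

    `verse_vecs` maps a Verse ID to its precomputed sentence vector.
    Returns (verse_id, score) pairs, best first.
    """
    scored = [(vid, cosine(query_vec, vec)) for vid, vec in verse_vecs.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

In production the Verse sentence vectors would be precomputed once from the Word2Vec models, so each query costs only one pooling pass plus one cosine comparison per Verse.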

πŸ› οΈ Tech Stack and Tools

The tools used in this project.

Tool Description
Visual Studio Code IDE
React.js Frontend framework
django Lexical Search Backend Framework
Flask Semantic Search API Backend Framework
Gensim Topic Modeling (Word2Vec, KeyedVectors)
SQLite3 For the Holy Quran Database

🪜 Source Code Directory Structure

quranic-search-v2
├── README.md                                   <- The top-level README for this project
├── LICENSE
├── assets
│   ├── screenshots                             <- Screenshots from the project
│   └── tools                                   <- Tools used in the project
├── backend
│   ├── api
│   │   ├── lexical
│   │   │   ├── api/                            <- Lexical Django project with settings
│   │   │   ├── db/                             <- Databases used in the project
│   │   │   ├── search/                         <- Search application (static, templates, models, serializers, urls, views, tests, etc.)
│   │   │   ├── db.sqlite3                      <- Migrated database
│   │   │   ├── manage.py                       <- A command-line utility to interact with this Django project
│   │   │   └── requirements.txt                <- Requirements for installing the lexical search API
│   │   └── semantic
│   │       ├── data
│   │       │   ├── external/                   <- Data from third-party sources
│   │       │   └── processed/                  <- The final, canonical data sets for modeling
│   │       ├── models/                         <- Trained and serialized models, model predictions, or model summaries
│   │       ├── notebooks/                      <- All Jupyter notebooks
│   │       ├── src                             <- Source code for use in this project
│   │       │   ├── __init__.py                 <- Makes src a Python module
│   │       │   └── models                      <- Scripts to train models and use trained models to make predictions
│   │       │       ├── pooling.py              <- Pooling algorithms for sentence embeddings
│   │       │       ├── predict.py              <- Resources of the semantic search API
│   │       │       ├── preprocess.py           <- The frequent preprocessing methods
│   │       │       └── semantic_methods.py     <- The semantic (word/sentence) search methods
│   │       ├── app.py                          <- The Flask application (entry point)
│   │       └── requirements.txt                <- Requirements for installing the semantic search API
│   └── run.sh                                  <- Bootstrapping script to run the APIs
├── frontend
│   ├── node_modules                            <- Node.js modules
│   ├── public
│   │   ├── fonts                               <- Fonts used in the project
│   │   │   ├── amiri/
│   │   │   └── kufi/
│   │   ├── images
│   │   │   └── quran-logo.png
│   │   ├── 404.html
│   │   ├── index.html
│   │   ├── manifest.json
│   │   └── robots.txt
│   ├── src
│   │   ├── components                          <- React components
│   │   │   ├── HomeForm
│   │   │   │   ├── HomeForm.css
│   │   │   │   └── HomeForm.js
│   │   │   ├── Navbar/
│   │   │   ├── ResultsForm/
│   │   │   └── Verse/
│   │   ├── containers                          <- React containers/pages
│   │   │   ├── About
│   │   │   │   ├── About.css
│   │   │   │   └── About.js
│   │   │   ├── Bookmarks/
│   │   │   ├── Home/
│   │   │   └── Results/
│   │   ├── App.css                             <- CSS for the application
│   │   ├── App.js                              <- The application file
│   │   ├── App.test.js                         <- The application test file
│   │   ├── index.css                           <- CSS for the root (entire application)
│   │   ├── index.js                            <- The root application file
│   │   ├── reportWebVitals.js                  <- WebVitals reporting script
│   │   └── setupTests.js                       <- Setup script for testing
│   ├── package-lock.json                       <- Used to install dependencies
│   └── package.json                            <- Used to install dependencies
├── .github
│   └── workflows                               <- GitHub Actions workflows
│       ├── django.yml
│       └── node.js.yml
└── .gitignore

πŸš΄β€β™‚οΈ Getting Started

🟡 Prerequisites

This project needs multiple pre-trained models, in addition to the backend/frontend requirements. Start with the helper script, which downloads a light model and installs all requirements before running:

sh scripts/start.sh

🔧 Run for Development

  • Clone this repository
git clone https://github.com/ahr9n/quranic-search-v2.git
cd quranic-search-v2

🔴 All commands must be executed in the root of the project.

  • Run all services (lexical API, semantic API, then frontend)
sh scripts/run.sh
  • Navigate to http://localhost:3000

🟢 Now you are good to go!

🔴 Note that the scripts run all servers in the background, so you can stop all of them with the following command:

sh scripts/down.sh

🐣 Contributors

  • Omar Shamkh 💻
  • Ahmad Almaghraby 💻
  • Ahmad Abdulrahman 💻
  • Abd El-Twab M. Fakhry 💻
  • Ahmad Ateya 💻

⚠️ License

Licensed under the GPL-v3 License.