Retrieving information from the Holy Quran is an important field for Quran scholars, Muslim researchers, and Arabic enthusiasts in general. There are two popular types of Quran search techniques: lexical (keyword-based) and semantic (concept-based). Semantic search is a challenging task, especially in a corpus as complex as the Holy Quran. Quranic Search provides both lexical and semantic search over the Holy Quran.
Quranic Search is designed to help everyone, especially Muslims, work with the Holy Quran more easily and quickly by letting them search for specific verses by keyword or by conceptual topic.
The Holy Quran is considered the primary reference for approximately 1.6 billion Muslims around the world and the leading resource for the classical Arabic language. Muslims and non-Muslims alike often need to look up specific information in the Holy Quran or retrieve verses that discuss a particular topic, such as ethics, Islamic law, marital and family law, monetary transactions, morals, and the relationship between Islam/Muslims and other world religions.
- Keyword search returns incomplete results
- Lexical search is not based on the meaning of the search query
- Retrieves relevant verses based on meaning, improving search accuracy
- Ranks the most similar verses using word embedding representations
- Based on Natural Language Processing (NLP)
- Displays the top 50 results by ranking
- Uses the best-performing pre-trained Word2Vec models
- Building a sentence embedding model based on Word2Vec (CBOW Architecture)
- Using different methods to represent the sentence vector
- Max similarity score between two words (a word in the query and a word in a verse)
- Max frequency score: how often the similarity between two words reaches a specific threshold (0.3)
- Average similarity score between two words
- Pooling: max pooling and average pooling
- Queries are preprocessed the same way as the models' training data, so that the vectors are compared on equal terms
- Working first at the single-word level
- Then iterating over the whole query and verse: the method's result is computed for every pair of words (one from the query, one from the verse) and summed, so each verse gets a score that can be compared across the whole text of the Holy Quran
- Combining the results of the methods and models to get the best results
- Fast and low-cost, unlike Transformer-based approaches
- Open-source
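The word-level methods and pooling listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `semantic_methods.py` or `pooling.py`. The toy `vectors` dictionary stands in for lookups in a pre-trained Word2Vec model, all function names are hypothetical, and reading the "max frequency score" as a count of word pairs at or above the 0.3 threshold is an interpretation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_similarities(query: List[str], verse: List[str],
                      vectors: Dict[str, Vector]) -> List[List[float]]:
    """One row per known query word, holding its similarity to each known verse word."""
    q = [w for w in query if w in vectors]  # out-of-vocabulary words are skipped
    v = [w for w in verse if w in vectors]
    return [[cosine(vectors[a], vectors[b]) for b in v] for a in q]

def max_similarity_score(query, verse, vectors) -> float:
    # Best-matching verse word for each query word, summed over the query.
    return sum(max(row) for row in pair_similarities(query, verse, vectors) if row)

def frequency_score(query, verse, vectors, threshold: float = 0.3) -> int:
    # Assumption: counts the word pairs whose similarity reaches the threshold.
    return sum(s >= threshold for row in pair_similarities(query, verse, vectors) for s in row)

def average_similarity_score(query, verse, vectors) -> float:
    # Mean similarity of each query word to all verse words, summed over the query.
    return sum(sum(row) / len(row) for row in pair_similarities(query, verse, vectors) if row)

def pool(words: List[str], vectors: Dict[str, Vector], mode: str = "mean") -> List[float]:
    """Sentence vector by average ("mean") or max ("max") pooling over word vectors."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return []
    columns = list(zip(*known))  # one tuple per embedding dimension
    if mode == "max":
        return [max(c) for c in columns]
    return [sum(c) / len(c) for c in columns]

if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for Word2Vec vectors (hypothetical data).
    vectors = {"rain": [3.0, 0.0], "water": [3.0, 4.0], "fire": [0.0, 2.0]}
    query, verse = ["rain"], ["water", "fire"]
    print(max_similarity_score(query, verse, vectors))      # 0.6
    print(frequency_score(query, verse, vectors))           # 1
    print(average_similarity_score(query, verse, vectors))  # 0.3
    print(pool(verse, vectors), pool(verse, vectors, "max"))
```

Summing per-query-word scores mirrors the description of computing a result for every (query word, verse word) pair. The repository may aggregate or weight these scores differently.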
When you run a lexical search:
- The Lexical Search Django API interacts with the React UI
- Verses are retrieved based on the sequence of keywords using the Lexical Search API
- Verses are displayed on the results page in lexicographic order: by Surah number, then by verse number within the Surah
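The lexical flow above can be illustrated with an in-memory sketch. It assumes whitespace tokenization and treats "the sequence of keywords" as the query words appearing consecutively and in order in a verse. The `Verse` record is hypothetical; the actual project serves this through the Django API and its SQLite database rather than a Python list.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Verse:
    """Hypothetical verse record; the real project stores verses in SQLite via Django."""
    surah: int   # Surah number
    number: int  # verse number within the Surah
    text: str

def contains_sequence(words: List[str], keywords: List[str]) -> bool:
    """True if `keywords` occur consecutively and in order inside `words`."""
    n = len(keywords)
    return n > 0 and any(words[i:i + n] == keywords for i in range(len(words) - n + 1))

def lexical_search(verses: Iterable[Verse], query: str) -> List[Verse]:
    # Assumption: plain whitespace tokenization, no Arabic normalization.
    keywords = query.split()
    hits = [v for v in verses if contains_sequence(v.text.split(), keywords)]
    # Order by Surah number, then by verse number within the Surah.
    return sorted(hits, key=lambda v: (v.surah, v.number))

if __name__ == "__main__":
    # Placeholder tokens instead of real verse text.
    sample = [Verse(2, 5, "x y z"), Verse(1, 3, "y z"), Verse(2, 1, "z y")]
    for v in lexical_search(sample, "y z"):
        print(v.surah, v.number)  # prints "1 3" then "2 5"
```

Sorting by the tuple `(surah, number)` yields the Surah-then-verse order described above, regardless of the order in which matches are found.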
When you make a semantic search:
- The Semantic Search API interacts with the UI
- Verse IDs are retrieved based on the meaning/topic of the query words using the Semantic Search API
- A set of pre-trained Word2Vec models is used to get the word vectors of the verses' words and of the search queries
- Sentence vectors are computed using several methods
- The results of all methods across all models are combined
- Verses are retrieved based on the similarity score between the query and each verse
- Cosine similarity is used to measure how close the vectors are and to retrieve the most similar verses
- All properties of the retrieved verses are fetched from the Lexical Search API
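The ranking steps above (cosine similarity, combining every method's results across every model, then keeping the top 50) can be sketched as follows. This is an assumption-laden illustration, not the project's `predict.py`. Verse vectors are keyed by hypothetical integer IDs. Min-max normalization followed by summation is only one plausible way to combine runs whose scores live on different scales; the README does not specify the combination rule.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cosine_scores(query_vec: Sequence[float],
                  verse_vecs: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Similarity between the query sentence vector and every verse vector, keyed by verse ID."""
    return {vid: cosine(query_vec, vec) for vid, vec in verse_vecs.items()}

def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale one run's scores to [0, 1] so runs on different scales can be added."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {vid: (s - lo) / span if span else 0.0 for vid, s in scores.items()}

def combine_and_rank(runs: List[Dict[int, float]], top_k: int = 50) -> List[int]:
    """Sum normalized scores over all (model, method) runs; return the top_k verse IDs."""
    # Assumption: min-max normalization + sum; the real combination rule may differ.
    total: Dict[int, float] = {}
    for run in runs:
        for vid, s in min_max(run).items():
            total[vid] = total.get(vid, 0.0) + s
    # Highest combined score first; ties broken by the smaller verse ID.
    return sorted(total, key=lambda vid: (-total[vid], vid))[:top_k]

if __name__ == "__main__":
    verse_vecs = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
    run_a = cosine_scores([1.0, 0.0], verse_vecs)  # e.g. one model with one method
    run_b = {1: 0.0, 2: 5.0, 3: 4.0}                # e.g. a frequency-style score
    print(combine_and_rank([run_a, run_b]))         # [3, 1, 2]
```

Normalizing each run first keeps a count-based score (such as a frequency score) from drowning out similarity scores that lie between -1 and 1.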
The tools used in this project.
Tool | Description
---|---
Visual Studio Code | IDE
React.js | Frontend framework
Django | Lexical Search API backend framework
Flask | Semantic Search API backend framework
Gensim | Topic modeling (Word2Vec, KeyedVectors)
SQLite3 | Database for the Holy Quran
quranic-search-v2
├── README.md <- This top-level README for this project
├── LICENSE
├── assets
│   ├── screenshots <- Screenshots from the project
│   └── tools <- Tools used in the project
├── backend
│   ├── api
│   │   ├── lexical
│   │   │   ├── api/ <- Lexical Django project with settings
│   │   │   ├── db/ <- Databases used in the project
│   │   │   ├── search/ <- Search application (static, templates, models, serializers, urls, views, tests, etc.)
│   │   │   ├── db.sqlite3 <- Migrated database
│   │   │   ├── manage.py <- A command-line utility to interact with this Django project
│   │   │   └── requirements.txt <- Dependencies for the lexical search API
│   │   └── semantic
│   │       ├── data
│   │       │   ├── external/ <- Data from third-party sources
│   │       │   └── processed/ <- The final, canonical data sets for modeling
│   │       ├── models/ <- Trained and serialized models, model predictions, or model summaries
│   │       ├── notebooks/ <- All Jupyter notebooks
│   │       ├── src <- Source code for use in this project
│   │       │   ├── __init__.py <- Makes src a Python module
│   │       │   ├── models <- Scripts to train models and then use trained models to make predictions
│   │       │   ├── pooling.py <- Pooling algorithms for sentence embeddings
│   │       │   ├── predict.py <- Resources of the semantic search API
│   │       │   ├── preprocess.py <- Common preprocessing methods
│   │       │   └── semantic_methods.py <- The semantic (word/sentence) search methods
│   │       ├── app.py <- The Flask application (entry point)
│   │       └── requirements.txt <- Dependencies for the semantic search API
│   └── run.sh <- Bootstrapping script to run the APIs
├── frontend
│   ├── node_modules <- Node.js modules
│   ├── public
│   │   ├── fonts <- Fonts used in the project
│   │   │   ├── amiri/
│   │   │   └── kufi/
│   │   ├── images
│   │   │   └── quran-logo.png
│   │   ├── 404.html
│   │   ├── index.html
│   │   ├── manifest.json
│   │   └── robots.txt
│   ├── src
│   │   ├── components <- React components
│   │   │   ├── HomeForm
│   │   │   │   ├── HomeForm.css
│   │   │   │   └── HomeForm.js
│   │   │   ├── Navbar/
│   │   │   ├── ResultsForm/
│   │   │   └── Verse/
│   │   ├── containers <- React containers/pages
│   │   │   ├── About
│   │   │   │   ├── About.css
│   │   │   │   └── About.js
│   │   │   ├── Bookmarks/
│   │   │   ├── Home/
│   │   │   └── Results/
│   │   ├── App.css <- CSS for the application
│   │   ├── App.js <- The application file
│   │   ├── App.test.js <- Tests for the application
│   │   ├── index.css <- CSS for the root (entire application)
│   │   ├── index.js <- The root application file
│   │   ├── reportWebVitals.js <- Web Vitals reporting script
│   │   └── setupTests.js <- Setup script for testing
│   ├── package-lock.json <- Used to install dependencies
│   └── package.json <- Used to install dependencies
├── .github
│   └── workflows <- GitHub Actions workflows
│       ├── django.yml
│       └── node.js.yml
└── .gitignore
This project uses multiple pre-trained models in addition to the backend and frontend requirements. Before running, you can use the helper script to download a lightweight model and install all requirements:
sh scripts/start.sh
- Clone this repository
git clone https://github.com/ahr9n/quranic-search-v2.git
cd quranic-search-v2
🔴 All commands must be executed from the root of the project.
- Run all services (lexical API, semantic API, then frontend)
sh scripts/run.sh
- Navigate to
http://localhost:3000
🟢 Now you are good to go!
🔴 Note that the scripts run all servers in the background, so you can stop all of them with the following command:
sh scripts/down.sh
Omar Shamkh 💻
Ahmad Almaghraby 💻
Ahmad Abdulrahman 💻
Abd El-Twab M. Fakhry 💻
Ahmad Ateya 💻
Licensed under the GPL-v3 License.