This project consists of three main Python files (main.py, reranker.py, and query_expansion.py) and two utility modules (helper_utils.py and model.py) that together handle document retrieval, re-ranking, and language model integration. It demonstrates how combining a language model with document retrieval and re-ranking can improve information retrieval.
main.py
- The main script to execute the entire workflow. It loads data from a PDF file, performs document retrieval, re-ranks the documents using a Cross Encoder, and incorporates automatic query expansion.
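The workflow in main.py can be pictured as a short pipeline: expand the query, retrieve candidate chunks, then re-rank them. The sketch below is a hypothetical, self-contained illustration of that control flow only. It replaces the Chroma lookup with a toy word-overlap retriever and takes the expansion and scoring steps as plain functions, so none of its names (`run_pipeline`, `toy_retrieve`) come from the project.

```python
from typing import Callable, List


def toy_retrieve(query: str, corpus: List[str], n_results: int) -> List[str]:
    """Hypothetical stand-in for a Chroma query: rank chunks by shared lowercase words."""
    query_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return scored[:n_results]


def run_pipeline(
    query: str,
    corpus: List[str],
    expand: Callable[[str], str],
    score: Callable[[str, str], float],
    n_results: int = 10,
    top_k: int = 3,
) -> List[str]:
    # 1. Query expansion: enrich the user's query (e.g. with LLM output).
    expanded_query = expand(query)
    # 2. Retrieval: fetch a broad candidate set using the expanded query.
    candidates = toy_retrieve(expanded_query, corpus, n_results)
    # 3. Re-ranking: rescore each candidate against the original query.
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]
```

Scoring against the original query rather than the expanded one is a common choice, since the expansion text is only a retrieval aid; the project may handle this differently.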
reranker.py
- Contains functions related to re-ranking retrieved documents using a Cross Encoder.
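Re-ranking with a cross-encoder means scoring each (query, document) pair jointly and sorting by that score. The following is a minimal sketch, assuming the scorer is passed in as a function. With sentence_transformers installed, `CrossEncoder(...).predict` on a list of pairs returns one score per pair and can fill that role. The function name `rerank` and the model name in the comment are illustrative assumptions, not taken from reranker.py.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes a list of [query, document] pairs and returns one score per pair.
# With sentence_transformers, for example (model name is an assumption):
#   from sentence_transformers import CrossEncoder
#   cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   scorer = cross_encoder.predict
PairScorer = Callable[[List[List[str]]], Sequence[float]]


def rerank(
    query: str, documents: List[str], scorer: PairScorer, top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return the top_k documents sorted by cross-encoder score (highest first)."""
    # Drop exact duplicates (e.g. hits from several expanded queries), keeping order.
    unique_docs = list(dict.fromkeys(documents))
    pairs = [[query, doc] for doc in unique_docs]
    scores = scorer(pairs)
    ranked = sorted(zip(unique_docs, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]
```

Keeping the model outside the function lets the ranking logic be tested with a dummy scorer, without downloading model weights.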
query_expansion.py
- Includes functions that use a language model and a predefined prompt to generate responses to an input question; these responses are used to expand the query.
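One common form of LLM-based query expansion is to ask the model for a hypothetical answer and append it to the original query before searching. The sketch below assumes that approach and treats the language model as any function from a prompt string to a completion string (a LangChain LLM call could be wrapped this way). The prompt text and the names `EXPANSION_PROMPT` and `expand_query` are hypothetical; they are not the project's predefined prompt.

```python
from typing import Callable

# Hypothetical prompt; the project's predefined prompt may differ.
EXPANSION_PROMPT = (
    "You are a helpful expert assistant. Provide an example answer to the "
    "given question, as it might appear in a document.\n\n"
    "Question: {query}\nAnswer:"
)


def expand_query(query: str, llm: Callable[[str], str]) -> str:
    """Append a model-generated hypothetical answer to the original query."""
    hypothetical_answer = llm(EXPANSION_PROMPT.format(query=query)).strip()
    # The combined string is what would be embedded and sent to the vector store.
    return f"{query} {hypothetical_answer}"
```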
helper_utils.py
- Utility functions for reading PDF files, chunking texts, loading data into a Chroma database, word wrapping, and projecting embeddings.
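Chunking splits long PDF text into pieces small enough to embed, and word wrapping makes long passages readable in the terminal. Below is a minimal standard-library sketch of both, assuming fixed-size character chunks with overlap. The function names, chunk size, overlap, and wrap width are illustrative assumptions; helper_utils.py may use a different splitter (for example, a token-based one).

```python
import textwrap
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into character chunks; consecutive chunks share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


def word_wrap(text: str, width: int = 80) -> str:
    """Wrap text to lines of at most `width` characters for console output."""
    return textwrap.fill(text, width=width)
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, which helps retrieval at the cost of some duplicated text.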
model.py
- Defines a model served by IBM Watson Machine Learning and wraps it so it can be used as a LangChain language model.
Ensure you have the necessary dependencies installed. You can install them using:
pip install -r requirements.txt
Set up environment variables:
- Ensure the required environment variables are set, such as GA_GENAI_URL, GA_GENAI_KEY, and GA_PROJECT_ID.
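A minimal sketch of how these variables might be read and validated before the model is created. The variable names come from this README; the `url`/`apikey` credentials layout follows the format used by the ibm_watson_machine_learning SDK, but how model.py actually consumes the values is an assumption, and `load_watson_settings` is a hypothetical helper.

```python
import os
from typing import Dict

REQUIRED_VARS = ("GA_GENAI_URL", "GA_GENAI_KEY", "GA_PROJECT_ID")


def load_watson_settings() -> Dict[str, object]:
    """Read the required environment variables, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Assumed layout: a credentials dict plus a separate project id.
        "credentials": {
            "url": os.environ["GA_GENAI_URL"],
            "apikey": os.environ["GA_GENAI_KEY"],
        },
        "project_id": os.environ["GA_PROJECT_ID"],
    }
```

Failing at startup with the names of the missing variables is easier to debug than an authentication error raised later by the SDK.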
Execute the main script:
python main.py
Dependencies listed in requirements.txt:
numpy
tqdm
chromadb
sentence_transformers
langchain
ibm_watson_machine_learning
- This project was created by Charan H U.