This project is a Streamlit app that uses a large language model served locally through Ollama to assist with local business queries and recommendations. It connects to a local API endpoint and generates responses based on the user's input.
- Architecture
- Key Features
- RAFT
- Folder Structure
- Installation
- Datasets
- Ollama Model
- Initialize Backend
- Run the App
- Demo
- TODO
- Blog
- FastAPI Backend: Handles data processing, retrieval, and integration with the LLM.
- Vector Store: Utilizes Chroma DB for efficient similarity search.
- Embedding Model: Employs HuggingFace's BAAI/bge-small-en-v1.5 for text embeddings.
- Large Language Model: Uses a fine-tuned LLaMA3-8B model for natural language understanding and generation.
- Image Embedding and Matching: Incorporates image-based search for enhanced recommendations using CLIP's features.
- Streamlit Frontend: Provides an intuitive user interface for interacting with the system.
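To make the role of the vector store concrete, here is a minimal sketch of the similarity search it performs. This is not Chroma DB's implementation: the tiny 3-dimensional vectors, the `store` dictionary, and the `top_k` helper are hypothetical stand-ins for real embeddings and the real retriever.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for real text or image embeddings (hypothetical data).
store = {
    "coffee_shop": [0.9, 0.1, 0.0],
    "bakery": [0.7, 0.3, 0.1],
    "car_wash": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, store, k=2)
print(results)
```

In the actual app, the embedding model produces the vectors and Chroma DB handles storage and nearest-neighbour lookup; the sketch only shows the ranking idea.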
- Personalized Recommendations
  - Takes into account the user's past reviews and preferences
  - Stores and retrieves user-specific review data
  - Incorporates user reviews into the context provided to the LLM
- Multi-modal Search
  - Incorporates image data for enhanced search capabilities
  - Performs image-text matching to find visually relevant results
  - Presents top matching images alongside text recommendations
- Conversational Interface
  - Chat-like interface for natural language queries
  - Detailed responses from the LLM
  - View relevant images and business details
  - Engage in follow-up questions for deeper exploration
- Efficient Data Retrieval
  - Uses Chroma DB as a vector store for both text and image embeddings
  - Implements semantic search using the BAAI/bge-small-en-v1.5 embedding model
  - Uses CLIP embeddings with Chroma DB for image retrieval
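As an illustration of how user reviews might be folded into the context given to the LLM, the sketch below assembles a prompt from retrieved business snippets and a user's past reviews. The `build_prompt` function, the review fields (`business`, `rating`, `text`), and the sample data are assumptions made for this example, not the project's actual code or schema.

```python
def build_prompt(query, retrieved_docs, user_reviews, max_reviews=3):
    """Combine retrieved business snippets and the user's past reviews into one prompt."""
    review_lines = [
        f"- {r['business']}: {r['rating']}/5, {r['text']}"
        for r in user_reviews[:max_reviews]
    ]
    doc_lines = [f"- {d}" for d in retrieved_docs]
    sections = [
        "Relevant businesses:",
        *doc_lines,
        "User's past reviews:",
        *(review_lines or ["- (none)"]),  # keep the section even with no history
        f"Question: {query}",
    ]
    return "\n".join(sections)


# Hypothetical sample data for illustration only.
docs = ["Blue Door Cafe: espresso bar, open 7am to 6pm"]
reviews = [{"business": "Corner Bakery", "rating": 5, "text": "great croissants"}]
prompt = build_prompt("Where can I get a good pastry?", docs, reviews)
print(prompt)
```

Capping the number of reviews with `max_reviews` is one simple way to keep the prompt within the model's context window.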
RAFT (Retrieval Augmented Fine-Tuning) is a recipe for adapting LLMs to domain-specific RAG. Here it is used to create a synthetic dataset in chain-of-thought (CoT) format, with a final answer for each question. Distractor documents can also be added to the context to make prediction harder during training and inference. The fine-tuned model then produces its output by citing the business metadata in double quotes.
For information about the RAFT technique, please refer to the README file in the raft directory.
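The following sketch shows roughly what building one RAFT-style training sample could look like: the relevant ("oracle") document is mixed with sampled distractor documents and paired with a chain-of-thought answer that quotes the metadata. The function name, dictionary keys, and example texts are hypothetical; the actual dataset format is described in the raft directory.

```python
import random


def make_raft_sample(question, oracle_doc, distractor_docs, cot_answer,
                     num_distractors=2, seed=0):
    """Build one training sample: oracle document mixed with distractors."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    chosen = rng.sample(distractor_docs, k=min(num_distractors, len(distractor_docs)))
    context = chosen + [oracle_doc]
    rng.shuffle(context)  # the oracle's position should not be predictable
    return {"question": question, "context": context, "cot_answer": cot_answer}


# Hypothetical example data.
oracle = "Blue Door Cafe | hours: 7am to 9pm | category: cafe"
distractors = [
    "Quick Lube | hours: 8am to 5pm | category: auto repair",
    "City Gym | hours: 5am to 11pm | category: fitness",
    "Pet Palace | hours: 9am to 6pm | category: pet store",
]
answer = 'The metadata says "hours: 7am to 9pm", so the cafe is open at 8pm. Answer: yes'
sample = make_raft_sample("Is Blue Door Cafe open at 8pm?", oracle, distractors, answer)
print(sample["context"])
```

Raising `num_distractors` makes each sample harder, since the model has to pick out the one relevant document from more noise.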
project_root/
│
├── assets/
│ └── demo.gif
│
├── datasets/
│ └── google-local-dataset/
│
├── models/
│ └── gguf_model/
│ └── Modelfile
│
├── notebooks/
│ └── finetuning_notebook.ipynb
│
├── raft/
│ └── README.md
│
├── fast_api.py # Backend
│
├── streamlit_demo.py # Frontend
│
├── utils.py
│
├── requirements.txt
└── README.md
To run this app, you need to have Python and Streamlit installed on your machine. You can install the required packages using pip:
pip install -r requirements.txt
Download the datasets from google-local-dataset and save them inside the datasets directory.
- You can use the model directly for inference if it fits in your system's memory; otherwise, use Unsloth's model conversion to convert it to GGUF format.
- Use the following Unsloth code to convert the model to GGUF with q4_k_m quantization:
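from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint (replace the path with your own checkpoint directory)
model, tokenizer = FastLanguageModel.from_pretrained("../checkpoint_xx")
# Export the model to the gguf_model directory using q4_k_m quantization
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")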
Create an Ollama model from the fine-tuned model produced by the fine-tuning notebook using the following command:
ollama create gmap_recomm_llama3 -f ./gguf_model/Modelfile
Note: Update the file path in the Modelfile before running the command.
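Once the model has been created, it can be queried through Ollama's local REST API. The sketch below assumes Ollama is running on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled; the helper names `build_generate_request` and `generate` are made up for this example and are not part of the project.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: Ollama runs with default settings).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="gmap_recomm_llama3"):
    """Build a POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gmap_recomm_llama3", timeout=120):
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With Ollama running, `generate("Suggest a quiet cafe near me")` would return the model's reply as a string. This is a quick way to confirm the model was registered correctly before starting the backend.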
python src/backend/fast_api.py
streamlit run src/frontend/streamlit_demo.py
Demo of the Local Business Assistant in action
- Improve error handling and user feedback
- Optimize database queries for faster responses
- Implement caching mechanism for frequent queries
- Generate RAFT dataset using LLaMA3
- Fine-tune LLaMA3 or another local LLM on the newly created dataset
- Replace Chroma DB retriever with BM25 from llama-index (package installation issue)
- Add unit tests for backend functions
- Integrate with more data sources for comprehensive information
- Implement a feedback system for users to rate responses
The content is also explained briefly in my blog post.