Built a RAG Medical Assistant with a fine-tuned LLaMA 3.1 8B model.
- Context-Aware Responses: Provides precise medical advice by integrating more than 20 medical resources through a RAG pipeline.
- Efficient Document Retrieval: Utilizes LangChain and ChromaDB for optimized and contextually accurate document retrieval.
- Fine-Tuned LLaMA 3.1 Model: Fine-tuned the LLaMA 3.1 8B model with LoRA, reaching a 0.29 ROUGE score.
- Optimized Training: Leveraged the Unsloth library for faster training and fine-tuning with 4-bit quantization, significantly reducing resource usage without compromising performance.
- Model Deployment: Uploaded the optimized model to Hugging Face in GGUF format, enabling seamless integration and efficient inference.
- Asynchronous Chat Interface: Built with FastAPI for low-latency, seamless user interaction, reducing response time by 40%.
- LLaMA 3.1 (8B) fine-tuned on medical conversational datasets using PEFT (LoRA) for domain-specific expertise.
- Unsloth: Used for 2x faster fine-tuning and loading the model directly in 4-bit, reducing memory and computational costs during training and inference.
https://github.com/unslothai/unsloth.git
- Ollama: Used for model integration and serving.
- LangChain: Integrates the LLaMA model with document retrieval capabilities and implements context-aware responses.
- ChromaDB: Stores and retrieves embeddings for efficient and accurate responses.
- FastAPI: Provides a robust and asynchronous backend for a seamless chat interface.
- Hugging Face: Used for model hosting and inference, including support for GGUF model format.
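To illustrate what the ChromaDB retrieval step does under the hood, here is a minimal, self-contained sketch of top-k retrieval by cosine similarity. The toy 3-dimensional vectors and document IDs are made up for illustration; in the real pipeline the embeddings come from the model and ChromaDB handles storage and search.

```python
# Illustrative sketch of embedding retrieval: store (doc_id, embedding) pairs
# and return the k documents most similar to a query embedding.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    """store: list of (doc_id, embedding); returns the k most similar doc_ids."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy in-memory "vector store" (hypothetical document IDs).
store = [
    ("diabetes_overview", [0.9, 0.1, 0.0]),
    ("hypertension_faq",  [0.1, 0.9, 0.0]),
    ("insulin_dosage",    [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))  # → ['diabetes_overview', 'insulin_dosage']
```

A real vector database adds approximate nearest-neighbor indexing on top of this idea so retrieval stays fast as the corpus grows.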
Clone the Repository:

    git clone https://github.com/SathvikNayak123/chatbot.git
Install Dependencies:

    pip install -r requirements.txt
Setup:

- Populate the database with medical documents.
- Generate and store embeddings using the pre-trained LLaMA 3.1 model.
- Install Ollama and pull the model from Hugging Face:

        ollama pull hf.co/sathvik123/llama3-ChatDoc
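Before documents can be embedded, they are typically split into overlapping chunks. This is a hedged sketch of that preprocessing step; the chunk size and overlap below are illustrative defaults, not the project's actual settings.

```python
# Sketch of the "populate the database" step: split each document into
# overlapping character chunks, which are then embedded and stored.
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of chunk_size characters, each overlapping
    the previous chunk by `overlap` characters."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 500
print(len(chunk_text(doc)))  # → 3
```

Overlap preserves context that would otherwise be cut at chunk boundaries, which helps retrieval return coherent passages.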
Run the Application:

    uvicorn app:app --reload
- The fine-tuned LLaMA 3.1 model achieved a 0.29 ROUGE score.