Built a RAG Medical Assistant with a fine-tuned LLaMA 3.1 8B model.
- Context-Aware Responses: Provides precise medical advice by integrating more than 20 medical resources through a RAG pipeline.
- Efficient Document Retrieval: Utilizes LangChain and ChromaDB for optimized and contextually accurate document retrieval.
- Fine-Tuned LLaMA 3.1 Model: Fine-tuned the LLaMA 3.1 8B model with LoRA, reaching a 0.29 ROUGE score.
- Optimized Training: Leveraged the Unsloth library for faster training and fine-tuning with 4-bit quantization, significantly reducing resource usage without compromising performance.
- Model Deployment: Uploaded the optimized model to Hugging Face in GGUF format, enabling seamless integration and efficient inference.
- Asynchronous Chat Interface: Built with FastAPI for low-latency, seamless user interaction, reducing response time by 40%.
- LLaMA 3.1 (8B) fine-tuned on medical conversational datasets using PEFT (LoRA) for domain-specific expertise.
- Unsloth: Used for 2x faster fine-tuning and loading the model directly in 4-bit, reducing memory and computational costs during training and inference.
https://github.com/unslothai/unsloth.git
- Ollama: Used for model integration and serving.
- LangChain: Integrates the LLaMA model with document retrieval capabilities and implements context-aware responses.
- ChromaDB: Stores and retrieves embeddings for efficient and accurate responses.
- FastAPI: Provides a robust and asynchronous backend for a seamless chat interface.
- Hugging Face: Used for model hosting and inference, including support for GGUF model format.
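To illustrate what the ChromaDB retrieval step does under the hood, here is a minimal, self-contained sketch of top-k retrieval by cosine similarity. The toy 3-dimensional vectors and document IDs are made up for illustration; in the real pipeline the embeddings come from the model and ChromaDB handles storage and search.

```python
# Illustrative sketch of embedding retrieval: store (doc_id, embedding) pairs
# and return the k documents most similar to a query embedding.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    """store: list of (doc_id, embedding); returns the k most similar doc_ids."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy in-memory "vector store" (hypothetical document IDs).
store = [
    ("diabetes_overview", [0.9, 0.1, 0.0]),
    ("hypertension_faq",  [0.1, 0.9, 0.0]),
    ("insulin_dosage",    [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))  # → ['diabetes_overview', 'insulin_dosage']
```

A real vector database adds approximate nearest-neighbor indexing on top of this idea so retrieval stays fast as the corpus grows.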
Clone the Repository:

    git clone https://github.com/SathvikNayak123/chatbot.git
Install Dependencies:

    pip install -r requirements.txt
Setup:

- Populate the database with medical documents.
- Generate and store embeddings using the pre-trained LLaMA 3.1 model.
- Install Ollama and pull the model from Hugging Face:

        ollama pull hf.co/sathvik123/llama3-ChatDoc
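Before documents can be embedded, they are typically split into overlapping chunks. This is a hedged sketch of that preprocessing step; the chunk size and overlap below are illustrative defaults, not the project's actual settings.

```python
# Sketch of the "populate the database" step: split each document into
# overlapping character chunks, which are then embedded and stored.
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of chunk_size characters, each overlapping
    the previous chunk by `overlap` characters."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 500
print(len(chunk_text(doc)))  # → 3
```

Overlap preserves context that would otherwise be cut at chunk boundaries, which helps retrieval return coherent passages.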
Run the Application:

    uvicorn app:app --reload
- The fine-tuned LLaMA 3.1 model achieved a 0.29 ROUGE score.