Retrieval Augmented Generation (RAG) is one of the most popular ways to increase the accuracy of Large Language Models (LLMs) and reduce hallucinations. However, even RAG-based systems remain prone to many issues. Measuring these issues with standardized benchmarks is important for improving either the model or the documents used for retrieval. LightHouz AI lets you evaluate your LLM across six benchmark categories: Hallucination Tests, Out of Context, Prompt Injection, PII Leak, Toxicity, and Bias. Lighthouz AutoBench automatically generates benchmarks for your RAG application from the documents you upload. It also supports AutoEvals of those benchmarks, which compare the expected result of a query with the actual response. You can also compare multiple LLMs on the same benchmark to see which performs better.
This demo allows you to run a RAG Chatbot in a Streamlit interface and evaluate the chatbot using LightHouz AI.
- LangChain - Pre-processes and formats text data, making it suitable for embedding generation.
- OpenAI Embeddings - Generates vectorized forms of the documents.
- SingleStoreDB - Stores the prepared embeddings in a vector database.
- LLM of Your Choice (GPT-4, Gemini Pro) - Takes in user prompt + context from retrieval and generates output.
- LighthouzAI - Evaluates RAG chatbot responses.
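
The way these components fit together can be sketched in plain Python. This is only an illustration of the retrieve-then-generate flow, not the code in `main.py`: the bag-of-words `embed` function stands in for OpenAI Embeddings, `InMemoryVectorStore` stands in for SingleStoreDB, and `build_prompt` shows the "user prompt + context" string that would be sent to the LLM. All of these names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for OpenAI Embeddings)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database (SingleStoreDB in the demo)."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Store each text chunk next to its embedding.
        self.rows.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved context with the user prompt; this string would go to the LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


store = InMemoryVectorStore()
for chunk in ["Refunds are issued within 30 days.", "Support is open on weekdays."]:
    store.add(chunk)

question = "When are refunds issued?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)
```

In the demo itself, the embedding and storage steps are handled by the libraries listed above, and the assembled prompt is passed to GPT-4 or Gemini Pro rather than printed.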
- Install the dependencies listed in `requirements.txt`.
- Set up environment variables for your Google Gemini and OpenAI API keys (`export OPENAI_API_KEY='your-api-key-here'`).
- Add your SingleStoreDB database URL to line 26 to establish the database connection.
- Replace the PDF document on line 15 with any document of your choice for RAG (or keep this one to test it out!).
- Add your LightHouz API Key on line 30 of `main.py`: `LH = Lighthouz("LH-XRjjxBxtYjXPQqwpPJ0WyHcc0tjBx6vy")`.
- Generate a new benchmark for your RAG app on the LightHouz AutoBench Dashboard. Enter the `benchmark_id` on line 31.
- Create new apps in the LightHouz Dashboard for `gpt-4` and `gemini-pro`. Enter the `app_id`s on lines 32-33.
- That's it! You're ready to use your chatbot and evaluations!
streamlit run main.py
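
If the app fails at startup, a missing API key is a likely cause. The following is a minimal pre-flight check that could be run before launching Streamlit. `OPENAI_API_KEY` comes from the setup steps; `GOOGLE_API_KEY` is an assumed name for the Gemini key and may not match what `main.py` actually reads, so adjust it as needed. The helper `missing_env_vars` is hypothetical and not part of this repo.

```python
import os
from typing import Mapping, Sequence

# OPENAI_API_KEY is named in the setup steps; GOOGLE_API_KEY is an assumption
# for the Gemini key and may differ from the variable main.py expects.
REQUIRED_VARS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_env_vars(required: Sequence[str], environ: Mapping[str, str]) -> list[str]:
    """Return the names in `required` that are unset or blank in `environ`."""
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS, os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.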
- Slides From this Demo: https://docs.google.com/presentation/d/1JG57ZVd0_zKhzM6SkKtes72SrMsKh_wCd_O6vCfuDtw/edit?usp=sharing
- LightHouz Documentation: https://lighthouz.ai/docs/