You're developing a new AI-driven RAG application, but the process is chaotic. There are too many priorities and not enough time to tackle them all. Even if you could, you're not sure how to enhance the system. You sense that there's a "right path" – a set of steps that would lead to maximum growth in the shortest time. However, every workday feels like a gamble, and you're just hoping you're moving in the right direction.
As I mention in my Substack article, the key difference between success and failure isn't technical skill but the frameworks you use for making decisions and allocating resources. It's about knowing what's worth your time, how to prioritize, what trade-offs to make, and which metrics to focus on or ignore. This is why observability is so valuable. It gives you the insight needed to understand what's happening within your system, helping you identify issues and optimize performance. So when starting any RAG system, capture metrics like cosine similarity and reranker scores for every retrieval, right from the start. This repo has everything you need to get started with RAG, with a focus on the observability metrics you should store and use in future decision-making and resource allocation.
python3 --version
brew install python
python3 -m venv fastapi-env
source fastapi-env/bin/activate
pip3 install -r requirements.txt
cp .env.example .env
nano .env
You'll need a turbopuffer API key, an OpenAI API key, and a Cohere API key.
uvicorn main:app --reload
Testing out uploads (/routers/upload.py)
Go here: http://127.0.0.1:8000/docs#/default/upload_file_upload_post
Upload a file + specify a namespace
Testing out retrievals (/routers/retrieve.py)
Go here: http://127.0.0.1:8000/docs#/retrieve/get_context_retrieve_post
- Extract all text from file via PyMuPDF (or other library for other file types)
- Chunk text up to 512 tokens without splitting sentences
- Convert each chunk to a vector embedding via OpenAI text-embedding-3-small
- Upsert vectors + chunks to vectordb namespace w/ unique vectorIDs
- Run KMeans clustering on chunks to identify key topics
- Sample the centermost chunk from each cluster (the chunk closest to the cluster's average meaning) to create an array of cluster summary chunks + store to their vectorIDs
- Use gpt-3.5-turbo to generate a comprehensive summary of the cluster summary chunks
- Convert query to embedding
- Get top 10 relevant chunks via vectordb + store cosine similarity scores
- Rerank chunks + store reranker score
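The "chunk text up to 512 tokens without splitting sentences" step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: the function name `chunk_sentences` is hypothetical, sentences are split with a simple regex, and a "token" is approximated by a whitespace-separated word. A real tokenizer matched to the embedding model would give different counts.

```python
import re


def chunk_sentences(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Assumption: a "token" is a whitespace-separated word. Swap in a real
    tokenizer for accurate counts.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it over budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than max_tokens still becomes its own chunk,
        # because sentences are never split.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Packing greedily keeps chunks close to the budget while guaranteeing that every chunk boundary falls on a sentence boundary, which is the property the upload step asks for.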
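To make the retrieval steps concrete, here is a minimal sketch of ranking chunks by cosine similarity and keeping that score next to the reranker score for every retrieved chunk. Everything named here is an assumption for illustration: `RetrievalRecord`, `cosine_similarity`, `toy_rerank`, and `log_retrieval` are hypothetical, the in-memory `chunks` dict stands in for a vector database, and `toy_rerank` is a word-overlap placeholder rather than a real reranker call. In practice the similarity scores would typically come back from the vector database and the reranker score from a hosted model.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetrievalRecord:
    """One row of observability data per retrieved chunk (hypothetical schema)."""
    query: str
    chunk_id: str
    cosine_similarity: float
    reranker_score: float


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_rerank(query: str, text: str) -> float:
    """Placeholder reranker: fraction of the chunk's words that appear in the query."""
    query_words = set(query.lower().split())
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in query_words for w in words) / len(words)


def log_retrieval(
    query: str,
    query_vec: list[float],
    chunks: dict[str, tuple[list[float], str]],
    rerank: Callable[[str, str], float] = toy_rerank,
    top_k: int = 10,
) -> list[RetrievalRecord]:
    """Return the top_k chunks by cosine similarity, reordered by reranker score.

    Both scores are kept on every record so they can be stored and analyzed later.
    """
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id, text)
        for chunk_id, (vec, text) in chunks.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    records = [
        RetrievalRecord(query, chunk_id, sim, rerank(query, text))
        for sim, chunk_id, text in scored[:top_k]
    ]
    records.sort(key=lambda r: r.reranker_score, reverse=True)
    return records
```

Keeping both scores on the same record makes it possible to compare them later, for example to spot queries where the vector search and the reranker disagree about which chunks matter.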