
You're developing a new AI-driven RAG application, but the process is chaotic. There are too many priorities and not enough time to tackle them all. Even if you could, you're not sure how to enhance the system. You sense that there's a "right path" – a set of steps that would lead to maximum growth in the shortest time. However, every workday feels like a gamble, and you're just hoping you're moving in the right direction.

As I mention in my Substack article, the key difference between success and failure isn't technical skill but the frameworks you use for making decisions and allocating resources. It's about knowing what's worth your time, how to prioritize, what trade-offs to make, and which metrics to focus on or ignore. This is why observability is so valuable: it gives you the insight needed to understand what's happening within your system, helping you identify issues and optimize performance. So when starting any RAG system, capture metrics like cosine similarity and reranker scores for every retrieval, right from the start. This repo has everything you need to get started with RAG, with a focus on the observability metrics you should store and use in future decision-making and resource allocation.

Prerequisites:

python3 --version

brew install python

python3 -m venv fastapi-env

Install dependencies

source fastapi-env/bin/activate

pip3 install -r requirements.txt

Set up secret API keys in `.env` file

cp .env.example .env

nano .env

You'll need a turbopuffer API key, an OpenAI API key, and a Cohere API key.
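Before starting the server, it can help to confirm that all three keys are actually set. The sketch below is a minimal check, not part of the repo. The variable names are hypothetical, so check `.env.example` for the names the code really reads. It inspects the process environment (or any dict you pass in) and does not parse the `.env` file itself.

```python
import os

# Hypothetical variable names (an assumption); see .env.example for the real ones.
REQUIRED_KEYS = ("TURBOPUFFER_API_KEY", "OPENAI_API_KEY", "COHERE_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required key names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_keys()` with no arguments checks the current environment and returns an empty list when every key has a non-blank value.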

Start server

uvicorn main:app --reload

Testing out uploads (/routers/upload.py)

Go here: http://127.0.0.1:8000/docs#/default/upload_file_upload_post

Upload a file + specify a namespace

Testing out retrievals (/routers/retrieve.py)

Go here: http://127.0.0.1:8000/docs#/retrieve/get_context_retrieve_post

Under the hood

Upload API

Extract all text from file via PyMuPDF (or other library for other file types)
Chunk text up to 512 tokens without splitting sentences
Convert each chunk to a vector embedding via OpenAI text-embedding-3-small
Upsert vectors + chunks to vectordb namespace w/ unique vectorIDs
Run KMeans clustering on chunks to identify key topics
Sample centermost chunk from each cluster (average cluster meaning) to create an array of cluster summary chunks + store to their vectorIDs
Use gpt-3.5-turbo to generate a comprehensive summary of the cluster summary chunks
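Two of the upload steps above are easy to illustrate in isolation: sentence-aware chunking and picking the centermost chunk of a cluster. The sketch below is not the repo's code. It assumes tokens can be approximated by whitespace-separated words (a real tokenizer would count differently), splits sentences with a simple regex, and measures "centermost" by Euclidean distance to the cluster centroid. The function names are hypothetical, and the KMeans step itself is not shown.

```python
import math
import re


def chunk_text(text, max_tokens=512):
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace-separated words (an assumption).
    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def centermost_index(vectors):
    """Return the index of the vector closest to the centroid of vectors."""
    dims = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    distances = [math.dist(v, centroid) for v in vectors]
    return distances.index(min(distances))
```

In this sketch, `centermost_index` would be called once per cluster, on the embeddings assigned to that cluster, to select the chunk that represents it.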

Retrieval API

Convert query to embedding
Get top 10 relevant chunks via vectordb + store cosine similarity scores
Rerank chunks + store reranker score
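The retrieval steps above can be sketched as follows, with both scores kept on every result so they can be logged later. This is an illustration under assumptions, not the repo's implementation. A brute-force cosine search over an in-memory list stands in for the vectordb query, and `rerank_fn` is a hypothetical callable standing in for the reranker.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, query_vec, chunks, rerank_fn, top_k=10):
    """Return the top_k chunks with both scores attached.

    chunks is a list of (text, vector) pairs.
    rerank_fn(query, text) -> float is a hypothetical stand-in for a reranker.
    """
    scored = [
        {"text": text, "cosine_similarity": cosine_similarity(query_vec, vec)}
        for text, vec in chunks
    ]
    # Step 1: keep the top_k chunks by cosine similarity.
    scored.sort(key=lambda r: r["cosine_similarity"], reverse=True)
    top = scored[:top_k]
    # Step 2: score each survivor with the reranker and reorder by that score.
    for record in top:
        record["reranker_score"] = rerank_fn(query, record["text"])
    top.sort(key=lambda r: r["reranker_score"], reverse=True)
    return top
```

Each returned record carries `cosine_similarity` and `reranker_score` side by side, which is the shape you would want to persist for later analysis of retrieval quality.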

pashpashpash/python-rag-scaffold

Folders and files

Latest commit

History

Repository files navigation

Prerequisites:

Install dependencies

Set up secret API keys in .env file

Start server

Testing out uploads (/routers/upload.py)

Testing out retrievals (/routers/retrieve.py)

Under the hood

Upload API

Retrieval API

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Set up secret API keys in `.env` file

Packages