embeddings_playground.py
is a single-page streamlit app for experimenting with OpenAI embeddings.
Before running, install required dependencies with:
pip install -r examples/apps/embeddings_playground/requirements.txt
(You may need to change the path to match your local path.)
Verify installation of streamlit with streamlit hello
.
Run the script with:
streamlit run examples/apps/embeddings_playground.py
(Again, you may need to change the path to match your local path.)
In the app, first select your choice of:
- distance metric (we recommend cosine)
- embedding model (we recommend
text-embedding-ada-002
for most use cases, as of May 2023)
Then, enter a variable number of strings to compare. Click rank
to see:
- the ranked list of strings, sorted by distance from the first string
- a heatmap showing the distance between each pair of strings
Here's an example distance matrix for 8 example strings related to The sky is blue
:
From these distance pairs, you can see:
- embeddings measure topical similarity more than logical similarity (e.g.,
The sky is blue
is very close toThe sky is not blue
) - punctuation affects embeddings (e.g.,
"THE. SKY. IS. BLUE!"
is only third closest toThe sky is blue
) - within-language pairs are stronger than across-language pairs (e.g.,
El cielo as azul
is closer toEl cielo es rojo
than toThe sky is blue
)
Experiment with your own strings to see what you can learn.