Understanding SimilarityEvaluation and Similarity Threshold #577
I'm really confused about how the similarity evaluation function works and where the threshold takes effect.
Context: In my code, I initialize a semantic cache like so:
Where WrapEvaluation is just a wrapper around the SearchDistanceEvaluation class for easy debugging (with max_distance set to 1):
According to the documentation on SearchDistanceEvaluation, when positive is set to False, the bigger the distance calculated in the retrieval stage (search_result), the less similar the two queries are. Example 1: Example 2:
First Question: What does "rank1" mean?
Second Question: To which value does the similarity threshold refer?
Third Question: When I set max_distance to 1 and similarity_threshold to 1, I don't get any hits. Why?
Replies: 1 comment
The maximum distance actually depends on the VectorBase you set. For example, with the default faiss, the resulting search distance should fall in the range 0-4. By default, the smaller the distance, the more similar the two vectors are. The range of similarity_threshold, on the other hand, is 0-1. For example, suppose similarity_threshold is set to 0.6 and the distance faiss returns between the two vectors is 0.3. Because a smaller distance means the vectors are more similar, the distance is converted into a similarity value: 4-0.3=3.7. The smallest similarity value that still counts as a hit is 0.6*4=2.4. Because 3.7 is greater than 2.4, we judge that the current cache value is valid.
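The arithmetic in this reply can be written out as a small sketch. It does not import or call GPTCache; the function names below are hypothetical and only mirror the calculation described above, and treating the comparison as inclusive (>=) is an assumption.

```python
def similarity_from_distance(distance: float, max_distance: float = 4.0) -> float:
    """Turn a search distance into a similarity value (larger means more similar)."""
    return max_distance - distance


def min_similarity(similarity_threshold: float, max_distance: float = 4.0) -> float:
    """Scale the 0-1 similarity_threshold onto the 0-max_distance range."""
    return similarity_threshold * max_distance


def is_cache_hit(distance: float, similarity_threshold: float,
                 max_distance: float = 4.0) -> bool:
    """A cached entry is accepted when its similarity reaches the scaled threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return (similarity_from_distance(distance, max_distance)
            >= min_similarity(similarity_threshold, max_distance))


if __name__ == "__main__":
    # The example from the reply: 4 - 0.3 = 3.7 against 0.6 * 4 = 2.4.
    print(is_cache_hit(0.3, 0.6))                    # True
    # max_distance = 1 and similarity_threshold = 1, as in the third question.
    print(is_cache_hit(0.3, 1.0, max_distance=1.0))  # False
```

Under the same assumptions, setting both max_distance and similarity_threshold to 1 makes the smallest accepted similarity 1*1=1, and 1-distance only reaches that when the distance is 0, so anything short of an exact vector match would miss.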