
Binary quantization #82

Merged — 61 commits merged into main on Sep 19, 2024
Conversation

@irevoire (Member) commented Jul 4, 2024

Related issue

Fixes #69

What does this PR do?

  • Introduce a new UnalignedVector generic type to replace the UnalignedF32Slice type we were using before
    • It's parametrized by a Codec
    • Supports the unaligned f32 codec (equivalent to the old UnalignedF32Slice type)
    • Supports the binary quantized slice by converting each f32 to a single 0 or 1 depending on whether its value was negative or positive
    • It can convert the binary quantized slice back to a Vec<f32> quickly using SIMD
  • Provide new distance traits to binary quantize the Euclidean, Manhattan, and Angular distances, respectively named BinaryQuantizedEuclidean, BinaryQuantizedManhattan, and BinaryQuantizedAngular
  • To keep the relevancy good while binary quantizing, we had to re-develop two_means to « un-binary quantize » the vectors before searching for the centroids
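As a rough illustration of what the binary quantized codec does (these helper names are hypothetical, not arroy's actual API), each dimension keeps only its sign, 64 dimensions are packed into one u64 word, and decoding expands each bit back to a ±1.0 float:

```rust
/// Binary quantize a vector: a strictly positive value becomes a 1 bit,
/// anything else a 0 bit, packing 64 dimensions per u64 word.
fn binary_quantize(vector: &[f32]) -> Vec<u64> {
    vector
        .chunks(64)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u64, |word, (i, &v)| {
                if v > 0.0 { word | (1u64 << i) } else { word }
            })
        })
        .collect()
}

/// Decode the packed bits back into floats, mapping 1 → 1.0 and 0 → -1.0.
fn dequantize(words: &[u64], dimensions: usize) -> Vec<f32> {
    (0..dimensions)
        .map(|i| if (words[i / 64] >> (i % 64)) & 1 == 1 { 1.0 } else { -1.0 })
        .collect()
}
```

A 1024-dimension f32 vector (4096 bytes) shrinks to sixteen u64 words (128 bytes) under this scheme; the real implementation additionally uses SIMD for the decoding step.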

Warning

Below is the initial investigation of the different methods we could use to make binary quantization work without losing too much relevancy.

First version

Just binary quantize every operation.

Here's the measured relevancy:
[image: measured relevancy results]

Second version to improve relevancy

One issue we found out is that when binary quantizing vectors, we basically end up creating a bunch of clusters in two dimensions. It would look like this:
[image: binary quantized vectors clustered in two dimensions]

All vectors will end up on one of the four corners of the square.

The more dimensions we have, the more clusters we'll get.
The number of clusters is 2^nb_dimensions.
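This 2^nb_dimensions bound can be checked with a small sketch (illustrative helpers, not part of the PR): every vector collapses to its per-dimension sign pattern, so in 3 dimensions at most 2^3 = 8 distinct codes can ever appear.

```rust
use std::collections::HashSet;

/// Collapse a vector to its sign pattern: one bit per dimension.
fn sign_code(vector: &[f32]) -> u32 {
    vector.iter().enumerate().fold(0u32, |code, (i, &v)| {
        if v > 0.0 { code | (1u32 << i) } else { code }
    })
}

/// Count how many distinct clusters a set of vectors falls into
/// after binary quantization.
fn distinct_clusters(vectors: &[Vec<f32>]) -> usize {
    vectors.iter().map(|v| sign_code(v)).collect::<HashSet<_>>().len()
}
```

Two vectors with very different magnitudes but the same signs, such as (1, 1, 1) and (2, 3, 4), land in the same cluster, which is exactly the information loss the following experiments try to work around.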

By making the internal computation of two_means not use the binary quantized vectors but instead use the real vectors, here are the results:

[image: relevancy results with two_means computed on the real vectors]

Warning

It’s actually worse than the initial version.

Third idea to improve relevancy

In the second solution I was computing the two_means loop with non-binary-quantized distances, which greatly improved the relevancy.
But then the output of two_means was binary quantized again.
We should try to compute the normal on non-binary-quantized distances as well, and then binary quantize this vector right before storing it in the DB.
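The idea can be sketched like this (hypothetical helper names; the real two_means and split-node code in arroy differ): everything about the split plane is computed on the real f32 vectors, and quantization only happens as the very last step, when the normal is written to the database.

```rust
/// Compute the split-plane normal as the difference of the two centroids
/// found by two_means — entirely on the real f32 vectors.
fn hyperplane_normal(centroid_a: &[f32], centroid_b: &[f32]) -> Vec<f32> {
    centroid_a.iter().zip(centroid_b).map(|(a, b)| a - b).collect()
}

/// Binary quantization happens only here, right before the split node
/// is stored in the database.
fn quantize_for_storage(normal: &[f32]) -> Vec<bool> {
    normal.iter().map(|&v| v > 0.0).collect()
}
```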

[image: relevancy results for the third idea]

Note

This improved the relevancy by almost 10 points of recall in the worst case over the previous best solution.

With the bits being [0:1], the relevancy is terrible: [image: relevancy results with [0:1] bits]
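One plausible intuition for why decoding the bits as [-1:1] beats [0:1] (my reading, not stated in the PR): with a {0, 1} decoding, dimensions where both vectors were negative contribute nothing to a dot product, while a {-1, 1} decoding still counts them as agreement. A tiny illustration:

```rust
/// Plain dot product between two f32 vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Decode stored bits back into floats: a 1 bit becomes 1.0, a 0 bit
/// becomes `zero_value` (0.0 for a [0:1] decoding, -1.0 for [-1:1]).
fn decode(bits: &[bool], zero_value: f32) -> Vec<f32> {
    bits.iter().map(|&b| if b { 1.0 } else { zero_value }).collect()
}
```

For two identical sign patterns like [0, 0, 1], the [0:1] decoding only scores the single shared positive dimension, whereas the [-1:1] decoding scores agreement on all three dimensions.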

Fourth solution to improve relevancy

Store the non-binary-quantized distance in the SplitNode in the database directly.
This greatly increases the size of the database (still way less than the non-binary-quantized distance, though).
The search may become slower as well.

The results are bad and I can't explain why:
[image: relevancy results for the fourth solution]

I made another branch in case we want to investigate further: #84


In conclusion

The best version is the third one.

The next steps are:

  • Optimize all the binary quantized versions
  • Merge + make a release
  • Merge in Meilisearch + find a way to change the distances => We'll talk about that with Louis in two weeks, after our vacations

Next step in Arroy:

  1. Add the size of the DB in the relevancy benchmarks
  2. Overfetch search results (between x3 and x6)
  3. Compare ourselves to qdrant
  4. Optimize performances

@irevoire irevoire marked this pull request as ready for review September 16, 2024 14:52
@irevoire irevoire added enhancement New feature or request breaking Something that will break in the next release indexing Everything related to indexing performance labels Sep 16, 2024
@Kerollmops (Member) left a comment:

It looks perfect to me 👌 That is a wonderful job that you've done here @irevoire 👏
Thank you!

@Kerollmops Kerollmops merged commit 2386594 into main Sep 19, 2024
8 checks passed
@Kerollmops Kerollmops deleted the binary-quantization branch September 19, 2024 13:16