Raise diskann maximum dimension from 2K to 16K #181

Open
wants to merge 10 commits into
base: main

tjgreen42
Contributor

@tjgreen42 tjgreen42 commented Dec 12, 2024

This PR fixes #100 and raises the dimension limit for pgvectorscale's diskann index from 2000 to 16000, which is the maximum supported by the underlying pgvector vector type.

The previous limit of 2000 was needed to ensure that all data structures could be serialized onto single 8K pages. Beyond 2000 dimensions, as long as SBQ is used for storage, quantized vectors, neighbor lists, and other data structures still fit on a single page; the only structure that grows too large is SbqMeans. (The raw vectors used for reranking remain in the source relation, where the standard Postgres TOAST machinery reads and writes them.) If plain storage is used, the old limit of 2000 remains in place.
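For intuition, here is a rough back-of-the-envelope sketch of the sizes involved. It is not taken from the pgvectorscale code: the function names are hypothetical, page and tuple headers are ignored, and it assumes 4-byte floats for plain vectors, one f32 mean per dimension for SbqMeans, and a configurable number of bits per dimension for SBQ.

```rust
/// Default Postgres block size in bytes.
const PAGE_SIZE: usize = 8192;

/// Bytes for a full-precision vector stored as f32 (plain storage).
fn plain_vector_bytes(dims: usize) -> usize {
    dims * std::mem::size_of::<f32>()
}

/// Bytes for a quantized vector, ASSUMING `bits_per_dim` bits per dimension.
fn sbq_vector_bytes(dims: usize, bits_per_dim: usize) -> usize {
    // Round up to a whole number of bytes.
    (dims * bits_per_dim + 7) / 8
}

/// Bytes for the per-dimension means, ASSUMING one f32 per dimension.
fn sbq_means_bytes(dims: usize) -> usize {
    dims * std::mem::size_of::<f32>()
}

/// True if the payload fits on one page (headers ignored in this sketch).
fn fits_on_page(payload_bytes: usize) -> bool {
    payload_bytes <= PAGE_SIZE
}
```

Under these assumptions, a 2000-dimension plain vector takes 8000 bytes and just fits, a 16000-dimension quantized vector at one bit per dimension takes 2000 bytes and fits comfortably, and the means for 16000 dimensions take 64000 bytes, which needs several pages.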

To handle SbqMeans, we introduce a ChainTape data structure that is similar to Tape but supports reads and writes of large buffers that span multiple pages. The chained representation is treated as a property of the PageType, so we introduce a new PageType for SbqMeans along with upgrade machinery from the old version. As with the versioned MetaPage, there are no unit tests for the upgrade path, but I did ad-hoc testing to confirm that it works.
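The general idea of a chained tape can be illustrated with a small in-memory sketch. This is not the ChainTape implementation from this PR: `Page`, `ChainSketch`, and their methods are hypothetical names, pages live in a `Vec` rather than in Postgres buffers, and the "next" link is a plain index rather than a block number.

```rust
/// One fixed-capacity page: a payload chunk plus a link to the next page.
struct Page {
    data: Vec<u8>,
    next: Option<usize>, // index of the next page in the chain, if any
}

/// Hypothetical in-memory stand-in for a chained tape.
struct ChainSketch {
    pages: Vec<Page>,
    payload_per_page: usize,
}

impl ChainSketch {
    fn new(payload_per_page: usize) -> Self {
        assert!(payload_per_page > 0, "pages must hold at least one byte");
        ChainSketch { pages: Vec::new(), payload_per_page }
    }

    /// Write `buf` across as many pages as needed; return the head page index.
    fn write(&mut self, buf: &[u8]) -> usize {
        let head = self.pages.len();
        if buf.is_empty() {
            // Allocate one empty page so the chain still has a head.
            self.pages.push(Page { data: Vec::new(), next: None });
            return head;
        }
        let chunk_count = (buf.len() + self.payload_per_page - 1) / self.payload_per_page;
        for (i, chunk) in buf.chunks(self.payload_per_page).enumerate() {
            // Every page except the last links to the page written right after it.
            let next = if i + 1 < chunk_count { Some(head + i + 1) } else { None };
            self.pages.push(Page { data: chunk.to_vec(), next });
        }
        head
    }

    /// Follow the chain starting at `head` and reassemble the original buffer.
    fn read(&self, head: usize) -> Vec<u8> {
        let mut out = Vec::new();
        let mut cur = Some(head);
        while let Some(idx) = cur {
            let page = &self.pages[idx];
            out.extend_from_slice(&page.data);
            cur = page.next;
        }
        out
    }
}
```

With an 8192-byte payload per page, writing a 64000-byte buffer in this sketch allocates 8 linked pages, and reading from the head page returns the original bytes. A real implementation would also need to account for page headers, locking, and WAL logging, which are left out here.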

@tjgreen42 tjgreen42 changed the title [WIP] Extend max dimensions beyond 2K Extend max dimensions beyond 2K Dec 12, 2024
@tjgreen42 tjgreen42 marked this pull request as ready for review December 12, 2024 23:02
@tjgreen42 tjgreen42 requested a review from a team as a code owner December 12, 2024 23:02
@tjgreen42 tjgreen42 requested review from cevian and syvb December 12, 2024 23:03
@tjgreen42 tjgreen42 changed the title Extend max dimensions beyond 2K Raise diskann maximum dimension from 2K to 16K Dec 12, 2024
pgvectorscale/src/util/chain.rs (outdated, resolved)
pgvectorscale/src/access_method/build.rs (outdated, resolved)
pgvectorscale/src/access_method/build.rs (outdated, resolved)
pgvectorscale/src/access_method/sbq.rs (resolved)
pgvectorscale/src/util/tape.rs (resolved)
@tjgreen42 tjgreen42 requested a review from syvb December 13, 2024 20:33
syvb previously approved these changes Dec 16, 2024
cevian previously approved these changes Dec 16, 2024
Collaborator

@cevian cevian left a comment

Approved, but left a few things to think about. Let me know if you decide to change anything and want another review.

pgvectorscale/src/util/chain.rs (resolved)
pgvectorscale/src/access_method/sbq.rs (resolved)
pgvectorscale/src/util/chain.rs (outdated, resolved)
pgvectorscale/src/util/chain.rs (outdated, resolved)
pgvectorscale/src/util/chain.rs (resolved)
@tjgreen42 tjgreen42 dismissed stale reviews from cevian and syvb via 362976c December 17, 2024 21:28
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Max dimensions supported by StreamingDiskANN
3 participants