-
Hi, would it be possible to add documents to, or delete documents from, the indexed corpus and update the index at low time cost?
-
At the moment, it is not possible to add or remove a document. BM25 scores depend on term frequencies and the average document length, both of which are computed over the whole corpus at indexing time. Since the scores are computed during indexing, modifying the corpus afterwards means that all the scores need to be recomputed, which is the same as re-indexing a new corpus.
A similar discussion can be found here: #5
Note that traditional BM25 implementations, like rank-bm25, support this, so I recommend checking them out if you specifically need dynamic add/remove.
That said, rather than deleting a document, you could filter it out after the top-k selection: retrieve k + (number of removed docs) results, drop the docs you don't want, and cut off at k.
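A minimal sketch of the over-fetch-and-filter workaround described above, assuming the retriever returns document ids sorted by descending score. The function name `top_k_excluding` and the sample ids are hypothetical and not part of any library API; the ranked list stands in for whatever your retriever returns when asked for k + (number of removed docs) results.

```python
def top_k_excluding(ranked_ids, removed_ids, k):
    """Return the first k ids from ranked_ids that are not in removed_ids.

    ranked_ids is assumed to be sorted best-first and to contain at least
    k + len(removed_ids) entries, so k results survive the filtering.
    """
    removed = set(removed_ids)
    kept = [doc_id for doc_id in ranked_ids if doc_id not in removed]
    return kept[:k]


# Hypothetical example: we want k = 3 results and have "deleted" docs 7 and 2,
# so the retriever was asked for k + 2 = 5 results.
k = 3
removed_ids = {7, 2}
ranked_ids = [7, 4, 2, 9, 1]
print(top_k_excluding(ranked_ids, removed_ids, k))  # [4, 9, 1]
```

This hides removed documents from the results, but they still contribute to the corpus statistics, so the scores of the remaining documents are the same as before the "deletion".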
Otherwise, `bm25s` should be fairly fast for re-indexing for small documents (<5…