Is it possible to pass a pre-computed TF-IDF matrix? #13
-
Is it possible to pass a pre-computed TF-IDF matrix (with the shape
Replies: 2 comments 1 reply
-
Thanks for the question! Unfortunately, the BM25 algorithm differs slightly from TF-IDF, so the scores you get will be different. If you absolutely want to pass something, you would need to modify the `indptr`, `indices`, and `data` keys of `obj.scores`, where `obj` is your `bm25s.BM25` instance. Here's how it is currently being built behind the scenes: https://github.com/xhluca/bm25s/blob/35036613340e2511790213a6fb988e573b1936e6/bm25s/__init__.py#L255-L270C16
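For anyone unfamiliar with what `indptr`, `indices`, and `data` mean, here is a minimal pure-Python sketch of how a (documents, vocabulary) score matrix maps onto those three arrays. It assumes a compressed sparse column (CSC) layout with one column per vocabulary token, which should be checked against the linked source; `dense_to_csc` is a hypothetical helper, not part of bm25s.

```python
def dense_to_csc(matrix):
    """Convert a dense (documents x vocabulary) list-of-lists into CSC arrays.

    Assumption: one column per vocabulary token, row index = document id.
    """
    n_docs = len(matrix)
    n_vocab = len(matrix[0]) if n_docs else 0
    data, indices, indptr = [], [], [0]
    for col in range(n_vocab):        # walk the matrix column by column
        for row in range(n_docs):
            value = matrix[row][col]
            if value != 0:
                data.append(value)    # non-zero score
                indices.append(row)   # document id holding that score
        indptr.append(len(data))      # marks where this column's slice ends
    return data, indices, indptr


# Toy score matrix: 3 documents, 3 vocabulary tokens (values are made up).
scores = [
    [0.0, 1.2, 0.0],
    [0.7, 0.0, 0.0],
    [0.0, 0.4, 2.1],
]
data, indices, indptr = dense_to_csc(scores)
```

The scores for token `j` are then `data[indptr[j]:indptr[j + 1]]`, and the matching document ids are `indices[indptr[j]:indptr[j + 1]]`. Before assigning anything to `obj.scores`, the lists would presumably need converting to arrays of the dtypes the library expects; that part is an assumption and not shown here.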
-
Ok. However, I think of TF-IDF as a representation of a collection of documents, commonly realized as a sparse matrix with shape (documents, vocabulary_size). Since the BM25 score function uses terms such as:
it could also be calculated from a TF-IDF matrix. The advantage is being able to enhance/filter the TF-IDF matrix before calculating the BM25 score.
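To make the idea concrete, here is a rough pure-Python sketch of scoring from a pre-built matrix. It starts from raw term counts rather than TF-IDF weights, since BM25 needs the raw term frequency and the document length; a TF-IDF-weighted matrix would first have to be converted back to counts. The IDF variant (Lucene-style), the defaults `k1=1.5` and `b=0.75`, and the function name `bm25_from_counts` are all assumptions for illustration and not necessarily what bm25s does internally.

```python
import math


def bm25_from_counts(tf, k1=1.5, b=0.75):
    """Compute per-(document, term) BM25 scores from a dense count matrix.

    tf: list of rows, one per document, each holding raw term counts.
    Assumes at least one non-empty document.
    """
    n_docs = len(tf)
    n_vocab = len(tf[0])
    doc_len = [sum(row) for row in tf]   # lengths come from the matrix itself
    avg_len = sum(doc_len) / n_docs
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in tf if row[j] > 0) for j in range(n_vocab)]
    # Lucene-style IDF (assumed variant), always non-negative.
    idf = [math.log(1 + (n_docs - d + 0.5) / (d + 0.5)) for d in df]
    scores = []
    for i, row in enumerate(tf):
        # Length normalisation term for document i.
        norm = k1 * (1 - b + b * doc_len[i] / avg_len)
        scores.append([
            idf[j] * c * (k1 + 1) / (c + norm) if c else 0.0
            for j, c in enumerate(row)
        ])
    return scores


# Toy count matrix: 3 documents, 3 vocabulary terms.
counts = [
    [2, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
bm25_scores = bm25_from_counts(counts)
```

One caveat: because document lengths are derived from the matrix here, filtering terms out of the matrix beforehand also changes the length normalisation, which may or may not be what you want.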