Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added batch equivalent of computeQueryDocumentScore #1882

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

HAKSOAT
Copy link
Contributor

@HAKSOAT HAKSOAT commented May 21, 2022

Addresses #1484

@codecov-commenter
Copy link

codecov-commenter commented May 21, 2022

Codecov Report

Merging #1882 (3e06d73) into master (4430ac4) will increase coverage by 0.06%.
The diff coverage is 77.77%.

@@             Coverage Diff              @@
##             master    #1882      +/-   ##
============================================
+ Coverage     57.74%   57.80%   +0.06%     
- Complexity     1058     1065       +7     
============================================
  Files           179      179              
  Lines         10301    10328      +27     
  Branches       1417     1419       +2     
============================================
+ Hits           5948     5970      +22     
- Misses         3852     3857       +5     
  Partials        501      501              
Impacted Files Coverage Δ
.../main/java/io/anserini/index/IndexReaderUtils.java 76.89% <77.77%> (+0.10%) ⬆️
...a/io/anserini/analysis/DefaultEnglishAnalyzer.java 78.04% <0.00%> (+2.43%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4430ac4...3e06d73. Read the comment docs.

@HAKSOAT HAKSOAT changed the title Added batch equivalent of computeQueryDocumentScore (WIP) Added batch equivalent of computeQueryDocumentScore May 21, 2022
@HAKSOAT
Copy link
Contributor Author

HAKSOAT commented May 21, 2022

Hi @lintool I'd appreciate a review here when chanced.

Copy link
Member

@lintool lintool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting a better impl.


Query filterQuery = new ConstantScoreQuery(new TermQuery(new Term(IndexArgs.ID, docid)));
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(filterQuery, BooleanClause.Occur.MUST);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you want to do is to move the docids here: In the non-batch impl, the filter clause restricts to a single docid. Here, in the batch impl, you want to restrict to a set of docids - i.e., add multiple sub-clauses in the filter query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lintool I took a look at this and tried testing with a set of documents from robust04. However, I came across the error: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024.

This resulted from doing:

    for (String docid: docids){
      // Setting default result value for all docids.
      results.put(docid, 0.0f);
      Query filterQuery = new ConstantScoreQuery(new TermQuery(new Term(IndexArgs.ID, docid)));
      builder.add(filterQuery, BooleanClause.Occur.SHOULD);
    }

What are your thoughts on this?

Am I doing the right thing? I tried this with tests and it works when the clause count is less than 1024.

@@ -20,6 +20,8 @@
import io.anserini.search.SearchArgs;
import io.anserini.search.query.BagOfWordsQueryGenerator;
import io.anserini.search.query.PhraseQueryGenerator;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to @HAKSOAT: Remove these logger imports if not needed in the final implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants