-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added batch equivalent of computeQueryDocumentScore #1882
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1882 +/- ##
============================================
+ Coverage 57.74% 57.80% +0.06%
- Complexity 1058 1065 +7
============================================
Files 179 179
Lines 10301 10328 +27
Branches 1417 1419 +2
============================================
+ Hits 5948 5970 +22
- Misses 3852 3857 +5
Partials 501 501
Continue to review full report at Codecov.
|
Hi @lintool I'd appreciate a review here when chanced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggesting a better impl.
|
||
Query filterQuery = new ConstantScoreQuery(new TermQuery(new Term(IndexArgs.ID, docid))); | ||
BooleanQuery.Builder builder = new BooleanQuery.Builder(); | ||
builder.add(filterQuery, BooleanClause.Occur.MUST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you want to do is to move the docids here: In the non-batch impl, the filter clause restricts to a single docid. Here, in the batch impl, you want to restrict to a set of docids - i.e., add multiple sub-clauses in the filter query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lintool I took a look at this and tried testing with a set of documents from robust04
. However, I came across the error: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
.
This resulted from doing:
for (String docid: docids){
// Setting default result value for all docids.
results.put(docid, 0.0f);
Query filterQuery = new ConstantScoreQuery(new TermQuery(new Term(IndexArgs.ID, docid)));
builder.add(filterQuery, BooleanClause.Occur.SHOULD);
}
What are your thoughts on this?
Am I doing the right thing? I tried this with tests and it works when the clause count is less than 1024.
@@ -20,6 +20,8 @@ | |||
import io.anserini.search.SearchArgs; | |||
import io.anserini.search.query.BagOfWordsQueryGenerator; | |||
import io.anserini.search.query.PhraseQueryGenerator; | |||
import org.apache.logging.log4j.LogManager; | |||
import org.apache.logging.log4j.Logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to @HAKSOAT: Remove these logger imports if not needed in the final implementation.
Addresses #1484