Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent f4e35a2, commit c06c38b
Showing 1 changed file with 40 additions and 0 deletions.
# Refiners

This README describes the refiners used in the RePair project, categorized into _global_ and _local_ unsupervised refinement methods.
These methods are used to generate gold-standard datasets for training supervised or semi-supervised query refinement techniques.
[Global](#Global) methods operate solely on the original query, refining it without using any retrieved documents.
In contrast, [Local](#Local) refiners take into account terms from the top-k documents retrieved by an initial information retrieval method, such as _bm25_ or _qld_.
The local approaches add similar or related terms to the original query, with the aim of improving the relevance and accuracy of the refined queries.

# Global

## tagme
This method replaces the original query's terms with the titles of their corresponding Wikipedia articles.

## stemmers
Stemmers use lexical, syntactic, and semantic aspects of query terms and their relationships to reduce the terms to their roots.
The available stemmers are krovetz, lovins, paiceHusk, porter, sremoval, trunc4, and trunc5.
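As a minimal sketch of the simplest stemmers in this list, the snippet below assumes that trunc4 and trunc5 keep only the first four or five characters of each query term. The function names `trunc` and `stem_query` are hypothetical and chosen for illustration; they are not taken from the RePair code.

```python
def trunc(term: str, n: int) -> str:
    """Keep only the first n characters of a term (assumed truncN behaviour)."""
    return term[:n]


def stem_query(query: str, n: int) -> str:
    """Apply truncation stemming to every whitespace-separated query term."""
    return " ".join(trunc(term, n) for term in query.split())


if __name__ == "__main__":
    print(stem_query("international organized crime", 4))  # inte orga crim
    print(stem_query("international organized crime", 5))  # inter organ crime
```

Rule-based stemmers such as porter or krovetz apply suffix rules or dictionary lookups instead of a fixed cut-off, so they usually come from an existing library rather than being written by hand.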

## semantic refiners
Semantic refiners use an external linguistic knowledge base, such as thesaurus, wordnet, or conceptnet, to extract terms related to the original query's terms.

## sense-disambiguation
This method resolves the ambiguity of polysemous terms in the original query based on the surrounding terms and then adds the synonyms of the query terms as related terms.

## embedding-based methods
These methods use pre-trained term embeddings from GloVe and word2vec to find the terms most similar to the query terms.
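The idea of finding the most similar terms can be sketched with cosine similarity over term vectors. The example below is an illustration under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are made up, and the helpers `cosine`, `most_similar`, and `expand_query` are hypothetical names. Real GloVe or word2vec vectors would be loaded from pre-trained files.

```python
import math

# Toy vectors invented for illustration; real pre-trained embeddings
# typically have hundreds of dimensions.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(term, topn=2):
    """Return the topn vocabulary terms closest to `term`, excluding itself."""
    if term not in EMBEDDINGS:
        return []  # out-of-vocabulary terms get no expansion
    target = EMBEDDINGS[term]
    scored = [(other, cosine(target, vec))
              for other, vec in EMBEDDINGS.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def expand_query(query, topn=2):
    """Follow each query term with its most similar vocabulary terms."""
    expanded = []
    for term in query.split():
        expanded.append(term)
        expanded.extend(most_similar(term, topn))
    return " ".join(expanded)


if __name__ == "__main__":
    print(expand_query("car rental"))
```

Terms that are missing from the vocabulary, such as `rental` here, are passed through unchanged.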

## anchor
This method is similar to the embedding-based methods, except that the embeddings are trained on the anchors of Wikipedia articles, presuming that an anchor is a concise summary of the content of the linked page.

## wiki
This method uses embeddings trained on Wikipedia's hierarchical categories to add the most similar concepts to each query term.

# Local

## relevance-feedback
Relevance feedback adds important terms from the top-k retrieved documents to the original query, selected by metrics like tf-idf.
This category also covers the clustering techniques termluster, docluster, and conceptluster, where a graph clustering method like Louvain is employed on a graph whose nodes are the terms and whose edge weights are the terms' pairwise co-occurrence counts, so that each cluster comprises frequently co-occurring terms.
Subsequently, to refine the original query, the related terms are chosen from the clusters to which the initial query terms belong.
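Both ideas in this section can be sketched in a few lines of Python. The snippet is a simplified illustration, not the RePair implementation: `tfidf_expansion` and `cooccurrence_graph` are hypothetical names, tokenization is plain whitespace splitting, and the collection-level document frequencies are assumed to be supplied by the caller. The Louvain clustering step itself is not shown; a graph library would normally run it on the edge weights produced by `cooccurrence_graph`.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_expansion(query, top_docs, doc_freq, n_collection_docs, n_terms=2):
    """Append the n_terms highest-scoring feedback terms to the query.

    Score = term frequency in the top-k documents multiplied by
    log(N / df), where df comes from the (assumed) collection statistics.
    """
    query_terms = set(query.lower().split())
    tf = Counter(t for doc in top_docs for t in doc.lower().split())
    scores = {
        term: count * math.log(n_collection_docs / doc_freq.get(term, 1))
        for term, count in tf.items()
        if term not in query_terms
    }
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return query.split() + ranked[:n_terms]


def cooccurrence_graph(docs):
    """Count, for every pair of terms, the documents in which both occur."""
    edges = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        edges.update(combinations(terms, 2))
    return edges


if __name__ == "__main__":
    docs = ["jaguar speed cat", "jaguar cat habitat", "cat food"]
    # Made-up collection statistics for illustration.
    df = {"cat": 50, "speed": 5, "habitat": 4, "food": 40, "jaguar": 3}
    print(tfidf_expansion("jaguar", docs, df, n_collection_docs=100))
    print(cooccurrence_graph(docs)[("cat", "jaguar")])
```

In this toy data, `cat` occurs most often in the feedback documents but is common in the collection, so the rarer terms `habitat` and `speed` rank above it.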

## bertqe
This method employs BERT's contextualized word embeddings of the terms in the top-k retrieved documents.