diff --git a/src/refinement/refiners/README.md b/src/refinement/refiners/README.md
new file mode 100644
index 0000000..8ce2527
--- /dev/null
+++ b/src/refinement/refiners/README.md
@@ -0,0 +1,40 @@
+# Refiners
+
+This README describes the refiners used in the RePair project, grouped into _global_ and _local_ unsupervised refinement methods.
+These methods are used to generate gold-standard datasets for training supervised or semi-supervised query refinement techniques.
+[Global](#global) methods operate solely on the original query, refining it without relying on retrieved documents.
+In contrast, [Local](#local) refiners take into account terms from the top-k documents retrieved by an initial information retrieval step, such as _bm25_ or _qld_.
+Local approaches add similar or related terms from those documents to the original query, with the aim of improving the relevance of the refined query.
+
+# Global
+
+## tagme
+This method replaces the original query's terms with the titles of their corresponding Wikipedia articles.
+
+## stemmers
+These refiners utilize various lexical, syntactic, and semantic aspects of query terms and their relationships to reduce the terms to their roots. They include krovetz, lovins, paiceHusk, porter, sremoval, trunc4, and trunc5.
+
+## semantic refiners
+These refiners use an external linguistic knowledge base, including thesaurus, wordnet, and conceptnet, to extract terms related to the original query's terms.
+
+## sense-disambiguation
+This method resolves the ambiguity of polysemous terms in the original query based on their surrounding terms, and then adds synonyms of the query terms as related terms.
+
+## embedding-based methods
+These methods use pre-trained term embeddings from GloVe and word2vec to find the terms most similar to the query terms.
+
+## anchor
+This method is similar to the embedding-based methods, except that the embeddings are trained on the anchors of Wikipedia articles, on the presumption that an anchor is a concise summary of the content of the linked page.
+
+## wiki
+This method uses embeddings trained on Wikipedia's hierarchical categories to add the most similar concepts to each query term.
+
+# Local
+
+## relevance-feedback
+In relevance feedback, important terms from the top-k retrieved documents are added to the original query based on metrics like tf-idf.
+Clustering techniques, including termluster, docluster, and conceptluster, apply a graph clustering method such as Louvain to a graph whose nodes are terms and whose edges are weighted by the terms' pairwise co-occurrence counts, so that each cluster comprises frequently co-occurring terms.
+Subsequently, to refine the original query, the related terms are chosen from the clusters to which the initial query terms belong.
+
+## bertqe
+This method employs BERT's contextualized word embeddings of the terms in the top-k retrieved documents.
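
To illustrate the `relevance-feedback` description in the README above, here is a minimal sketch of tf-idf based query expansion over the top-k retrieved documents. It is not RePair's implementation: the function name `expand_query`, the whitespace tokenizer, the smoothed idf formula, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def expand_query(query, top_docs, n_terms=2):
    """Append the n_terms highest-scoring tf-idf terms from top_docs to query.

    Hypothetical helper for illustration only; not part of RePair.
    """
    # Naive lowercase whitespace tokenization (assumption).
    tokenized = [doc.lower().split() for doc in top_docs]
    n_docs = len(tokenized)

    # Document frequency: number of feedback documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))

    # Accumulate each term's tf-idf score over all feedback documents.
    scores = Counter()
    for doc in tokenized:
        tf = Counter(doc)
        for term, count in tf.items():
            # Smoothed idf (assumption; other variants are equally valid).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += (count / len(doc)) * idf

    # Only terms not already in the query are candidates for expansion.
    query_terms = set(query.lower().split())
    candidates = [(t, s) for t, s in scores.items() if t not in query_terms]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))

    new_terms = [term for term, _ in candidates[:n_terms]]
    return " ".join([query] + new_terms)


if __name__ == "__main__":
    # Made-up feedback documents standing in for a bm25 or qld result list.
    docs = [
        "jaguar speed jaguar habitat",
        "jaguar car price",
        "jaguar habitat rainforest",
    ]
    print(expand_query("jaguar", docs, n_terms=2))
```

The original query is kept intact and the new terms are appended, which matches the README's description of local refiners adding related terms rather than replacing existing ones.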