Create README.md

fani-lab · Jul 29, 2024 · c06c38b
1 parent f4e35a2
Showing 1 changed file with 40 additions and 0 deletions.
diff --git a/src/refinement/refiners/README.md b/src/refinement/refiners/README.md
@@ -0,0 +1,40 @@
+# Refiners
+
+This README provides comprehensive information about the refiners used in the RePair project, categorized into _global_ and _local_ unsupervised refinement methods. 
+These methods are crucial for generating gold-standard datasets for training supervised or semi-supervised query refinement techniques. 
+[Global](#Global) methods operate solely on the original query, refining it without any context from retrieved documents. 
+In contrast, [Local](#Local) refiners take into account terms from the top-k retrieved documents obtained through an initial information retrieval process, such as _bm25_ or _qld_. 
+The local approaches allow for the addition of similar or related terms to the original query, thereby enhancing the relevance and accuracy of the refined queries.
+
+# Global
+
+## tagme
+This method replaces the original query's terms with the titles of their corresponding Wikipedia articles.
+
+## stemmers
+These refiners utilize various lexical, syntactic, and semantic aspects of query terms and their relationships to reduce the terms to their roots. They include krovetz, lovins, paiceHusk, porter, sremoval, trunc4, and trunc5.
+
+## semantic refiners
+These refiners use an external linguistic knowledge base, such as thesaurus, wordnet, or conceptnet, to extract terms related to the original query's terms.
+
+## sense-disambiguation
+This method resolves the ambiguity of polysemous terms in the original query based on the surrounding terms and then adds the synonyms of the query terms as related terms.
+
+## embedding-based methods
+These methods use pre-trained term embeddings from GloVe and word2vec to find the terms most similar to the query terms.
+
+## anchor
+This method is similar to the embedding-based methods, except that the embeddings are trained on Wikipedia articles' anchors, presuming that an anchor is a concise summary of the content of the linked page.
+
+## wiki
+This method uses embeddings trained on Wikipedia's hierarchical categories to add the most similar concepts to each query term.
+
+# Local
+
+## relevance-feedback
+In this method, important terms from the top-k retrieved documents are added to the original query based on metrics like tf-idf.
+Clustering techniques, including termluster, docluster, and conceptluster, employ a graph clustering method like Louvain on a graph whose nodes are the terms and whose edges are weighted by the terms' pairwise co-occurrence counts, so that each cluster comprises frequently co-occurring terms. 
+To refine the original query, the related terms are then chosen from the clusters to which the original query terms belong. 
+
+## bertqe
+This method employs BERT's contextualized word embeddings of terms in the top-k retrieved documents.
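
The relevance-feedback idea described in the README (adding important terms from the top-k retrieved documents based on a metric like tf-idf) can be illustrated with a minimal sketch. This is not the RePair implementation: the function names `tokenize` and `expand_query`, the whitespace tokenizer, the smoothed idf formula, and the `corpus_size` and `doc_freq` inputs are all assumptions made for illustration, and the top-k documents are assumed to have been retrieved already.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for the sketch.
    return text.lower().split()


def expand_query(query, top_docs, corpus_size, doc_freq, n_terms=3):
    """Append the n_terms highest-scoring tf-idf terms from top_docs to query.

    top_docs    -- texts of the top-k retrieved documents (assumed given)
    corpus_size -- hypothetical number of documents in the collection
    doc_freq    -- hypothetical mapping: term -> number of documents containing it
    """
    query_terms = set(tokenize(query))
    # Term frequency pooled over all top-k documents.
    tf = Counter(term for doc in top_docs for term in tokenize(doc))
    scores = {}
    for term, freq in tf.items():
        if term in query_terms:
            continue  # only terms not already in the query are candidates
        # Assumed smoothing: add 1 to the document frequency to avoid division by zero.
        idf = math.log(corpus_size / (1 + doc_freq.get(term, 0)))
        scores[term] = freq * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    if not ranked:
        return query
    return query + " " + " ".join(ranked[:n_terms])
```

Calling `expand_query` with a query string, a list of document texts, and collection statistics returns the original query followed by the selected expansion terms. Frequent but uninformative terms receive a low idf and are therefore unlikely to be selected.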