Biword index

Le Thu Nguyen edited this page Apr 18, 2018 · 10 revisions

Description

A biword index treats every pair of consecutive terms in a document as a phrase (Manning, 2009, p.76). Each of these biwords becomes a term in the dictionary.
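As a minimal sketch of this idea (not taken from Manning), the Python below splits a text into biwords and builds a dictionary from each biword to the documents that contain it. The function names `biwords` and `build_biword_index`, the whitespace tokenisation, the lowercasing, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def biwords(text):
    """Return every pair of consecutive terms in the text, in order."""
    terms = text.lower().split()  # naive tokenisation: lowercase, split on whitespace
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]


def build_biword_index(docs):
    """Map each biword to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for bw in biwords(text):
            index[bw].add(doc_id)
    return {bw: sorted(ids) for bw, ids in index.items()}


# Hypothetical sample documents
docs = {
    1: "Student Stanford University",
    2: "Singer Adela plays a singing bowl",
}
index = build_biword_index(docs)
print(biwords(docs[1]))
print(index["singing bowl"])
```

Here the three-term document 1 yields two dictionary terms, "student stanford" and "stanford university".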

Example

Each of the following two-word phrases would be stored as a single biword:

"Stanford University"

"Student Stanford"

"Singer Adela"

"Singing bowl"

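Disadvantage

A biword index cannot directly answer a phrase query that contains more than two words. In this case, the phrase is broken into its consecutive biwords and these are combined in a Boolean query on the biword index. For example, "stanford university palo alto" becomes "stanford university" AND "university palo" AND "palo alto". This usually works well, but it can return false positives, because a document may contain all of the biwords without containing the full phrase.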
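The Boolean combination of biwords for a longer phrase can be sketched as below. This is an illustrative sketch, not a reference implementation: the helper names, the naive whitespace tokenisation, and the sample documents are assumptions. Document 2 is constructed on purpose to contain every biword of the query without containing the whole phrase.

```python
def biwords(text):
    """Return every pair of consecutive terms in the text."""
    terms = text.lower().split()
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]


def build_biword_index(docs):
    """Map each biword to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for bw in biwords(text):
            index.setdefault(bw, set()).add(doc_id)
    return index


def phrase_query(index, phrase):
    """AND together the postings of every biword in the phrase."""
    parts = biwords(phrase)
    if not parts:
        return set()  # a one-word query has no biwords to look up
    result = set(index.get(parts[0], set()))
    for bw in parts[1:]:
        result &= index.get(bw, set())
    return result


# Hypothetical sample documents
docs = {
    1: "stanford university palo alto",
    2: "palo alto hosts stanford university palo campus",  # all biwords, but not the phrase
    3: "student stanford university",
}
index = build_biword_index(docs)
matches = phrase_query(index, "stanford university palo alto")
print(sorted(matches))
```

Document 2 is returned even though the four-word phrase never occurs in it, which is the false-positive case. Confirming a true match would need a further check against the document text.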

How the index is different from the inverted index:

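A standard inverted index is a list of words together with the documents in which each word occurs. It has two main parts: the vocabulary and the occurrences. A biword index has the same structure, but each vocabulary entry is a pair of consecutive words rather than a single word. A two-word phrase query can therefore be answered with a single lookup, at the cost of a larger vocabulary.

For example, a list of words typed into a web search engine such as Google can be answered from a standard inverted index by intersecting the occurrence lists of the individual words, but this does not guarantee that the words are adjacent in the matching documents.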
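To make the contrast concrete, the sketch below builds both kinds of index over the same two documents and runs the same two-word query against each. The variable names, the whitespace tokenisation, and the sample documents are assumptions chosen for illustration only.

```python
# Hypothetical sample documents
docs = {
    1: "stanford university admits a student",
    2: "the university band visits stanford",
}

inverted = {}  # single term -> set of document IDs
biword = {}    # pair of consecutive terms -> set of document IDs

for doc_id, text in docs.items():
    terms = text.lower().split()
    for term in terms:
        inverted.setdefault(term, set()).add(doc_id)
    for a, b in zip(terms, terms[1:]):
        biword.setdefault(f"{a} {b}", set()).add(doc_id)

# Inverted index: intersect the occurrences of the two words (adjacency is not checked)
and_result = inverted["stanford"] & inverted["university"]

# Biword index: one dictionary lookup for the adjacent pair
phrase_result = biword.get("stanford university", set())

print(sorted(and_result), sorted(phrase_result))
```

The intersection on the single-term index matches both documents, while the biword lookup matches only the document in which the two words are adjacent and in that order.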

Circumstances the index would be used:

  • It can be applied to web search.
  • It suits the many web search queries that are implicit phrase queries, where the user types consecutive words (such as a person's name) without quotation marks.

Advantage that the index has over the inverted index:

  • The biword index is not the standard solution for phrase queries, but it is simple and easy to understand.
  • Biwords can be used as part of a compound strategy for longer phrase queries, such as Boolean biword queries.
  • It speeds up finding the documents that match a two-word phrase query.

Reference:

Manning, C.D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge, UK: Cambridge University Press. Available at http://nlp.stanford.edu/IR-book/information-retrieval-book.html