Skip to content

HaraldKi/approdictio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

approdictio --- Approximate Dictionaries

a Java class library

Approdictio provides implementations of dictionaries that allow approximate lookup. When looking up a word, all words are returned that are approximately equal to the word given. The definition of approximately equal depends on the metric provided to define the distance between two words. An implementation of the Levensthein Metric with customizable edit costs is part of the package.

The slightly tricky part of approximate dictionary lookup is to find all similar words without explicit comparison of the word to look up with all words of the dictionary. Two implementations are provided that are reasonably fast:

BKTree

provides an implementation of a Burkhard-Keller Tree. In principle, this implementation can even be used for approximate lookup of other objects than strings, as long as a metric is provided.

###NgramDict

indexes all words of the dictionary by their n-grams. During lookup, candidate terms are retrieved with a crude n-gram similarity. Only the candidates are compared according to the metric provided. This seems generally to be faster than the BKTree, but has the disadvantage that even the Levensthein metric is not 100% compatible with the n-gram lookup. Consequently some similar terms may be missed.