Perfect Hashing

1. Introduction
2. Universal Hashing
- 2.1 Thereom 1
- 2.2 Constructing a Universal Hash Family: The Matrix Method
3. Perfect Hashing
- 3.1 O(N²) - Space Solution
- 3.2 O(N) - Space Solution
4. Application
- 4.1 English Dictionary
- 4.2 Command Line Interface
5. Comparison
- 5.1 Number of rebuilds
- 5.2 Space

1. Introduction

Implementation of a perfect hashing data structure. We say a hash function is perfect for S if all lookups involve O(1) work. In section 2, background about universal hashing is provided. Sections 3 and 4 describe two methods for constructing perfect hash functions for a given set S. It's required to design, analyze and implement a perfect hash table as described in sections 3 and 4.

Universal Hashing

A probability distribution H over hash functions from U to {1, ..., M} is universal if for all x=y in U, we have
Pr[h(x) = h(y)] ≤ 1/M

2.1 Thereom 1

If H is universal, then for any set S ⊂ U, for any x ∈ U (that we might want to insert or lookup), for a random h taken from H, the expected number of collisions between x and other elements in S is at most N/M.

2.2 Constructing a Universal Hash Family: The Matrix Method

Let’s say keys are u-bits long. Say the table size M is power of 2, so an index is b-bits long with M = 2^b. What we’ll do is pick h to be a random b-by-u 0/1 matrix, and define h(x)=hx, where we do addition mod 2. For instance:

We can show that for x = y, Pr[h(x) = h(y)] = 1/M = 1/2^b

3. Perfect Hashing:

3.1 O(N²) - Space Solution

Say we are willing to have a table whose size is quadratic in the size N of our dictionary S. Then, here is an easy method. Let H be universal and M = N². Pick a random h from H and try it out, hashing everything in S. So, we just try it, and if we got any collisions, we just try a new h. On average, we will only need to do this twice.

3.2 O(N)-Space Solution

The main idea for this method is to use universal hash functions in a 2-level scheme. The method is as follows. We will first hash into a table of size N using universal hashing. This will produce some collisions. However, we will then rehash each bin using Method 1, squaring the size of the bin to get zero collisions. So, the way to think of this scheme is that we have a first-level hash function h and first-level table A, and then N second-level hash functions h1, ..., hN and N second-level tables A1, ..., AN. To look up an element x, we first compute i = h(x) and then find the element in Ai[hi(x)].

4 Application:

4.1 English Dictionary

As an application based on the perfect hashing implementation, it's required to implement a simple English dictionary supporting the following functionalities:

Initialize (constructor): Takes the name of the type of the backend perfect hashing as an input and creates a new empty dictionary based on it.
Insert: Takes a single string key and tries to insert it.
Delete: Takes a single string key and tries to delete it.
Search: Takes a single string key, searches for it, and returns true if it exists and false otherwise.
Batch insert: Takes a path to a text file containing multiple words each in a separate line. And tries to insert all that words into the dictionary.
Batch delete: Takes a path to a text file containing multiple words each in a separate line. And tries to delete all that words from the dictionary.

4.2 Command Line Interface

Implement a command line interface that will enable us to deal with the dictionary and apply all its implemented operations. This interface must take the type of hash table as an initial input then create a dictionary based on it and allow the user to apply subsequent operations on it from the following list:

Insert a string and prints a confirmation message or an error one if the the string already exists in the dictionary.
Delete a string and prints a confirmation message or an error one if the the string doesn’t exist in the dictionary.

5. Comparison

5.1 Number of rebuilds:

size	Hash-O(N)	Hash-O(N²)
10	1	0
100	41	0
1,000	468	0
10,000	3292	0
20,000	6388	0

5.2 Space:

size	Hash-O(N)	Hash-O(N²)
10	14	128
100	185	16,384
1,000	1,965	1,048,576
10,000	16,288	134,217,728
20,000	33,349	536,870,912
30,000	57,782	1,073,741,824

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.idea		.idea
src		src
.gitignore		.gitignore
Perfect-Hashing.iml		Perfect-Hashing.iml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Perfect Hashing

Contents:

1. Introduction

Universal Hashing

2.1 Thereom 1

2.2 Constructing a Universal Hash Family: The Matrix Method

3. Perfect Hashing:

3.1 O(N²) - Space Solution

3.2 O(N)-Space Solution

4 Application:

4.1 English Dictionary

4.2 Command Line Interface

5. Comparison

5.1 Number of rebuilds:

5.2 Space:

About

Releases

Packages

Contributors 4

Languages

DSA-2-labs/Perfect-Hashing

Folders and files

Latest commit

History

Repository files navigation

Perfect Hashing

Contents:

1. Introduction

Universal Hashing

2.1 Thereom 1

2.2 Constructing a Universal Hash Family: The Matrix Method

3. Perfect Hashing:

3.1 O(N2) - Space Solution

3.2 O(N)-Space Solution

4 Application:

4.1 English Dictionary

4.2 Command Line Interface

5. Comparison

5.1 Number of rebuilds:

5.2 Space:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

3.1 O(N²) - Space Solution

Packages