Skip to content

BloomFilter is a space efficient storage of sets at the cost of a small overall error probability while maintaining scalability.

License

Notifications You must be signed in to change notification settings

azizkayumov/bloom-filters

Repository files navigation

BloomFilters

This project aims to test BloomFilters with k-slices.
Assume we have a giant set of English words, user enters any string to know if the string is an English word or not. It is important not to use a dictionary or set because of memory constraints.

Usage:

BloomFilters with k-slices:

  1. Clone this repo:
git clone https://github.com/AbduazizKayumov/BloomFilters.git
  1. Create venv:
python3 -m venv env
source env/bin/activate
  1. Install murmur hash and bitstring:
pip3 install --upgrade pip
pip3 install -r requierements.txt
  1. Run the example app (note that BloomFilter initialization may take some seconds):
python3 sample_bf.py
  1. Enter any string to detect if it is an English word or not:
BloomFilter initialized with: 
m =  533948
k =  9
N =  370104
PID =  7254
Enter any word: vladivostok
vladivostok is not an English word.

Enter any word: flowers
flowers may be an English word.
...

Optionally, run python3 sample_set.py (it uses Sets) to compare the memory usage with BloomFilters (both of them display their own PIDs, find them from memory usage monitoring app in your OS).

Scalable Bloom Filters implementation is in filters/sbf.py, run python3 sample_sbf.py to see how it scales for 370K English words.

English words are from this project.

About

BloomFilter is a space efficient storage of sets at the cost of a small overall error probability while maintaining scalability.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages