Python C extension to split a file by variable block sizes using a Rabin fingerprint.

cschwede/python-rabin-fingerprint

python-rabin-fingerprint

Python C extension to find chunk boundaries using a Rabin-Karp rolling hash. This is useful for slicing data into variable-sized chunks based on content. If a file is changed, only the modified chunk (and possibly the next one) is affected; the chunks after that are unchanged. This makes it well suited to data deduplication before sending data over slow connections or when storing multiple similar files (such as backups made from tar snapshots).
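
To illustrate why content-defined boundaries survive edits, here is a minimal pure-Python sketch. It is not this extension's API: the names `chunk_boundaries` and `split` and the constants `WINDOW`, `MASK`, `BASE` and `MOD` are assumptions made for the example, and it uses a simple polynomial rolling hash modulo a prime rather than a true Rabin fingerprint.

```python
import random
from typing import List

# All constants are illustrative assumptions, not values used by the extension.
WINDOW = 16          # number of trailing bytes covered by the rolling hash
MASK = 0x3F          # cut when the low 6 bits are zero (about 64-byte chunks)
BASE = 257           # polynomial base
MOD = (1 << 61) - 1  # prime modulus


def chunk_boundaries(data: bytes) -> List[int]:
    """Return the end offsets of content-defined chunks in data."""
    boundaries = []
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the oldest byte so h only depends on the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == 0:
            boundaries.append(i + 1)
    # Always close the final chunk at the end of the data.
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))
    return boundaries


def split(data: bytes, boundaries: List[int]) -> List[bytes]:
    """Cut data at the given end offsets."""
    chunks, start = [], 0
    for end in boundaries:
        chunks.append(data[start:end])
        start = end
    return chunks


# Insert one byte at the front and compare the resulting chunks.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"X" + data

bounds = chunk_boundaries(data)
edited_bounds = chunk_boundaries(edited)
shared = set(split(data, bounds)) & set(split(edited, edited_bounds))
print(f"{len(shared)} of {len(bounds)} chunks are unchanged after the edit")
```

Because each boundary depends only on the last `WINDOW` bytes, every boundary past the edited region reappears at the same position relative to the content, so only the first chunk or two differ. A fixed-size split would instead shift every block after the inserted byte.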

Have a look at http://en.wikipedia.org/wiki/Rolling_hash for an introduction to the Rabin-Karp rolling hash.

Installation

Installation requires a working GCC compiler and Python development libraries.

git clone https://github.com/cschwede/python-rabin-fingerprint.git
cd python-rabin-fingerprint
sudo python setup.py install
