Clean up duplicate files using their content hash 📄🕵️📃

acorelli/hashclean

hashclean

⚠️ USE AT YOUR OWN RISK ⚠️

A utility to de-duplicate media and other files by hashing each file's contents rather than relying on its filename.
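To illustrate the general idea of content-based de-duplication, here is a minimal sketch. It is not the code in hash_comp.py; the function names `file_digest` and `find_duplicates` and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large media files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Map each digest to the files that share it, keeping groups of 2 or more."""
    groups = defaultdict(list)
    for root in roots:
        # rglob("*") walks the directory tree recursively.
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Two files with different names but identical bytes land in the same group, which is what makes this approach independent of filenames. Note that in this sketch, passing overlapping root directories would list the same file twice, so a real tool would need to guard against that before deleting anything.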

Usage

  1. Create safe_words.txt and target_extensions.txt (you can copy or rename the .example files)
  2. Install requirements: python -m pip install -r requirements.txt
  3. Run hash_comp.py with a space-separated list of directories to process (searched recursively):

python.exe hash_comp.py C:/users/<me>/Documents/Media C:/users/<me>/Media

  4. Wait for results (running overnight is suggested)
  5. Run clean_hash.py on the results file (hash_res.txt):

python.exe clean_hash.py ./hash_res.txt -y

- Use `-y` to confirm that you want to delete files
- Omit `-y` or add `-t` to run in `test` mode
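The split between a test run and a confirmed delete can be sketched as follows. This is an assumption-laden illustration, not the logic of clean_hash.py: the function name `remove_duplicates`, the `delete` parameter, and the rule of keeping the first file in each group are all hypothetical, and the format of hash_res.txt is not covered here.

```python
from pathlib import Path


def remove_duplicates(groups, delete=False):
    """Keep the first path in each group and return the remaining paths.

    Files are unlinked only when delete=True; otherwise this is a dry run
    that reports what would be removed.
    """
    removed = []
    for paths in groups:
        for extra in paths[1:]:
            if delete:
                Path(extra).unlink()
            removed.append(Path(extra))
    return removed
```

Defaulting to a dry run means a forgotten flag leaves files untouched, which suits a tool that carries a "use at your own risk" warning.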
