
DocClean


Clean your document images with crumpled backgrounds, stains and folds using Deep Neural Networks.

Here is a demo:

You can find the slide deck accompanying this project here.

Installation

Installing docclean is easy. Just run the following:

git clone https://github.com/devanshkv/insight_docclean.git
cd insight_docclean
pip install -r requirements.txt
python3 setup.py install
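
To check that the install worked, you can try importing the package from Python. The package name below is an assumption (the repository is insight_docclean, so the installed package is presumably docclean); adjust it if your local install differs:

# package name "docclean" is assumed; adjust if it differs
python3 -c "import docclean"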

Documentation

Have a look at our beautiful docs here.

Training the models

The models can be trained using train.py. The usage is as follows:

usage: train.py [-h] -t {cycle_gan,autoencoder} -k KAGGLE_DATA_DIR
                     [-c CLEAN_BOOKS_DIR]
                     [-d DIRTY_BOOKS_DIR] [-e EPOCHS]
                     [-b BATCH_SIZE] [-v]

Quick reference table

Short   Long                Default   Description
-h      --help                        show this help message and exit
-t      --type              None      Which model to train
-k      --kaggle_data_dir   None      Kaggle Data Directory
-c      --clean_books_dir   None      Directory containing clean images
-d      --dirty_books_dir   None      Directory containing dirty images
-e      --epochs            100       Number of epochs to train for
-b      --batch_size        16        Batch size
-v      --verbose                     Be verbose
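
As an example, a training run for the autoencoder on the Kaggle data might look like the following; the data path is a placeholder, so point it at your own directory:

# placeholder path; replace with the location of your Kaggle denoising data
python3 train.py -t autoencoder -k /path/to/kaggle_data -e 100 -b 16 -v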

Running the inference

Using the trained model, inference can be run with infer.py. The usage is as follows:

usage: infer.py [-h] [-v] [-g GPU_ID] -c DATA_DIR [-b BATCH_SIZE] -t
               {cycle_gan,autoencoder} -w WEIGHTS

Quick reference table

Short   Long            Default   Description
-h      --help                    show this help message and exit
-v      --verbose                 Be verbose
-g      --gpu_id        0         GPU ID (use -1 for CPU)
-c      --data_dir      None      Directory with candidate pngs
-b      --batch_size    32        Batch size for training data
-t      --type          None      Which model to train
-w      --weights       None      Model weights
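
For example, cleaning a folder of PNGs on GPU 0 with saved autoencoder weights could look like this; the image directory and weights filename are placeholders:

# placeholder paths; substitute your own image directory and weights file
python3 infer.py -t autoencoder -w /path/to/weights.h5 -c /path/to/pngs -g 0 -b 32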

Running the streamlit app

Run,

streamlit run app.py

and then open localhost:8501 to view the app.