Auto-scan docs and send them

Description

Auto-scan is a tool to automatically scan documents from HP, OCR files, optimize it and finally send it by email as attachment.

Architectures

Based on python 3.6 slim buster it supports

✓ linux/386
✓ linux/amd64
✓ linux/arm/v5
✓ linux/arm/v7
✓ linux/arm64/v8
✓ linux/ppc64le
✓ linux/s390x

Prerequisites

In order to work properly the container provided here needs access to DBus. Actual shortcut is to run it as privileged Docker container and map host socket to gain access to DBus (see launch script below).

How to get started

Git clone actual repo:

git clone https://github.com/matbgn/auto-scan.git

Git Shallow Clone the scan-pdf repo adapted for HP scanners:

cd auto-scan
git clone --depth=1 https://github.com/matbgn/scan-pdf.git

Build docker image locally or run it in dev mode (see below):

docker build -t matbgn/auto_scan .

Build for other platforms

source: https://medium.com/@artur.klauser/building-multi-architecture-docker-images-with-buildx-27d80f7e2408

Make sure you have a sufficient kernel on your Linux machine >= 4.8
```
$ uname -r
4.15.0
```

Make sure your docker version is greater than 19.03

$ docker --version
Docker version 19.03.5, build 633a0ea83

Make sure buildx is activated (if not run export DOCKER_CLI_EXPERIMENTAL=enabled)

$ docker buildx version
github.com/docker/buildx v0.6.3-docker 266c0eac611d64fcc0c72d80206aa364e826758d

Use a docker image based installation to have QEMU and binfmt-support up to date
```
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
```

ℹ️	Pay attention to the fact that you’ll need to re-run that docker image after every system reboot.

Create a buildx builder

$ docker buildx create --name mybuilder
mybuilder
$ docker buildx use mybuilder

You can check your newly created mybuilder with:

$ docker buildx ls
NAME/NODE    DRIVER/ENDPOINT             STATUS  PLATFORMS
mybuilder *  docker-container
  mybuilder0 unix:///var/run/docker.sock running linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default      docker
  default    default                     running linux/amd64, linux/386, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/arm/v7, linux/arm/v6

⚠️	If it only reports support for linux/amd64 and linux/386 you either still haven’t met all software requirements, or you had created a builder before you have met the software requirements. In the latter case remove it with docker buildx rm and recreate it.

Finally build it locally in a tar to be loaded later on the target machine
```
docker buildx build -t matbgn/auto_scan --platform linux/arm64 -o type=docker,dest=- . > auto_scan_arm64.tar
```
1. To load an image on the target machine, simply run
  docker load < auto_scan_arm64.tar
  1. Then double-check the list of your pulled images with docker images

Usage

Printer address in following format:

PRINTER_ADDRESS="hpaio:/net/OfficeJet_Pro_7740_series?ip=192.168.14.113"

Scanning mode supported:

# ADF|Duplex|Flatbed
SCAN_MODE=ADF

Run multiple batches and merge them in one single PDF:

# This argument is optionnal
BATCH_TOTAL=2

Change paper format scanning (A3, A4, A5 supported):

# This argument is optionnal (A4 will be selected as default)
PAPER_FORMAT="A3"

Run script:

sudo docker run -it --rm --net=host --privileged \
-v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
-e SMTPSERVER_EMAIL=your_email@gmail.com \
-e SMTPSERVER_PASS=your_gmail_app_secret \
-e EMAIL_RECIPIENTS="email1@gmail.com;email2@gmail.com" \
-e SCAN_MODE=ADF \
-e SUBJECT="Test scan" \
-e BATCH_TOTAL=1 \
-e PAPER_FORMAT="A4" \
-e PRINTER_ADDRESS="hpaio:/net/OfficeJet_Pro_7740_series?ip=192.168.14.113" \
matbgn/auto_scan

Development mode

Be sure to have those packages installed:

sudo apt-get install -y sane-utils libsane-hpaio \
imagemagick ocrmypdf \
tesseract-ocr-fra tesseract-ocr-deu

(Run venv &) Install requirements with:
```
pip install -r requirements.txt
```

Grab scan_pdf depedency

cd auto-scan
git clone --depth=1 https://github.com/matbgn/scan-pdf.git
cd ..

Put your variables in a .env file with below possibilities:

SMTPSERVER_EMAIL=your_email@gmail.com
SMTPSERVER_PASS=your_gmail_app_secret
SMTPSERVER_HOST=smtp.gmail.com
EMAIL_RECIPIENTS=""email1@gmail.com;email2@gmail.com"
SCAN_MODE=Flatbed
SUBJECT="Test scan"
BATCH_TOTAL=1
PAPER_FORMAT="A4"
PRINTER_ADDRESS="hpaio:/net/OfficeJet_Pro_7740_series?ip=192.168.14.113"
PAPERLESS_LOCATION="~/paperless/consume"

Then run script directly with:
```
python ./main.py
```

In case of error

In case of following error proceed as below:

⚠️	Error during converting jpg to pdf convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/408.

As a temporary fix, edit /etc/ImageMagick-6/policy.xml and change the PDF rights down in the document from none to read|write there. Or simply run:

sed -i 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' /etc/ImageMagick-6/policy.xml

Not sure about the implications, but at least it allows getting things done.

Deploy on a server

Follow the first two points of the above development mode on your server (or just locally).

Then simply run (on any port wanted):

waitress-serve --port=4040 app:app &

As a service:

See https://www.devdungeon.com/content/creating-systemd-service-files

# /etc/systemd/system/auto_scan.service
[Unit]
Description=Auto scan server
After=network.target

[Service]
Type=simple
User=USER_TO_BE_USED
WorkingDirectory=/home/USER_TO_BE_USED/auto-scan
ExecStart=/path/to/venv/bin/waitress-serve --port=4040 app:app
Restart=always

[Install]
WantedBy=multi-user.target

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.adoc

README.adoc

Auto-scan docs and send them

Description

Architectures

Prerequisites

How to get started

Build for other platforms

Usage

Development mode

In case of error

Deploy on a server

Files

README.adoc

Latest commit

History

README.adoc

File metadata and controls

Auto-scan docs and send them

Description

Architectures

Prerequisites

How to get started

Build for other platforms

Usage

Development mode

In case of error

Deploy on a server