Skip to content

Latest commit

 

History

History
222 lines (160 loc) · 6.23 KB

README.adoc

File metadata and controls

222 lines (160 loc) · 6.23 KB

Auto-scan docs and send them

Description

Auto-scan is a tool to automatically scan documents from HP, OCR files, optimize it and finally send it by email as attachment.

Architectures

Based on python 3.6 slim buster it supports

  • linux/386

  • linux/amd64

  • linux/arm/v5

  • linux/arm/v7

  • linux/arm64/v8

  • linux/ppc64le

  • linux/s390x

Prerequisites

In order to work properly the container provided here needs access to DBus. Actual shortcut is to run it as privileged Docker container and map host socket to gain access to DBus (see launch script below).

How to get started

Git clone actual repo:

git clone https://github.com/matbgn/auto-scan.git

Git Shallow Clone the scan-pdf repo adapted for HP scanners:

cd auto-scan
git clone --depth=1 https://github.com/matbgn/scan-pdf.git

Build docker image locally or run it in dev mode (see below):

docker build -t matbgn/auto_scan .

Build for other platforms

  1. Make sure you have a sufficient kernel on your Linux machine >= 4.8

    $ uname -r
    4.15.0
  2. Make sure your docker version is greater than 19.03

    $ docker --version
    Docker version 19.03.5, build 633a0ea83
  3. Make sure buildx is activated (if not run export DOCKER_CLI_EXPERIMENTAL=enabled)

    $ docker buildx version
    github.com/docker/buildx v0.6.3-docker 266c0eac611d64fcc0c72d80206aa364e826758d
  4. Use a docker image based installation to have QEMU and binfmt-support up to date

    $ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
ℹ️
Pay attention to the fact that you’ll need to re-run that docker image after every system reboot.
  1. Create a buildx builder

    $ docker buildx create --name mybuilder
    mybuilder
    $ docker buildx use mybuilder
    1. You can check your newly created mybuilder with:

      $ docker buildx ls
      NAME/NODE    DRIVER/ENDPOINT             STATUS  PLATFORMS
      mybuilder *  docker-container
        mybuilder0 unix:///var/run/docker.sock running linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
      default      docker
        default    default                     running linux/amd64, linux/386, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/arm/v7, linux/arm/v6
⚠️
If it only reports support for linux/amd64 and linux/386 you either still haven’t met all software requirements, or you had created a builder before you have met the software requirements. In the latter case remove it with docker buildx rm and recreate it.
  1. Finally build it locally in a tar to be loaded later on the target machine

    docker buildx build -t matbgn/auto_scan --platform linux/arm64 -o type=docker,dest=- . > auto_scan_arm64.tar
    1. To load an image on the target machine, simply run

      docker load < auto_scan_arm64.tar
      1. Then double-check the list of your pulled images with docker images

Usage

Printer address in following format:

PRINTER_ADDRESS="hpaio:/net/OfficeJet_Pro_7740_series?ip=192.168.14.113"

Scanning mode supported:

# ADF|Duplex|Flatbed
SCAN_MODE=ADF

Run multiple batches and merge them in one single PDF:

# This argument is optionnal
BATCH_TOTAL=2

Change paper format scanning (A3, A4, A5 supported):

# This argument is optionnal (A4 will be selected as default)
PAPER_FORMAT="A3"

Run script:

sudo docker run -it --rm --net=host --privileged \
-v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
-e SMTPSERVER_EMAIL=your_email@gmail.com \
-e SMTPSERVER_PASS=your_gmail_app_secret \
-e EMAIL_RECIPIENTS="email1@gmail.com;email2@gmail.com" \
-e SCAN_MODE=ADF \
-e SUBJECT="Test scan" \
-e BATCH_TOTAL=1 \
-e PAPER_FORMAT="A4" \
-e PRINTER_ADDRESS="hpaio:/net/OfficeJet_Pro_7740_series?ip=192.168.14.113" \
matbgn/auto_scan

Development mode

  1. Be sure to have those packages installed:

    sudo apt-get install -y sane-utils libsane-hpaio \
    imagemagick ocrmypdf \
    tesseract-ocr-fra tesseract-ocr-deu
  2. (Run venv &) Install requirements with:

    pip install -r requirements.txt
  3. Grab scan_pdf depedency

    cd auto-scan
    git clone --depth=1 https://github.com/matbgn/scan-pdf.git
    cd ..
  4. Put your variables in a .env file with below possibilities:

    SMTPSERVER_EMAIL=your_email@gmail.com
    SMTPSERVER_PASS=your_gmail_app_secret
    SMTPSERVER_HOST=smtp.gmail.com
    EMAIL_RECIPIENTS=""email1@gmail.com;email2@gmail.com"
    SCAN_MODE=Flatbed
    SUBJECT="Test scan"
    BATCH_TOTAL=1
    PAPER_FORMAT="A4"
    PRINTER_ADDRESS="hpaio:/net/OfficeJet_Pro_7740_series?ip=192.168.14.113"
    PAPERLESS_LOCATION="~/paperless/consume"
  5. Then run script directly with:

    python ./main.py

In case of error

In case of following error proceed as below:

⚠️

Error during converting jpg to pdf convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/408.

As a temporary fix, edit /etc/ImageMagick-6/policy.xml and change the PDF rights down in the document from none to read|write there. Or simply run:

sed -i 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' /etc/ImageMagick-6/policy.xml

Not sure about the implications, but at least it allows getting things done.

Deploy on a server

Follow the first two points of the above development mode on your server (or just locally).

Then simply run (on any port wanted):

waitress-serve --port=4040 app:app &

As a service:

# /etc/systemd/system/auto_scan.service
[Unit]
Description=Auto scan server
After=network.target

[Service]
Type=simple
User=USER_TO_BE_USED
WorkingDirectory=/home/USER_TO_BE_USED/auto-scan
ExecStart=/path/to/venv/bin/waitress-serve --port=4040 app:app
Restart=always

[Install]
WantedBy=multi-user.target