- Optical character recognition (OCR) of text in images.
- Speech recognition from audio files.
- Conversion of PDF files to text documents, with extraction of images and tables.
Link to the installer file archive: Yandex.disk
Install the libraries using the command:
pip install -r requirements.txt
The file to run is main.py.
Files with information about the application are in the docs folder.
The settings are stored in JSON files, which are located in the Settings folder.
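As an illustration of how JSON settings like these could be read, here is a minimal sketch. It is not taken from the application's code: the function names `load_settings` and `load_all_settings` and any file name passed to them are hypothetical, and the only thing assumed from the description above is that the settings are ordinary JSON files in a folder named Settings.

```python
import json
from pathlib import Path


def load_settings(settings_dir, filename):
    """Read one JSON settings file and return its contents as Python data."""
    path = Path(settings_dir) / filename
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def load_all_settings(settings_dir="Settings"):
    """Load every *.json file in the folder, keyed by file name without extension."""
    # "Settings" is the folder name mentioned above; the file names are not assumed.
    return {
        p.stem: load_settings(settings_dir, p.name)
        for p in sorted(Path(settings_dir).glob("*.json"))
    }
```

Calling `load_all_settings()` from the project root would return a dictionary with one entry per settings file, which makes it easy to inspect the current configuration before editing it by hand.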
The final application is about 288 MB in size, but during recognition, language packs and recognition models (for speech recognition) are downloaded to your computer as needed. Unfortunately, version 1.1 has almost no GPU support. Speech recognition requires the ffmpeg package, so install it first if it is not already on your system.
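To check whether ffmpeg is available before starting speech recognition, something like the following sketch can be used. It relies only on the standard-library `shutil.which`; the helper names `find_tool` and `require_ffmpeg` are hypothetical and are not part of the application.

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


def require_ffmpeg():
    """Return the path to ffmpeg, or raise a clear error when it is not installed."""
    path = find_tool("ffmpeg")
    if path is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH; install it before running speech recognition."
        )
    return path
```

Running `require_ffmpeg()` once at startup gives an early, readable error instead of a failure in the middle of processing an audio file.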