Philipp_Project_Python

Final product of the project.

How to Run the Script?

Download the sample folder.
Run all the commands listed in Commands_to_run.txt.
Put all the PDFs into the Documents folder inside the sample folder.
Run main.py.
All extracted information will be appended to Book-v2.csv.
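A minimal sketch of the bookkeeping around these steps, assuming main.py scans sample/Documents for PDFs, skips any PDF already listed in the text file of processed names mentioned below, and appends one row per PDF to Book-v2.csv. The file name processed_pdfs.txt, the exact paths, the UTF-8 encoding, and the helper names (load_processed, pending_pdfs, append_row) are hypothetical; this is not the repository's actual code.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Assumed layout (hypothetical): PDFs in sample/Documents, output in Book-v2.csv,
# and a plain-text list of already-processed PDF names.
DOCUMENTS_DIR = Path("sample") / "Documents"
OUTPUT_CSV = Path("Book-v2.csv")
PROCESSED_LIST = Path("processed_pdfs.txt")  # hypothetical file name


def load_processed(list_path: Path) -> set[str]:
    """Return PDF names already recorded as processed (empty set if no list yet)."""
    if not list_path.exists():
        return set()
    lines = list_path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def pending_pdfs(documents_dir: Path, list_path: Path) -> list[Path]:
    """PDFs in documents_dir whose names are not in the processed list, sorted by name."""
    done = load_processed(list_path)
    return sorted(p for p in documents_dir.glob("*.pdf") if p.name not in done)


def append_row(csv_path: Path, row: list[str], list_path: Path, pdf_name: str) -> None:
    """Append one row to the CSV, then record pdf_name so it is skipped next run."""
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
    with list_path.open("a", encoding="utf-8") as f:
        f.write(pdf_name + "\n")
```

In a loop, each path returned by pending_pdfs(DOCUMENTS_DIR, PROCESSED_LIST) would be run through OCR, and append_row would be called with the extracted fields. Recording the name only after the CSV write means that a crash in the middle of a file leaves that PDF eligible for a retry on the next run.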

What Does This Script Do?

This script uses poppler-utils to convert each PDF into images, then runs tesseract-ocr with the Arabic tessdata to extract the text. OCR is run twice because the Arabic model cannot accurately extract numeric values. Finally, Python's built-in csv module is used to append the results to the CSV file, and the names of the extracted PDFs are written to a text file for future reference, so that the same PDF is not processed again.
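The two-pass pipeline described above can be sketched by calling the pdftoppm (poppler-utils) and tesseract command-line tools through subprocess. This is a minimal sketch under assumptions, not the repository's main.py, which may use Python wrapper libraries instead: the 300 dpi resolution, the use of the English model (eng) for the second, numeric pass, and the function names are all hypothetical. Running it requires poppler-utils and tesseract with the ara and eng tessdata installed.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pdf_to_images_cmd(pdf: Path, out_prefix: Path, dpi: int = 300) -> list[str]:
    """pdftoppm (poppler-utils) writes one PNG per page: <out_prefix>-<page>.png."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(out_prefix)]


def ocr_cmd(image: Path, lang: str) -> list[str]:
    """tesseract prints the recognised text to stdout when the output base is 'stdout'."""
    return ["tesseract", str(image), "stdout", "-l", lang]


def run_ocr(image: Path, lang: str) -> str:
    """Run one OCR pass on a page image and return the decoded text."""
    result = subprocess.run(
        ocr_cmd(image, lang), capture_output=True, encoding="utf-8", check=True
    )
    return result.stdout


def extract_pdf(pdf: Path, work_dir: Path) -> list[tuple[str, str]]:
    """Return (arabic_text, numeric_pass_text) for each page of the PDF."""
    subprocess.run(pdf_to_images_cmd(pdf, work_dir / pdf.stem), check=True)
    # pdftoppm zero-pads page numbers to a common width, so a plain sort keeps page order.
    pages = sorted(work_dir.glob(f"{pdf.stem}-*.png"))
    # Pass 1: Arabic model for the text. Pass 2: English model (an assumption),
    # because the Arabic model misreads numeric values.
    return [(run_ocr(page, "ara"), run_ocr(page, "eng")) for page in pages]
```

For example, extract_pdf(Path("sample/Documents/example.pdf"), Path("/tmp/pages")) would return one pair of strings per page (the file name is hypothetical). Picking the numeric fields out of the second pass and merging them with the Arabic text is left out, since the README does not describe how that is done.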

Repository: Rohanpudasaini/Arabic_PDF_To_English_CSV