PDF Translator

Overview

PDF Translator is a Python tool that extracts text from PDF documents, translates it into Sinhala using Google Translator, and saves the result as text files. It is intended for users who need to translate large volumes of PDF content into Sinhala while preserving the page and table structure of the original.
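
As a rough illustration of what preserving page and table structure in a plain-text file can look like, here is a minimal sketch. It is not taken from the project's code: the names `format_table` and `render_page`, the `--- Page N ---` marker, and the pipe-separated table layout are all assumptions made for this example.

```python
from typing import Sequence


def format_table(rows: Sequence[Sequence[str]]) -> str:
    """Render rows as pipe-separated text with columns padded to equal width."""
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    # Widths are counted in code points (an approximation, see note below).
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in padded
    ]
    return "\n".join(lines)


def render_page(
    page_number: int,
    text: str,
    tables: Sequence[Sequence[Sequence[str]]],
) -> str:
    """Build one page section: a page marker, the body text, then any tables."""
    parts = [f"--- Page {page_number} ---", text.strip()]
    for table in tables:
        parts.append(format_table(table))
    # Skip empty parts so a page without text or tables leaves no blank gaps.
    return "\n\n".join(part for part in parts if part)


if __name__ == "__main__":
    print(render_page(1, "Price list", [[["Name", "Qty"], ["Apple", "3"]]]))
```

Because padding is based on `len()`, column alignment is only approximate for scripts that use combining marks, such as Sinhala.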

Features

Text Extraction: Extracts text from PDF files while preserving layout information.
Translation: Translates the extracted text into Sinhala using Google Translator.
Table Identification: Detects and formats tables from the PDF content.
File Management: Saves translated content into text files, maintaining the structure of the original PDF.
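
Web translation services commonly cap the length of a single request, so long page text usually has to be translated in pieces. The sketch below shows one way to do that. It is an assumption about how such a step could work, not the project's actual code: `split_into_chunks`, `translate_text`, the `translate` callable, and the 4500-character default are all hypothetical.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 4500) -> List[str]:
    """Split text at line boundaries into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that exceeds the limit on its own.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def translate_text(
    text: str,
    translate: Callable[[str], str],
    max_chars: int = 4500,
) -> str:
    """Translate text chunk by chunk and concatenate the results."""
    return "".join(translate(chunk) for chunk in split_into_chunks(text, max_chars))


if __name__ == "__main__":
    # str.upper stands in for a real translation function.
    print(translate_text("first line\nsecond line\n", str.upper, max_chars=12))
```

Passing the translator in as a callable keeps the chunking logic testable without network access. If the `deep-translator` package is used (an assumption, since the dependency list is not shown here), `GoogleTranslator(source="auto", target="si").translate` could be passed as `translate`. A real translator may drop trailing newlines, in which case the chunks would need to be rejoined with explicit line breaks.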

Installation

Clone the Repository

git clone https://github.com/sithulaka/pdf-translator.git
cd pdf-translator

Set Up a Virtual Environment

It's recommended to use a virtual environment to manage dependencies.

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install Dependencies

Install the required Python packages using the requirements.txt file:

pip install -r requirements.txt

Usage

Place your PDF files in the input_pdfs/ folder.
Run the script to process the PDFs and generate translated text files:
```
python main.py
```
Check the output in the output_texts/ folder. Each PDF will have a corresponding .txt file with the translated content.
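
The batch behaviour described in these steps, one .txt file per input PDF, could be sketched as follows. This is an illustration only and not the contents of main.py: `output_path_for`, `process_folder`, and the `convert` callable (standing in for the extract-and-translate step) are hypothetical names.

```python
from pathlib import Path
from typing import Callable, List


def output_path_for(pdf_path: Path, output_dir: Path) -> Path:
    """Map input_pdfs/example.pdf to output_texts/example.txt."""
    return output_dir / (pdf_path.stem + ".txt")


def process_folder(
    input_dir: Path,
    output_dir: Path,
    convert: Callable[[Path], str],
) -> List[Path]:
    """Run convert on every PDF in input_dir and write one text file per PDF."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Sorting makes the processing order predictable across platforms.
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        target = output_path_for(pdf_path, output_dir)
        # UTF-8 is needed so Sinhala text is written correctly on any platform.
        target.write_text(convert(pdf_path), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Placeholder convert function: writes the PDF file name instead of a translation.
    for path in process_folder(Path("input_pdfs"), Path("output_texts"), lambda p: p.name):
        print(f"wrote {path}")
```

Writing the output with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding, which matters for non-Latin scripts.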

Example

To translate a PDF named example.pdf, place it in the input_pdfs/ folder and run:

python main.py

The translated text will be saved as example.txt in the output_texts/ folder.

Folder Structure

pdf-translator/
├── pdf_translator/
│   ├── __init__.py
│   └── translator.py
├── input_pdfs/
├── output_texts/
├── requirements.txt
├── README.md
└── main.py

pdf_translator/: Contains the core translation and extraction logic.
input_pdfs/: Folder for input PDF files.
output_texts/: Folder where the translated text files will be saved.
requirements.txt: Lists the dependencies required for the project.
README.md: Provides an overview and instructions for the project.
main.py: The entry point script for processing PDFs.

Contributing

We welcome contributions to improve this project. To contribute:

Fork the repository and create a new branch.
Make your changes and test them thoroughly.
Submit a pull request with a description of the changes.

Please ensure that your contributions adhere to the project's coding standards and include tests where applicable.

License

This project is licensed under the MIT License. See the LICENSE file for details.
