Extract document structure from the PDF document #7

Open
sambitdash opened this issue Jul 24, 2017 · 1 comment

Comments

@sambitdash
Owner

The structure recorded in a PDF may not be very accurate, but it is a good starting point for understanding the document. Creators do not always tag a document in a way that reflects how the final reader is meant to read it.

@sambitdash
Owner Author

The document structure is already being traversed as part of text extraction. The tags still need to be added as an XML metadata overlay.
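As a rough illustration of what such an overlay could look like, here is a minimal Python sketch. It is not this project's code or API: the nested-tuple structure tree and the `to_xml` helper are hypothetical, and the tag names (`Document`, `H1`, `P`, `Span`) are just standard PDF structure types chosen as sample data. The idea is only that each structure element wraps the text extracted beneath it.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure tree: each node is (tag, children), where a child is
# either another node or a string of text already produced by extraction.
def to_xml(node):
    tag, children = node
    elem = ET.Element(tag)
    last = None  # most recently appended child element, if any
    for child in children:
        if isinstance(child, str):
            if last is None:
                # Text before the first child element belongs to elem.text.
                elem.text = (elem.text or "") + child
            else:
                # Text after a child element belongs to that child's tail.
                last.tail = (last.tail or "") + child
        else:
            last = to_xml(child)
            elem.append(last)
    return elem

# Sample data only, not taken from a real PDF.
tree = ("Document", [
    ("H1", ["Introduction"]),
    ("P", ["PDF files may carry ", ("Span", ["tagged"]), " structure."]),
])

root = to_xml(tree)
xml_text = ET.tostring(root, encoding="unicode")
plain_text = "".join(root.itertext())
print(xml_text)
```

Because the tags only wrap the text, `plain_text` is still the same string that plain extraction would give, so the overlay adds structure without changing the extracted content.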
