Table picker for PDF #2
Comments
I have written some lines of code to extract tabular data. Currently it uses keywords to determine which textlayouts to include. I also managed to make a short IJulia notebook where you can interactively select text in a Plotly chart.
@hhaensel thank you for your interest. I want to understand how complex the cases are that this software can handle. If you submit a PR, I can review it and let you know whether it is useful for this SDK.
Sounds perfect, I'll submit a PR tomorrow. Looking forward to your feedback.
Sorry, currently in overload, will take some more time ...
Natural tabular objects in a PDF document should ideally be picked up for extraction.
The intent of the project is API development, hence it will be headless for the most part. Unlike a reader, it may not offer a WYSIWYG picker. A heuristic table picker should scan the document for table-like structures and dump them in tabular HTML/CSS format or as extracted image objects. In case document tagging is enabled, the table picker can use the tagged text.
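To make the heuristic idea concrete, here is a minimal sketch of one possible approach, written in Python purely for illustration. It is not this project's API. It assumes the text has already been extracted as positioned items; `TextItem`, `group_rows`, `find_tables`, `to_html`, and the tolerance values are all hypothetical names and numbers chosen for the example. Rows are formed from items that share a baseline, and a run of consecutive rows whose cells start at the same x positions is treated as a table.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextItem:
    """Hypothetical positioned text fragment (PDF y grows upward)."""
    x: float
    y: float
    text: str


def group_rows(items, y_tol=2.0):
    """Group items into rows whose baselines differ by at most y_tol."""
    rows = []
    for item in sorted(items, key=lambda t: (-t.y, t.x)):
        if rows and abs(rows[-1][0].y - item.y) <= y_tol:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [sorted(row, key=lambda t: t.x) for row in rows]


def aligned(row_a, row_b, x_tol=3.0):
    """Two rows align if they have equal cell counts and matching x starts."""
    return len(row_a) == len(row_b) and all(
        abs(a.x - b.x) <= x_tol for a, b in zip(row_a, row_b)
    )


def find_tables(items, min_rows=2, min_cols=2):
    """Return runs of consecutive aligned rows as candidate tables."""
    tables, current = [], []
    for row in group_rows(items):
        if len(row) >= min_cols and (not current or aligned(current[-1], row)):
            current.append(row)
        else:
            if len(current) >= min_rows:
                tables.append(current)
            current = [row] if len(row) >= min_cols else []
    if len(current) >= min_rows:
        tables.append(current)
    return tables


def to_html(table):
    """Dump one candidate table as plain tabular HTML."""
    body = "\n".join(
        "  <tr>" + "".join(f"<td>{escape(c.text)}</td>" for c in row) + "</tr>"
        for row in table
    )
    return f"<table>\n{body}\n</table>"


# Example input: a heading, three aligned two-column rows, and a footer line.
sample_items = [
    TextItem(50, 700, "Parts list"),
    TextItem(50, 680, "Name"), TextItem(200, 680, "Qty"),
    TextItem(50, 660, "Bolt"), TextItem(200, 660, "4"),
    TextItem(50, 640, "Nut & washer"), TextItem(200, 640, "12"),
    TextItem(50, 600, "End of page"),
]
sample_tables = find_tables(sample_items)
sample_html = to_html(sample_tables[0])
```

This sketch only handles simple grids. Merged cells, wrapped cell text, and ruled tables without consistent left alignment would need additional heuristics, and tagged PDFs could bypass the geometry entirely by reading the table tags.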