Writing/modifying pdfs #56

Open
kskyten opened this issue Apr 24, 2019 · 6 comments

Comments

@kskyten

kskyten commented Apr 24, 2019

Is it possible to modify the parsed PDF and write it to a file? Specifically, I'm interested in the ideas from here: open-source-ideas/ideas#46. Julia has excellent support for neural networks, so it would be interesting to experiment with something like this.

@sambitdash
Owner

Both are definitely possible. The first can fit within the purview of PDFIO, while the second can be developed as a separate project that uses the capabilities of PDFIO. PDFIO is a low-level PDF reading API (it can be extended for manipulation).

There are no plans to move it into the realm of machine learning, NLP, or document structure understanding.

@kskyten
Author

kskyten commented Apr 24, 2019

I agree. What should be done to support writing pdf files? Is that a large undertaking?

@sambitdash
Owner

For the list given, 3-6 man-months, depending on how well you understand the PDF specification. Many of the items need document understanding, and those can be excluded from the list. More than the development itself, good PDF parsers have to be tested with a wide variety of files. That can be overwhelming.

@kskyten
Author

kskyten commented Apr 24, 2019

Unfortunately, I'm not very familiar with the PDF spec. What is the bare minimum that needs to be implemented just to write pdfs?

@sambitdash
Owner

@kskyten unfortunately, without understanding the PDF specification it will be hard to write a writer, particularly when you are looking at modifying page content. Moreover, writers require compression encoders, which are not integrated into PDFIO; only decoders are currently integrated.

Personally, a writer is not very high on my priorities. While I can guide as the maintainer and owner of the library, I cannot commit to any implementation work myself.
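
To illustrate the encoder point above: a stream written with `/Filter /FlateDecode` has to be compressed on the way out, and its `/Length` entry must count the compressed bytes. The following is a minimal Python sketch using only the standard `zlib` module. It is not PDFIO code, and the helper names `flate_encode` and `build_stream_object` are hypothetical.

```python
import zlib


def flate_encode(raw: bytes) -> bytes:
    """Compress raw stream bytes into the zlib format that /FlateDecode expects."""
    return zlib.compress(raw)


def build_stream_object(obj_num: int, raw: bytes) -> bytes:
    """Serialize an indirect stream object (generation 0 assumed).

    /Length is the number of bytes between 'stream' and 'endstream',
    i.e. the length of the *compressed* data.
    """
    data = flate_encode(raw)
    header = b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (
        obj_num,
        len(data),
    )
    return header + data + b"\nendstream\nendobj\n"


# Illustrative page content stream; the object number 4 is arbitrary.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
stream_obj = build_stream_object(4, content)
```

A reader that only has the decoder can round-trip this with `zlib.decompress`, which is a quick way to check that the encoder side is consistent.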

@kskyten
Author

kskyten commented Apr 24, 2019

I was hoping I would just be able to copy the unmodified streams over and modify the lengths and references to make it work. I don't think I need a full-blown writer as I only need to modify a specific subset of streams, but I might be wrong.
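
A rough Python sketch of that idea, under several assumptions: the file uses a classic cross-reference table (not cross-reference streams), object numbers are contiguous from 1, every object has generation 0, and the whole file is rewritten rather than appended as an incremental update. Nothing here uses PDFIO; `serialize_stream` and `write_pdf` are hypothetical helpers. Unmodified objects are copied byte for byte, the changed stream gets a new `/Length`, and all byte offsets in the xref table are recomputed.

```python
def serialize_stream(obj_num: int, data: bytes, extra: bytes = b"") -> bytes:
    """Serialize a stream object, recomputing /Length from the actual data."""
    head = b"%d 0 obj\n<< /Length %d %b>>\nstream\n" % (obj_num, len(data), extra)
    return head + data + b"\nendstream\nendobj\n"


def write_pdf(objects: dict, root_num: int) -> bytes:
    """Concatenate serialized objects and emit a fresh xref table and trailer.

    objects maps object number -> already-serialized object bytes.
    Assumes object numbers run contiguously from 1.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset where this object starts
        out += objects[num]
    xref_pos = len(out)
    size = max(objects) + 1  # entry 0 is the head of the free list
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for num in range(1, size):
        # Each entry is exactly 20 bytes: offset, generation, 'n', 2-byte EOL.
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root_num)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Objects 1-3 stand in for data copied unmodified from a parsed file.
objects = {
    1: b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    2: b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
    3: b"3 0 obj\n<< /Type /Page /Parent 2 0 R "
       b"/MediaBox [0 0 612 792] /Contents 4 0 R >>\nendobj\n",
    4: serialize_stream(4, b"BT ET"),
}

# Replace only the content stream; its /Length and every later offset change.
new_data = b"0 0 m 100 100 l S"
objects[4] = serialize_stream(4, new_data)
pdf = write_pdf(objects, root_num=1)
```

Because object numbers are kept stable, indirect references such as `4 0 R` stay valid and only the lengths and xref offsets need updating. Real files with compressed object streams, encryption, or linearization would need more than this.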

@sambitdash mentioned this issue Jul 3, 2019