
Extra spaces between letters in a single word #509

Open
pickhardt opened this issue Mar 27, 2023 · 2 comments

Comments

@pickhardt

I noticed this gem has problems parsing some PDFs where the text is not necessarily clean.

For instance, this file: https://www.jstor.org/stable/3684663

Some parts of it get output like: "a b o u t a r e g r e s s i o n t o o r i g i n a l c h a o s"

However, the file itself doesn't seem to be the problem: Python's PyPDF2 extracts the same passage correctly as "about a regression to original chaos".

Do you think there is some step that this reader is missing? Or, alternatively, is there an option I should set when using PDF::Reader to get it to read these PDFs better?
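In the meantime, if the extracted string separates letters with single spaces and words with wider gaps, a post-processing pass can rejoin the letters. Below is a minimal Ruby sketch under that assumption. `collapse_letter_spacing` is a hypothetical helper, not part of PDF::Reader, and nothing in this thread confirms that pdf-reader emits wider gaps between words for this file.

```ruby
# Hypothetical post-processing for letter-spaced text.
# ASSUMPTION: word gaps are two or more spaces, letter gaps are exactly one.
def collapse_letter_spacing(text)
  text.split(/ {2,}/).map do |chunk|
    # Only collapse chunks made entirely of single characters separated by
    # single spaces (e.g. "a b o u t"); leave ordinary words untouched.
    if chunk.match?(/\A\S( \S)+\z/)
      chunk.delete(" ")
    else
      chunk
    end
  end.join(" ")
end

puts collapse_letter_spacing("a b o u t  a  r e g r e s s i o n")
# => about a regression
```

If every gap is a single space, as in the sample output above, the whole run collapses into "aboutaregressiontooriginalchaos", so this only helps when the spacing still carries the word boundaries.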

@shmolf

shmolf commented Apr 15, 2024

I too am experiencing this issue.

@iprog21

iprog21 commented May 31, 2024

Same here.

I worked around it with a gsub. It works when the clustered word is in PascalCase:

`"TheFirstWord".gsub(/([a-z])([A-Z])/, '\1 \2')` gives "The First Word"

but it can't split an all-lowercase run: "thefirstword" stays "thefirstword".
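The gsub workaround above can be wrapped in a small helper. This is a sketch; `split_pascal_case` is a hypothetical name. It relies entirely on lowercase-to-uppercase transitions, so an all-lowercase run gives it nothing to split on.

```ruby
# Hypothetical helper wrapping the gsub workaround: insert a space wherever
# a lowercase letter is immediately followed by an uppercase letter.
def split_pascal_case(text)
  text.gsub(/([a-z])([A-Z])/, '\1 \2')
end

puts split_pascal_case("TheFirstWord") # => The First Word
puts split_pascal_case("thefirstword") # => thefirstword (no case boundary)
```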
