Whitespaces removed with certain fonts #506

Open
dan-corneanu opened this issue Feb 18, 2023 · 0 comments

Given the PDF file sample.pdf, which contains a few lines of text set in different fonts, when I extract the text of the first page with

require 'pdf-reader'

file = File.open('./tmp/sample.pdf', 'rb') # binary mode, since PDFs are binary data
reader = PDF::Reader.new(file)
puts reader.pages.first.text

I get

Spaces with font Courier bold
Spaces with font Courier normal
Spaces with font Times-Roman bold
Spaces with font Times-Roman normal
Spaces with font Helvetica bold
Spaces with font Helvetica normal

SpaceswithfontLatobold
SpaceswithfontLatonormal

Notice that whitespace has been removed from the text set in the Lato font.

I was expecting whitespace to be preserved:

Spaces with font Lato bold
Spaces with font Lato normal

Is this because Lato's space glyph is not wide enough to satisfy the criterion in PDF::Reader::TextRun#+?

if (other.x - endx) <( font_size * 0.2)
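
To illustrate the suspicion, here is a minimal sketch of how a threshold like the one quoted above could drop a narrow space. It does not use pdf-reader: `Run` is a hypothetical stand-in for a positioned run of text, and the coordinates, widths, and gaps are made-up numbers, not values measured from sample.pdf or from Lato. It only assumes that adjacent runs are joined without a space when the horizontal gap is below 20% of the font size.

```ruby
# Hypothetical stand-in for a positioned run of text; not the library's class.
Run = Struct.new(:x, :width, :font_size, :text) do
  # Right edge of the run.
  def endx
    x + width
  end

  # Join two runs on the same line, inserting a space only when the
  # horizontal gap reaches 20% of the font size (the check quoted above).
  def +(other)
    gap = other.x - endx
    joined = gap < font_size * 0.2 ? text + other.text : "#{text} #{other.text}"
    Run.new(x, other.endx - x, font_size, joined)
  end
end

# Two words at 12pt, so the threshold is 2.4pt. The gaps are invented.
wide_gap   = Run.new(0, 40, 12, "Spaces") + Run.new(43.0, 22, 12, "with") # gap of 3.0pt
narrow_gap = Run.new(0, 40, 12, "Spaces") + Run.new(42.0, 22, 12, "with") # gap of 2.0pt

puts wide_gap.text   # space inserted
puts narrow_gap.text # words run together
```

If Lato's space advance really is narrower than 20% of the font size, the second case would match the output I am seeing, but I have not verified the glyph widths.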
