feat(Translate): Implement post-processing layout analysis to avoid content overlap and optimize visual appearance #297

Open
awwaawwa opened this issue Dec 19, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@awwaawwa
Contributor

awwaawwa commented Dec 19, 2024

Currently, the line spacing for Chinese text is fixed at 1.4× to avoid overlap. With layout analysis, the line spacing could be increased where appropriate, using the blank space that currently follows each paragraph to improve the visual appearance.

Byaidu added the enhancement (New feature or request) label Dec 19, 2024
@timelic

timelic commented Dec 19, 2024

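Keeping the line spacing consistent is better. Loose line spacing is a cardinal sin in typesetting.

I'm not opposed to scanning the whole document and then applying a single, adaptive, uniform line spacing, but each paragraph should not get its own line spacing.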

@awwaawwa
Contributor Author

My idea is to keep the line spacing as uniform as possible and to reduce it only where the text doesn't fit. For example: first calculate the maximum allowable line spacing page by page, then take the 90th percentile, and finally compress the remaining space. Dynamic programming is worth considering for calculating and reducing the line spacing, since a greedy algorithm might yield poorer results. Of course, in the initial stage, a quick greedy implementation could be used to verify feasibility.
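
A minimal sketch of one possible reading of this idea, not the project's implementation. The `Block` fields, `max_spacing`, `percentile`, and `assign_spacing` are all hypothetical names. It assumes each text block knows its line count, font size, and available height (including the blank space after the paragraph), and it uses a simple clamp rather than dynamic programming. The thread does not say which tail of the distribution the percentile should be read from, so `pct` is left as a parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical text block; these fields are assumptions for illustration."""
    page: int
    n_lines: int
    font_size: float     # in points
    avail_height: float  # in points, including blank space after the paragraph


def max_spacing(block: Block) -> float:
    """Largest line-spacing multiplier at which the block still fits."""
    return block.avail_height / (block.n_lines * block.font_size)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def assign_spacing(blocks, pct=90.0):
    """Pick one target spacing, then shrink only the blocks that cannot fit it."""
    limits = [max_spacing(b) for b in blocks]  # computable independently per page
    target = percentile(limits, pct)
    return target, [min(target, limit) for limit in limits]
```

With this clamp, every block whose limit is at least the target shares the same spacing, and only the tighter blocks deviate from it.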

@timelic

timelic commented Dec 19, 2024

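Since the code is going to be rewritten to be page-independent so that it can be parallelized, this kind of unified analysis seems hard to do. Also, there seems to be no way to tell which text belongs to the "body text" in order to unify the body text's line spacing.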

@awwaawwa
Contributor Author

awwaawwa commented Dec 19, 2024

Maybe the body text can be identified by word count, treating the text with the most words as the main text. Page independence is not a concern: the work can be done in multiple stages, with as much parallel processing as possible within each stage and serial processing between stages.
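
A rough sketch of how the staged approach might look, under several assumptions that are not in the thread: each page is a list of `(style_key, text)` pairs, where `style_key` is some hypothetical grouping key such as a rounded font size, and words are counted by splitting on whitespace (CJK source text would need a character count instead). The function names are illustrative only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words_by_style(page):
    """Stage 1 (per page, parallelizable): word count for each style key."""
    counts = Counter()
    for style_key, text in page:
        # Assumption: whitespace-separated words; not suitable for CJK text.
        counts[style_key] += len(text.split())
    return counts


def find_body_style(pages, workers=4):
    """Return the style key with the most words, or None if there is no text."""
    # Stage 1: pages are independent, so they can be processed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_page = list(pool.map(count_words_by_style, pages))

    # Stage 2 (serial): merge the per-page counts into a document-wide total.
    total = Counter()
    for counts in per_page:
        total.update(counts)

    if not total:
        return None
    return total.most_common(1)[0][0]
```

A later stage could then apply the chosen spacing to blocks of that style, again page by page in parallel.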


3 participants