Caption and footnote are mismatched in some cases #1072

Open
wanxueyao opened this issue Nov 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@wanxueyao

Description of the bug | 错误描述

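--Summary--
When name labels appear both above and below an image, the corresponding caption and footnote are almost always misidentified, which corrupts the downstream caption and footnote results. In my tests this is not limited to a single document; it shows up in every document with a similar layout. This seriously hurts the usability of caption and footnote, and I hope it can be fixed in a future release. Thanks!
--Details--
The relevant part of the returned result is as follows: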
{
"type": "table",
"img_path": "images/51c5210ce9b4d027f9d58183d87117e381a24066f79d6f188828ba2a83738947.jpg",
"table_caption": [
"Table 6: Experimental results for MRC task. "
],
"table_footnote": [
"Table 7: Experimental results for PI task. "
],
"page_idx": 6
},
Screenshot of the original PDF:
Screenshot 2024-11-24 20 46 22
The relevant part of the returned result is as follows:
{
"type": "table",
"img_path": "images/8b29d8e85fd2e2b5aaa96cfd93c72fd4c3c071b22723f0d00a387549c8b321b2.jpg",
"table_caption": [],
"table_footnote": [],
"page_idx": 7
},
{
"type": "table",
"img_path": "images/eb6762be7264abc475e3515060df93271dde0f89e19aea59c74da698416c6b7a.jpg",
"table_caption": [
"Table 8: The effect of different data augmentation ways for QQP in terms of F1-score. "
],
"table_footnote": [],
"page_idx": 7
}
Screenshot of the original PDF:
Screenshot 2024-11-24 20 42 19

How to reproduce the bug | 如何复现

Just run it normally. Many papers use this layout; only two files are provided here for reference.
File 1
2020.acl-main.45.pdf
File 2
P19-1416.pdf

Operating system | 操作系统

Linux

Python version | Python 版本

3.10

Software version | 软件版本 (magic-pdf --version)

0.9.x

Device mode | 设备模式

cuda

@wanxueyao wanxueyao added the bug Something isn't working label Nov 24, 2024
@myhloli
Collaborator

myhloli commented Nov 25, 2024

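In some cases, the layout model does not classify the table caption and table footnote as the correct types.
Take the two examples you provided:
image
In this example,
"Table 6: Experimental results for MRC task." is recognized as a table caption.
"Table 7: Experimental results for PI task." is recognized as a table footnote.
The post-processing logic associates each description block with the nearest table using only the coordinates of the table and of the description block, and it follows the general convention that a table caption sits above the table body and a table footnote sits below it. As a result, both of these texts are associated with the lower table.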
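To make the failure mode concrete, here is a minimal sketch of position-based association as described above. It is not the actual magic-pdf implementation: the `Block` class, the `associate_by_position` function, and the coordinates are all hypothetical and only illustrate the "caption above, footnote below, nearest table wins" rule.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str   # "table", "table_caption" or "table_footnote" (label from the layout model)
    y0: float   # top edge; y grows downward on the page
    y1: float   # bottom edge
    text: str = ""


def associate_by_position(blocks):
    """Attach each caption to the nearest table below it and each
    footnote to the nearest table above it (hypothetical rule)."""
    tables = sorted((b for b in blocks if b.kind == "table"), key=lambda b: b.y0)
    linked = [{"table_caption": [], "table_footnote": []} for _ in tables]
    for b in blocks:
        if b.kind == "table_caption":
            # Candidate tables start at or below the caption's bottom edge.
            gaps = [(t.y0 - b.y1, i) for i, t in enumerate(tables) if t.y0 >= b.y1]
        elif b.kind == "table_footnote":
            # Candidate tables end at or above the footnote's top edge.
            gaps = [(b.y0 - t.y1, i) for i, t in enumerate(tables) if t.y1 <= b.y0]
        else:
            continue
        if gaps:
            _, i = min(gaps)
            linked[i][b.kind].append(b.text)
    return linked


# Made-up coordinates: each table's real caption is printed below its body.
page = [
    Block("table", 0, 100),                                                      # Table 6 body
    Block("table_caption", 105, 115, "Table 6: Experimental results for MRC task. "),
    Block("table", 130, 230),                                                    # Table 7 body
    Block("table_footnote", 235, 245, "Table 7: Experimental results for PI task. "),
]
linked = associate_by_position(page)
```

With these inputs the upper table ends up with no caption or footnote, while the lower table receives the Table 6 text as its caption and the Table 7 text as its footnote, which matches the output reported in the issue.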

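image
In this example,
"Table 8: The effect of different data augmentation ways for QQP in terms of F1-score." is recognized as a table caption,
while the real caption of Table 9 is recognized as a text block.
The post-processing logic therefore associates the Table 8 caption with the table image below it, and the Table 9 caption is merged into the body text.

The root cause is that the layout model cannot recognize this kind of complex figure and table layout effectively and accurately, and the post-processing logic is limited in how much it can improve the result. There is currently no effective solution.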
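Until the layout model improves, affected entries can at least be flagged on the user side. The following is a rough sketch, not part of magic-pdf: `find_suspicious_tables` and `CAPTION_RE` are hypothetical names, and the heuristic simply assumes that a footnote starting with "Table N:" is probably a misclassified caption and that a table with no caption and no footnote deserves a manual check. It only reports candidates; it does not try to repair them.

```python
import re

# Assumption: captions in these papers start with "Table 7:" or "Figure 3." and similar.
CAPTION_RE = re.compile(r"^\s*(Table|Figure|Fig\.)\s*\d+\s*[:.]", re.IGNORECASE)


def find_suspicious_tables(items):
    """Return (index, reason) pairs for table entries worth a manual check."""
    flagged = []
    for i, item in enumerate(items):
        if item.get("type") != "table":
            continue
        for note in item.get("table_footnote", []):
            if CAPTION_RE.match(note):
                flagged.append((i, f"footnote looks like a caption: {note.strip()!r}"))
        if not item.get("table_caption") and not item.get("table_footnote"):
            flagged.append((i, "table has neither caption nor footnote"))
    return flagged


# The three entries quoted in the issue, with img_path omitted for brevity.
items = [
    {"type": "table",
     "table_caption": ["Table 6: Experimental results for MRC task. "],
     "table_footnote": ["Table 7: Experimental results for PI task. "],
     "page_idx": 6},
    {"type": "table", "table_caption": [], "table_footnote": [], "page_idx": 7},
    {"type": "table",
     "table_caption": ["Table 8: The effect of different data augmentation ways for QQP in terms of F1-score. "],
     "table_footnote": [],
     "page_idx": 7},
]
flagged = find_suspicious_tables(items)
for index, reason in flagged:
    print(index, reason)
```

On the entries from this issue, the first two tables are flagged and the third is not, even though its caption is also attached to the wrong table. That kind of mismatch cannot be detected from the text alone.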
