I want to convert the JSON output into a separate CSV file per table, and I wrote code to do that. However, numbers in the table cells come out as text:

```json
{
  "bbox": {
    "l": 364.781005859375,
    "t": 337.5539855957031,
    "r": 377.7409973144531,
    "b": 328.56201171875,
    "coord_origin": "BOTTOMLEFT"
  },
  "row_span": 1,
  "col_span": 1,
  "start_row_offset_idx": 4,
  "end_row_offset_idx": 5,
  "start_col_offset_idx": 2,
  "end_col_offset_idx": 3,
  "text": "/five.lt/period.tab/eight.lt",
  "column_header": false,
  "row_header": false,
  "row_section": false
},
{
  "bbox": {
    "l": 420.2619934082031,
    "t": 337.5539855957031,
    "r": 433.22198486328125,
    "b": 328.56201171875,
    "coord_origin": "BOTTOMLEFT"
  },
  "row_span": 1,
  "col_span": 1,
  "start_row_offset_idx": 4,
  "end_row_offset_idx": 5,
  "start_col_offset_idx": 3,
  "end_col_offset_idx": 4,
  "text": "/six.lt/period.tab/seven.lt",
  "column_header": false,
  "row_header": false,
  "row_section": false
},
```
5.8 is showing as "/five.lt/period.tab/eight.lt".
In one of the other tables, the values appear as "/three.osf_tab./zero.osf_tab/zero.osf_tab% " and "-/zero.osf_tab./two.osf_tab/three.osf_tab%".
I think this is caused by OTSL, the work from https://arxiv.org/abs/2305.03393 that is used in the TSR model. How can I convert these strings back to normal numbers in post-processing? Is there utility code available for this already? Any other help would be appreciated.
@mllife This comes from how the PDF encodes the text: you get out whatever the PDF has encoded in it. So this is not a TableFormer issue but one of the PDF backend and its string sanitization.
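Until the backend sanitizes these strings, a post-processing step along the following lines might work as a stopgap. This is only a minimal sketch, assuming every affected token follows the `/<glyph name>.<variant suffix>` pattern seen in the examples above (`/five.lt`, `/period.tab`, `/zero.osf_tab`). The names `GLYPH_TO_CHAR` and `decode_glyph_names` are hypothetical and not part of any library, and the mapping covers only digits and a few symbols, so it would need extending for other glyph names.

```python
import re

# Hypothetical mapping from glyph base names to characters.
# Only digits and a few common symbols are covered; extend as needed.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "period": ".", "comma": ",", "hyphen": "-", "minus": "-",
    "plus": "+", "percent": "%",
}

# Assumed token shape: "/" + glyph name + "." + variant suffix,
# e.g. "/five.lt", "/period.tab", "/zero.osf_tab".
GLYPH_TOKEN = re.compile(r"/([A-Za-z]+)\.[A-Za-z_]+")


def decode_glyph_names(text: str) -> str:
    """Replace known glyph-name tokens with their characters.

    Tokens whose base name is not in GLYPH_TO_CHAR are left unchanged,
    as are all characters outside a token (e.g. a literal "." or "%").
    """
    def replace(match: re.Match) -> str:
        return GLYPH_TO_CHAR.get(match.group(1), match.group(0))

    return GLYPH_TOKEN.sub(replace, text)


if __name__ == "__main__":
    samples = [
        "/five.lt/period.tab/eight.lt",
        "/three.osf_tab./zero.osf_tab/zero.osf_tab% ",
        "-/zero.osf_tab./two.osf_tab/three.osf_tab%",
    ]
    for raw in samples:
        print(repr(raw), "->", repr(decode_glyph_names(raw)))
```

Applying `decode_glyph_names` to each cell's `"text"` field before writing the CSV should turn the first sample into `5.8`. One caveat: ordinary text that happens to match the pattern with a mapped name (for example `a/one.two`) would also be rewritten, so it may be safer to run this only on cells that start with `/` or `-/`.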