External OCR API Integration #575

savi8sant8s · 2024-12-11T16:41:14Z

Question

Can an external OCR API be used, so that its output is received and added to the converter?

dolfim-ibm · 2024-12-12T07:59:31Z

It would be possible to implement it, but be aware of the following:

- The OCR latency will certainly increase substantially.
- This should be clearly communicated to the users opting for this type of OCR.

savi8sant8s · 2024-12-12T17:14:30Z

@dolfim-ibm One more question. Is it possible to use only the document conversion part, starting from an OCR output like the one attached, and skip the initial steps? If so, does any of the repositories (docling, docling-core, docling-parser) have code that does something similar? I want to test docling to see how well it transforms a standard OCR output into a more readable structure.

Output:

{
    "status": "succeeded",
    "createdDateTime": "2024-12-24T08:46:32Z",
    "lastUpdatedDateTime": "2024-12-24T08:46:34Z",
    "analyzeResult": {
        "version": "3.0.0",
        "readResults": [
            {
                "page": 1,
                "angle": 0,
                "width": 8.2639,
                "height": 11.6944,
                "unit": "inch",
                "lines": [
                    {
                        "boundingBox": [
                            0.7901,
                            0.9701,
                            1.0501,
                            0.9668,
                            1.0634,
                            1.1834,
                            0.8067,
                            1.1901
                        ],
                        "text": "Dunder Mifflin",
                        "words": [
                            {
                                "boundingBox": [
                                    0.7901,
                                    0.9701,
                                    1.0101,
                                    0.9668,
                                    1.0134,
                                    1.1868,
                                    0.7934,
                                    1.1901
                                ],
                                "text": "D",
                                "confidence": 0.981
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
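For readers who want to experiment with this kind of output, here is a minimal sketch of flattening it into text lines with axis-aligned boxes. It assumes the field names shown above (`analyzeResult`, `readResults`, `lines`, `boundingBox` as eight x/y values); the helper names `polygon_to_bbox` and `extract_lines` are hypothetical and are not part of docling.

```python
import json

def polygon_to_bbox(polygon):
    """Turn an 8-value polygon [x1, y1, ..., x4, y4] into (left, top, right, bottom)."""
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    return (min(xs), min(ys), max(xs), max(ys))

def extract_lines(ocr_result):
    """Collect every recognized line with its page number, unit, text, and box."""
    lines = []
    for page in ocr_result["analyzeResult"]["readResults"]:
        for line in page.get("lines", []):
            lines.append({
                "page": page["page"],
                "unit": page.get("unit"),
                "text": line["text"],
                "bbox": polygon_to_bbox(line["boundingBox"]),
            })
    return lines

# Trimmed copy of the sample output above (word-level entries omitted).
sample = json.loads("""
{
  "status": "succeeded",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1, "angle": 0, "width": 8.2639, "height": 11.6944, "unit": "inch",
        "lines": [
          {
            "boundingBox": [0.7901, 0.9701, 1.0501, 0.9668,
                            1.0634, 1.1834, 0.8067, 1.1901],
            "text": "Dunder Mifflin"
          }
        ]
      }
    ]
  }
}
""")

for entry in extract_lines(sample):
    print(entry["page"], entry["text"], entry["bbox"])
```

The flattened list is only an intermediate form; mapping it onto a document structure is a separate step that this sketch does not attempt.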

dolfim-ibm · 2024-12-13T08:03:47Z

Pointing you to the new tech report. In Figure 1 we outline the steps in the PDF pipeline: #574

If you have a scanned document (I assume so, since you refer to OCR), the PDF parsing just identifies the image bitmaps, which are then sent to the OCR engine.

In my opinion, the output you want is simply the DoclingDocument in result.document which has all the document components.
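As a rough illustration of where `result.document` comes from, here is a minimal sketch of a standard docling conversion. It assumes docling is installed; the wrapper name `convert_to_markdown` and the input path in the usage note are hypothetical.

```python
def convert_to_markdown(source):
    """Run the default docling pipeline on a file path or URL and return Markdown."""
    # Imported lazily so this module can be loaded without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    # result.document is the DoclingDocument holding all document components.
    return result.document.export_to_markdown()
```

Usage would look like `print(convert_to_markdown("scanned.pdf"))`, where `scanned.pdf` stands in for your own file.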

savi8sant8s · 2024-12-13T10:58:34Z

Thanks for the information.

savi8sant8s added the "question" label (Further information is requested) Dec 11, 2024

savi8sant8s closed this as completed Dec 13, 2024
