External OCR API Integration #575

Closed
savi8sant8s opened this issue Dec 11, 2024 · 4 comments
Labels
question Further information is requested

Comments

@savi8sant8s

Question

Can OCR be run through an external API, so that its output is received and passed on to the converter?

@savi8sant8s savi8sant8s added the question Further information is requested label Dec 11, 2024
@dolfim-ibm
Contributor

It would be possible to implement this, but be aware of the following:

  1. OCR latency will increase substantially.
  2. This should be clearly communicated to users who opt for this type of OCR.
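
One way to picture both caveats, as a minimal sketch rather than anything docling ships: an external OCR backend wrapped so that it has to be enabled explicitly and its per-page latency is recorded. All names here (`ExternalOcrAdapter`, `fake_remote_ocr`, `allow_external`) are hypothetical, and the remote call is simulated with a short sleep instead of a real HTTP request.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExternalOcrAdapter:
    """Hypothetical wrapper around a remote OCR call (not a docling class)."""

    ocr_call: Callable[[bytes], str]   # takes page image bytes, returns text
    allow_external: bool = False       # the user must opt in explicitly
    latencies: List[float] = field(default_factory=list)

    def run(self, page_image: bytes) -> str:
        if not self.allow_external:
            raise RuntimeError(
                "External OCR is disabled; opt in explicitly, since it adds "
                "network latency to every page."
            )
        start = time.perf_counter()
        text = self.ocr_call(page_image)
        # Record how long the round trip took so it can be reported to users.
        self.latencies.append(time.perf_counter() - start)
        return text


def fake_remote_ocr(page_image: bytes) -> str:
    """Stand-in for a network round trip; a real backend would use HTTP."""
    time.sleep(0.05)
    return "Dunder Mifflin"


adapter = ExternalOcrAdapter(ocr_call=fake_remote_ocr, allow_external=True)
text = adapter.run(b"fake image bytes")
print(text, f"{adapter.latencies[0]:.3f}s")
```

Keeping the opt-in flag off by default means nobody pays the extra latency without having asked for it.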

@savi8sant8s
Author

savi8sant8s commented Dec 12, 2024

@dolfim-ibm One more question. Is it possible to use only the document conversion part, starting from an output like the one attached, and skip the initial steps? If so, does any of the repositories (docling, docling-core, docling-parser) contain code that does something similar? I want to test docling to see how well it transforms a standard OCR output into a more readable structure.
image

Output:

{
    "status": "succeeded",
    "createdDateTime": "2024-12-24T08:46:32Z",
    "lastUpdatedDateTime": "2024-12-24T08:46:34Z",
    "analyzeResult": {
        "version": "3.0.0",
        "readResults": [
            {
                "page": 1,
                "angle": 0,
                "width": 8.2639,
                "height": 11.6944,
                "unit": "inch",
                "lines": [
                    {
                        "boundingBox": [
                            0.7901,
                            0.9701,
                            1.0501,
                            0.9668,
                            1.0634,
                            1.1834,
                            0.8067,
                            1.1901
                        ],
                        "text": "Dunder Mifflin",
                        "words": [
                            {
                                "boundingBox": [
                                    0.7901,
                                    0.9701,
                                    1.0101,
                                    0.9668,
                                    1.0134,
                                    1.1868,
                                    0.7934,
                                    1.1901
                                ],
                                "text": "D",
                                "confidence": 0.981
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
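
To experiment with output in this shape, a first step is usually to flatten it into per-line records. The sketch below assumes the JSON follows the layout shown above (`analyzeResult.readResults[].lines[]`, with `boundingBox` holding eight numbers that form four x,y corner points, y growing downward). The helper name `iter_lines` and the record fields are illustrative and not part of docling.

```python
import json
from typing import Dict, Iterator

# Trimmed copy of the sample above (word-level entries omitted).
SAMPLE = """
{
  "status": "succeeded",
  "analyzeResult": {
    "readResults": [
      {
        "page": 1, "width": 8.2639, "height": 11.6944, "unit": "inch",
        "lines": [
          {
            "boundingBox": [0.7901, 0.9701, 1.0501, 0.9668,
                            1.0634, 1.1834, 0.8067, 1.1901],
            "text": "Dunder Mifflin"
          }
        ]
      }
    ]
  }
}
"""


def iter_lines(ocr: Dict) -> Iterator[Dict]:
    """Yield one record per OCR line with an axis-aligned bounding box."""
    for page in ocr["analyzeResult"]["readResults"]:
        for line in page.get("lines", []):
            xs = line["boundingBox"][0::2]  # x of the four corners
            ys = line["boundingBox"][1::2]  # y of the four corners
            yield {
                "page": page["page"],
                "text": line["text"],
                # (left, top, right, bottom) in the page's own unit
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
            }


records = list(iter_lines(json.loads(SAMPLE)))
print(records)
```

Collapsing the quadrilateral to an axis-aligned box loses the slight rotation of the line, which is acceptable for a first test but worth revisiting for skewed scans.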

@dolfim-ibm
Contributor

Pointing you to the new tech report. In Figure 1 we outline the steps in the PDF pipeline: #574

If you have a scanned document (which I assume, since you refer to OCR), the PDF parsing step just identifies the image bitmaps, which are then sent to the OCR engines.

In my opinion, the output you want is simply the DoclingDocument in result.document, which contains all the document components.
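
For reference, a minimal sketch of reaching that object through docling's Python API. It assumes docling is installed; `scan.pdf` is a placeholder path and `convert_to_markdown` is an illustrative helper name. The import is deferred into the function so the snippet can be loaded even where docling is not present.

```python
def convert_to_markdown(source: str) -> str:
    """Run the default docling pipeline and export the result as Markdown."""
    # Imported lazily: docling is a heavy dependency and may not be installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # conversion result for one input
    doc = result.document               # the DoclingDocument mentioned above
    return doc.export_to_markdown()


# Example call (placeholder path, requires docling):
# print(convert_to_markdown("scan.pdf"))
```

The same `doc` object can also be exported with `export_to_dict()` when a structured form is more useful than Markdown.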

@savi8sant8s
Author

Thanks for the information.
