Speed for low resource machine #245

Answered by cau-git
timif2 asked this question in Q&A
Nov 5, 2024 · 1 comment · 5 replies

@timif2 Good to see this question coming up 😃.

There are several things you can do to improve performance, depending on your use case. The pipeline features, ordered from most to least expensive, are: OCR, table structure recognition, and PDF parsing. My recommendations are:

  1. Turn off OCR if you don't need it for your data (e.g. your PDFs are digital-only).
  2. Turn off table structure recognition if you don't need table structure (e.g. your PDFs have no tables or you don't need the tables' content).
    • This is only possible in Python API code, see below.
  3. Switch the PDF backend to DoclingParseV2DocumentBackend (beta), which speeds up PDF loading by ~10x, with good impact o…
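
The three recommendations above can be sketched in Python roughly as follows. This is a minimal sketch, not the code from the original answer: the helper names `low_resource_settings` and `build_converter` are hypothetical, and the docling import paths (including `docling.backend.docling_parse_v2_backend`) are assumed from docling releases around the time of this discussion, so check them against your installed version.

```python
from typing import Any, Dict


def low_resource_settings(need_ocr: bool = False, need_tables: bool = False) -> Dict[str, bool]:
    """Hypothetical helper: map recommendations 1 and 2 onto pipeline flags."""
    return {"do_ocr": need_ocr, "do_table_structure": need_tables}


def build_converter(settings: Dict[str, bool], use_v2_backend: bool = False) -> Any:
    """Hypothetical helper: build a docling converter from the flags above."""
    # Imported lazily so this module loads even when docling is not installed.
    # Import paths are assumptions; they may differ between docling versions.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Recommendations 1 and 2: disable OCR and/or table structure recognition.
    pipeline_options = PdfPipelineOptions(**settings)
    format_kwargs: Dict[str, Any] = {"pipeline_options": pipeline_options}

    if use_v2_backend:
        # Recommendation 3: the beta backend named in the answer (assumed module path).
        from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend

        format_kwargs["backend"] = DoclingParseV2DocumentBackend

    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(**format_kwargs)}
    )
```

With docling installed, `build_converter(low_resource_settings(), use_v2_backend=True).convert("example.pdf")` would convert a (hypothetical) file with OCR and table structure recognition both switched off. Keeping the flags in a plain dict makes it easy to re-enable one feature at a time and measure what each costs on your machine.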

Answer selected by timif2