Replies: 5 comments
-
From what I know (but I might be wrong), Tesseract is not optimized for multi-threaded environments; that's probably why you are seeing low CPU utilization. Proper multi-threading would have to be implemented in Tesseract itself. On our side we can only do some parallelization ourselves, for example if you have many images to process, or if the image you want to process can be divided into smaller images (e.g. based on Tesseract's layout-analysis phase, where you might get info about the different blocks of text in the image). But none of that is ideal; Tesseract should support it natively.

Another slow-down is probably the dot-product computation, which is not HW accelerated on Android, although there are AVX/SSE optimizations for desktop. I saw someone implement the dot product with NEON instructions for Android, which might increase performance somewhat. If someone could implement it for the current codebase and send a pull request (ideally to the official Tesseract repository, or at least to this library), that would be great.

Another time-affecting aspect is the size and quality of the processed image. In the official Tesseract repo they recommend doing your own pre-processing to speed up recognition. You can compare by using a clean screenshot of a text document without distortions versus a photo of printed text on paper. Also, the Tesseract LSTM engine is way slower (but also produces better-quality results) than the previous LEGACY engine.

Btw, in your code you don't need to initialize Tesseract on each call; you can reuse the instance.
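The "process many images in parallel, one engine per thread" idea above can be sketched in plain Kotlin. `FakeOcrEngine` is a hypothetical stand-in for the real `TessBaseAPI`; this is not the library's API, only the threading pattern of reusing engine instances instead of re-initializing on every call:

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical stand-in for a Tesseract engine instance. In Tesseract4Android
// this would be com.googlecode.tesseract.android.TessBaseAPI, which is
// expensive to initialize and should be reused across calls.
class FakeOcrEngine {
    fun recognize(image: String): String = "text:$image"
}

// Recognize many images concurrently. Each worker thread gets its own
// long-lived engine (a single engine instance is not safe to share),
// so initialization cost is paid once per thread, not once per image.
fun recognizeAll(images: List<String>, workers: Int): List<String> {
    val pool = Executors.newFixedThreadPool(workers)
    val engine = ThreadLocal.withInitial { FakeOcrEngine() }
    try {
        val futures = images.map { img ->
            pool.submit(Callable { engine.get().recognize(img) })
        }
        // Collect results in the original image order.
        return futures.map { it.get() }
    } finally {
        pool.shutdown()
    }
}
```

In real code you would also release each engine (e.g. `recycle()` in Tesseract4Android) when the pool shuts down.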
-
@dankito you can see my implementation here, and my app is working fine, recognizing text in less than 5 seconds.
-
Tesseract 5 finally brings some support for NEON instructions, so the processing time is greatly improved (from my quick test it is 30 % faster with tessdata_fast and 40 % faster with standard tessdata).
-
That would be great! What I also figured out: there's an environment parameter. Can you see where in the source this can be set, and make this setting available via your API? A hint may be found in the question of this issue: tesseract-ocr/tesseract#1600. I think this would also greatly improve the performance.
-
@dankito For Tesseract 5 you can try the new branch here (but you have to compile it yourself). For multi-threading you need to compile the library with OpenMP support; it can't be controlled via a runtime parameter. If you want to try that, you need to uncomment these lines: Tesseract4Android/tesseract4android/build.gradle, lines 19 to 20 at commit 8fb4eae. Note that in the current Android NDK (or build tools) there is a bug where the compiled OpenMP library file is not added to the resulting APK (or AAR in this case). See android/ndk#1028 for some ideas on how to work around that. Maybe using a specific newer version of the NDK would help? You can try and tell me.
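I can't reproduce the actual commented-out lines here, but as a general, hypothetical sketch (the real build.gradle may use different flag names), enabling OpenMP in an NDK/CMake build usually comes down to passing `-fopenmp` to the native compiler:

```gradle
android {
    defaultConfig {
        externalNativeBuild {
            cmake {
                // Hypothetical sketch only: enable OpenMP for the native code.
                // The actual Tesseract4Android build may wire this differently.
                cFlags "-fopenmp"
                cppFlags "-fopenmp"
            }
        }
    }
}
```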
-
Hi,
first of all thanks for the library and the hard work you invested to get Tesseract 4 running on Android!
During my tests I saw that Tesseract4Android takes 2 minutes to recognize a 3.1 MB image, and 1 minute 40 seconds for a 0.9 MB file.
All work is done on a separate thread dedicated only to Tesseract4Android. Giving that thread a high priority didn't help either.
What I saw in top (by executing "adb shell top -m 10") is that Tesseract4Android only uses 16% of the CPU.
Is there any way to tell Tesseract4Android to use the whole CPU or to speed up recognition otherwise?
In case it's relevant, here is the code I used (it's in Kotlin):