This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

cneud/alto-ocr-confidence

This tool is no longer supported. Please use https://github.com/cneud/alto-tools instead.

alto-ocr-confidence

Calculates the OCR confidence score per page in ALTO files.

The method used is really simple:

  • find all String elements
  • get value of attribute "WC" (word confidence) for each String
  • calculate sum of all "WC" values
  • divide sum by the count of words per page
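The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the script's actual code. The function name `page_confidence` and the sample document are hypothetical. Scaling by 100 is an assumption based on the example output, since ALTO stores WC as a value between 0 and 1. Counting only the String elements that carry a WC attribute is also an assumption.

```python
import xml.etree.ElementTree as ET


def page_confidence(alto_xml: str) -> float:
    """Return the mean WC value of all String elements, as a percentage."""
    root = ET.fromstring(alto_xml)
    # ALTO files are namespaced and the namespace URI differs between schema
    # versions, so compare the local tag name only.
    values = [
        float(elem.attrib["WC"])
        for elem in root.iter()
        if elem.tag.rsplit("}", 1)[-1] == "String" and "WC" in elem.attrib
    ]
    if not values:
        # Assumption: report 0.0 when no word carries a confidence value.
        return 0.0
    # Assumption: scale the 0..1 mean to a percentage, rounded to 2 decimals.
    return round(100 * sum(values) / len(values), 2)


# Hypothetical one-line page with two words.
sample = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hello" WC="0.50"/>
    <String CONTENT="world" WC="0.75"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(page_confidence(sample))  # 62.5
```

Matching on the local tag name keeps the sketch independent of the ALTO schema version used by a given file.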

Usage:

python alto_ocr_confidence.py <inputdir>

Example output:

File: alto\AZ_1926_04_25_0001.xml, Confidence: 54.13

Note that OCR confidence (which is a native output of the OCR engine) is NOT equal to the actual OCR accuracy, which can only be determined by evaluation against Ground Truth.

Read more about OCR evaluation here.
