
Chemical equation identification, autocorrection, and conversion to a searchable format from document images.


anubhabMajumdar/Chemical-Equation-Identification-Autocorrection


Developed by Prerana Jana (prerana.jana@gmail.com) and Anubhab Majumdar (anubhabmajumdar93@gmail.com).

Scanned document images stored as PDF are not searchable. OCR addresses this by converting document images into editable, searchable data, but it has limitations when equations are present, both mathematical and chemical. OCR for mathematical equations is already a major research area and has produced successful results. Chemical equation segmentation, however, has been explored far less. In this paper, we present a novel method for automatically generating a searchable PDF of segmented chemical equations from scanned document images by performing chemical symbol recognition and auto-correction of the OCR output. We use an existing OCR system, pattern recognition techniques, contextual data analysis, and a standard LaTeX package to generate the chemical equations in searchable PDF format. The effectiveness of the proposed method is verified through exhaustive testing on 234 document images.
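To give a rough feel for the auto-correction and LaTeX generation steps mentioned above, here is a minimal Python sketch. It is not the code from this repository or the paper. The element subset `ELEMENTS`, the two correction rules, and the function names `correct_species` and `to_ce` are all hypothetical, and the use of the `mhchem` package's `\ce{...}` macro is an assumption, since the abstract only says "a standard LaTeX package".

```python
import re

# Illustrative subset of element symbols (assumption: a real system
# would use the full periodic table).
ELEMENTS = {"H", "C", "N", "O", "S", "K", "Na", "Cl", "Ca", "Fe", "Mg", "Al"}

# Tokens that separate species in an equation and are passed through untouched.
OPERATORS = {"+", "->", "="}


def correct_species(species: str) -> str:
    """Repair two common OCR look-alike errors in one chemical species.

    Hypothetical rules, chosen for illustration only:
    1. A count never starts with zero, so a '0' that does not follow
       another digit is read as the letter 'O' (e.g. 'C02' -> 'CO2').
    2. A subscript of exactly 1 is never written, so a lone '1' or '|'
       after an uppercase letter is read as 'l' when that yields a known
       two-letter symbol (e.g. 'NaC1' -> 'NaCl').
    """
    species = re.sub(r"(?<!\d)0", "O", species)

    def fix_l(match: re.Match) -> str:
        symbol = match.group(1) + "l"
        return symbol if symbol in ELEMENTS else match.group(0)

    return re.sub(r"([A-Z])[1|](?!\d)", fix_l, species)


def to_ce(equation: str) -> str:
    """Correct each whitespace-separated species and wrap the result
    in mhchem's \\ce{...} macro (assumed target package)."""
    tokens = equation.split()
    fixed = [t if t in OPERATORS else correct_species(t) for t in tokens]
    return r"\ce{" + " ".join(fixed) + "}"


if __name__ == "__main__":
    print(to_ce("C + 02 -> C02"))
```

Under these assumptions, `to_ce("C + 02 -> C02")` yields `\ce{C + O2 -> CO2}`, which could then be typeset in a LaTeX document that loads `mhchem`. Genuinely ambiguous cases such as `H20` (H with count 20, or H2O) are left unchanged by this sketch; resolving them would need the kind of contextual analysis the abstract refers to.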

Published in the ACM Symposium on Document Engineering (DocEng 2016).
