Convert_UK_Tier2_Tier5_SponsorPDF_To_XML

Converts the UK Tier 2 & Tier 5 work sponsor list from PDF to a structured XML file.

Download and run the file Create UK-Tier2-Tier5-SponsorList-PDF-To-XML.py to download, format, and convert the sponsor list from PDF to XML.

A few important dependencies must be installed on the system, or the program won't execute properly.

Dependencies Required

PDFTOHTML

You need to install pdftohtml on the system.

It can be installed with the following command:

>> brew install pdftohtml 

This adds pdftohtml to your path.

Website of PDFTOHTML: http://pdftohtml.sourceforge.net
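
As an illustration of the conversion step, here is a minimal sketch of how pdftohtml could be called from Python to produce XML. It is not taken from the script itself: the function names and file names are hypothetical, and the only tool option assumed is pdftohtml's -xml flag.

```python
import subprocess


def build_pdftohtml_command(pdf_path, output_stem):
    """Return the argument list for a PDF-to-XML conversion with pdftohtml.

    Both arguments are hypothetical example paths; the real script may
    use different names and additional options.
    """
    # -xml asks pdftohtml to emit XML instead of HTML.
    return ["pdftohtml", "-xml", str(pdf_path), str(output_stem)]


def convert_pdf_to_xml(pdf_path, output_stem):
    """Run pdftohtml and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftohtml_command(pdf_path, output_stem), check=True)
```

For example, convert_pdf_to_xml("sponsor-list.pdf", "sponsor-list") would run pdftohtml on a local file named sponsor-list.pdf, which typically yields sponsor-list.xml next to it.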

PDFtk

You also need to install PDFtk Server to format the PDF file properly so that it can be converted to structured XML format.

It can be installed from the following web link:

https://www.pdflabs.com/tools/pdftk-server/

Choose the version for your operating system and install it. The installer adds 'pdftk' to your path automatically.

Website of PDFtk: https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
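
Because the program depends on both tools being on the path, it can help to check for them before running it. The following is a minimal sketch using Python's standard shutil.which; the helper name find_missing_tools and the REQUIRED_TOOLS tuple are illustrative assumptions, not part of the script.

```python
import shutil

# Executable names as used in this README; adjust if your install differs.
REQUIRED_TOOLS = ("pdftohtml", "pdftk")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the
    tools installed; by default it is shutil.which.
    """
    return [name for name in tools if which(name) is None]
```

Calling find_missing_tools() returns an empty list when both pdftohtml and pdftk are available, and otherwise the names that still need to be installed.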