Get the Archival Resource Keys (ARKs) from the eluxemburgensia.lu public open data set (the text analysis pack)

ymaurer/eluxemburgensia-opendata-ark


eluxemburgensia-opendata-ark

Script to extract the ARK identifier from the BNL opendata set

# First download the script get_ark_from_issue.sh and make it executable
chmod +x get_ark_from_issue.sh
# Then download and unpack the data
curl "https://data.bnl.lu/open-data/digitization/newspapers/export01-newspapers1841-1878.zip" --output papers.zip
unzip papers.zip
cd export01-newspapers1841-1878
# Run the script on every XML file and collect the output (ARK identifier in column 1, paper id after it)
find . -type f -name '*.xml' -exec ../get_ark_from_issue.sh '{}' \; > ark2paperid.txt
# Then filter by paper id (here 'lunion'), expand to the full ARK of page 4, and URL-encode the slashes
grep 'lunion' ark2paperid.txt | sort -k 2 | awk '{print "ark:/70795/" $1 "/pages/4"}' | sed 's/\//\%2f/g' > lunion.txt
# Expand to the full IIIF image URL
awk '{print "https://iiif.eluxemburgensia.lu/iiif/2/" $1 "/full/full/0/default.jpg" }' lunion.txt > todownload.txt
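
The steps above stop at a list of URLs in todownload.txt. As a minimal sketch of what could follow, the two helper functions below rebuild the same URL for an arbitrary page and fetch every URL in a list. Both function names (`iiif_url`, `download_list`) and the `page-NNNNN.jpg` output naming are assumptions made for illustration; they are not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): build the IIIF URL
# for one page of an issue, using the same encoding as the commands above.
#   $1 = ARK identifier (first column of ark2paperid.txt)
#   $2 = page number
iiif_url() {
    # Build the full ARK, then URL-encode every slash as %2f
    encoded=$(printf 'ark:/70795/%s/pages/%s' "$1" "$2" | sed 's/\//%2f/g')
    printf 'https://iiif.eluxemburgensia.lu/iiif/2/%s/full/full/0/default.jpg\n' "$encoded"
}

# Hypothetical helper (not part of the repository): download every URL
# listed in a file, one URL per line.
#   $1 = file with URLs, e.g. todownload.txt
# Output files are numbered because every URL ends in default.jpg.
download_list() {
    n=0
    while IFS= read -r url; do
        n=$((n + 1))
        curl --fail --silent --show-error "$url" --output "$(printf 'page-%05d.jpg' "$n")"
    done < "$1"
}
```

After sourcing these functions, `download_list todownload.txt` would save the images as page-00001.jpg, page-00002.jpg, and so on, in the order of the list. Calling `iiif_url` with an identifier from ark2paperid.txt and a different page number gives the URL for that page instead of page 4.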
