GLOBALISE AMH Datasprint

License: CC BY 4.0

Materials for the GLOBALISE datasprint on places in the Indian Ocean world, University of Amsterdam, May 15th 2023.

⚠️ A more recent version of the 4.VEL map data can be found in the GLOBALISE maps repository.

Introduction

Historical places are important building blocks for the reconstruction of historical events. The GLOBALISE corpus of about 5 million pages from the VOC archives describes hundreds of thousands of events that took place over a period of two centuries in a large number of locations spread over a huge area around the Indian Ocean and Indonesian archipelago. Thanks to initiatives like the Atlas of Mutual Heritage and the World Historical Gazetteer, we can locate some of the places mentioned, but by no means all of them. Within GLOBALISE, we would like to bring as many of these locations to light as possible by creating a dataset that identifies and geolocates historical places mentioned in our texts. This is challenging: spelling variations are not always easy to disambiguate, place names appear in different languages and change over time, and sources contain ambiguous references to locations.

This datasprint aims to foster collaboration between historians, heritage professionals and data scientists for better availability of data on historical places. It intends to curate, publish, and link data on historical places collected by researchers within their own projects, as well as test and improve digital techniques to extract, structure, and share data on places. In addition to data creation, curation, and linking, this datasprint will offer a space to exchange knowledge and expertise on historical places and contexts, and digital techniques. We hope that by the end of the datasprint, all participants will have learned something, and that we will have generated valuable data on historical locations with which to improve our understanding of the early modern Indian Ocean and Indonesian archipelago worlds.

Sessions and documentation

The datasprint consists of three sessions:

  1. Georeferencing early modern maps
  2. Data extraction from early modern maps
  3. Curating and linking new places data(sets)

Reports and documentation for each session can be found in the docs folder.

Preparation and data

Two of these sessions require access to digital map data. For this purpose, we selected the National Archives' 4.VEL collection. The maps from this collection are presented in a standardized format according to the IIIF Image API specification. Additionally, the Atlas of Mutual Heritage provides detailed descriptions of this material. We incorporated that metadata into the IIIF Collections and Manifests we generated, making these maps accessible through session annotation tools. Furthermore, we connected the images, the IIIF Manifests, and the structured metadata of the Atlas of Mutual Heritage by modeling the data in the Europeana Data Model (RDF).
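As an illustration of what the IIIF Image API offers, the sketch below builds request URLs following the API's `{region}/{size}/{rotation}/{quality}.{format}` pattern. The service URL is the one that appears in the aggregation example in this README. The function name `iiif_image_url` and the requested width of 800 pixels are assumptions made for illustration and are not part of the repository's scripts.

```python
def iiif_image_url(service_id, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API 2 request URL for the given image service."""
    return f"{service_id.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Image service of map VEL0297, copied from aggregations/4.VEL/297.json.
SERVICE = (
    "https://service.archief.nl/iip/96/98/8a/b6/17/f4/42/0f/97/b9/eb/ce/9f/aa/28/65/"
    "7b0fecf8-26da-4bb6-8fd6-73cef3002bd5.jp2"
)

full_url = iiif_image_url(SERVICE)                # full-size image
thumb_url = iiif_image_url(SERVICE, size="800,")  # scaled to a width of 800 pixels
info_url = f"{SERVICE}/info.json"                 # technical description of the image

print(full_url)
print(thumb_url)
print(info_url)
```

Requesting a scaled derivative rather than the full image keeps downloads small when only an overview of a map is needed.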

We thank the RCE and the National Archives for providing us with a data dump of the Atlas of Mutual Heritage and the 4.VEL collection.

Collections

These IIIF Collections can be found in the manifests folder. The URLs below point to a Mirador3 viewer with the respective collection loaded.

Aggregations

To connect the image itself, the IIIF Collections and Manifests, and the structured metadata of the Atlas of Mutual Heritage, the Atlas's data is modelled in the Europeana Data Model as RDF. Each Manifest links to the structured RDF data using the rdfs:seeAlso property, while the map image (the edm:WebResource) links back to the Manifest using the dcterms:isReferencedBy property. An example of this data can be found in aggregations/4.VEL/297.json:

{
  "@context": "https://globalise-huygens.github.io/datasprint-amh/context.json",
  "id": "https://globalise-huygens.github.io/datasprint-amh/aggregations/4.VEL/297.json",
  "type": "ore:Aggregation",
  "edm:aggregatedCHO": {
    "id": "http://hdl.handle.net/10648/ad12d7d6-3531-4cb7-8a24-50d3e0b41633",
    "type": ["edm:ProvidedCHO", "schema:Map"],
    "image": "https://www.atlasofmutualheritage.nl/image/2022/4/21/vel0297.jpg%28mediaclass-meta-tag-image.4b190bfcc55e159332679890b17bd2261ced7954%29.jpg",
    "dc:title": "Plattegrond van het kasteel St.Jago te Manilha",
    "dc:description": "Titel in catalogus Leupe (Nationaal Archief): Platte grond van het Kasteel St.Jago en de Stadt Manilha.\nNotities verso: Behoort by de overgekomen brieven en papieren van Batavia 4e deel 1704, N1 / 2106 [folionummer in de band ?].",
    "dc:type": "tekening",
    "dc:identifier": "VEL0297",
    "dc:subject": ["gebouw", "plattegrond / kaart", "vesting"],
    "dc:language": "nl",
    "dcterms:medium": "papier",
    "edmfp:technique": "ingekleurde tekening",
    "dcterms:extent": "41,5 x 54,5 cm",
    "dcterms:date": "1680-1704",
    "dcterms:provenance": "Nationaal Archief",
    "dcterms:isPartOf": "Atlas of Mutual Heritage",
    "seeAlso": "https://www.atlasofmutualheritage.nl/page/7863/plattegrond-van-het-kasteel-st.jago-te-manilha"
  },
  "edm:isShownBy": {
    "id": "https://service.archief.nl/iip/96/98/8a/b6/17/f4/42/0f/97/b9/eb/ce/9f/aa/28/65/7b0fecf8-26da-4bb6-8fd6-73cef3002bd5.jp2/full/full/0/default.jpg",
    "type": "edm:WebResource",
    "svcs:has_service": {
      "id": "https://service.archief.nl/iip/96/98/8a/b6/17/f4/42/0f/97/b9/eb/ce/9f/aa/28/65/7b0fecf8-26da-4bb6-8fd6-73cef3002bd5.jp2",
      "type": "svcs:Service",
      "profile": "http://iiif.io/api/image",
      "implements": "http://iiif.io/api/image/2/level1.json"
    },
    "rights": "https://creativecommons.org/publicdomain/mark/1.0/",
    "isReferencedBy": {
      "id": "https://globalise-huygens.github.io/datasprint-amh/manifests/4.VEL/297.json",
      "type": "iiif:Manifest",
      "rdfs:label": "Platte grond van het Kasteel St. Jago en de Stadt Manilha."
    }
  },
  "edm:dataProvider": "Rijksdienst voor het Cultureel Erfgoed",
  "edm:provider": "Nationaal Archief",
  "edm:rights": "https://creativecommons.org/publicdomain/mark/1.0/"
}
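The structure above can be navigated with plain JSON tooling. The following is a minimal sketch that pulls the identifier, title, image service, and Manifest URL out of one aggregation. It works on a trimmed inline copy of the example so that it runs offline; the helper name `summarise` is an assumption for illustration and does not exist in the repository.

```python
import json

# Trimmed copy of aggregations/4.VEL/297.json (only the keys used below).
AGGREGATION_JSON = """
{
  "id": "https://globalise-huygens.github.io/datasprint-amh/aggregations/4.VEL/297.json",
  "type": "ore:Aggregation",
  "edm:aggregatedCHO": {
    "dc:title": "Plattegrond van het kasteel St.Jago te Manilha",
    "dc:identifier": "VEL0297",
    "dcterms:date": "1680-1704"
  },
  "edm:isShownBy": {
    "type": "edm:WebResource",
    "svcs:has_service": {
      "id": "https://service.archief.nl/iip/96/98/8a/b6/17/f4/42/0f/97/b9/eb/ce/9f/aa/28/65/7b0fecf8-26da-4bb6-8fd6-73cef3002bd5.jp2"
    },
    "isReferencedBy": {
      "id": "https://globalise-huygens.github.io/datasprint-amh/manifests/4.VEL/297.json",
      "type": "iiif:Manifest"
    }
  }
}
"""


def summarise(aggregation):
    """Collect the fields that tie a map to its image service and Manifest."""
    cho = aggregation["edm:aggregatedCHO"]   # the described map (ProvidedCHO)
    web = aggregation["edm:isShownBy"]       # the image (WebResource)
    return {
        "identifier": cho["dc:identifier"],
        "title": cho["dc:title"],
        "date": cho["dcterms:date"],
        "image_service": web["svcs:has_service"]["id"],
        "manifest": web["isReferencedBy"]["id"],
    }


summary = summarise(json.loads(AGGREGATION_JSON))
for key, value in summary.items():
    print(f"{key}: {value}")
```

The same function should work on the complete file, since it only reads keys that are present in the full example.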

An RDF dump of all the aggregations (in text/turtle) can be found in the rdf folder.

Scripts

  • inventory2handle_title_date.py: Used to create a mapping between the inventory number and the title and date of the map. Data stored as JSON.
  • inventory2hierarchy.py: Used to create a mapping between the inventory number and the hierarchy of the map (its archival (sub)series and file structure). Data stored as JSON.
  • make_manifest_v3.py: Used to create IIIF Manifests (v3) for the maps. Also outputs ore:Aggregations and EDM RDF. Data stored as JSON-LD.
  • make_manifest_v2.py: Used to create IIIF Manifests (v2) for the maps, for use in Recogito.
  • jsonld2ttl.py: Used to convert the JSON-LD output of make_manifest_v3.py to text/turtle RDF (as a dump instead of individual files).
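To give an impression of the kind of output a Manifest script produces, here is a minimal sketch of a single-canvas IIIF Presentation 3 Manifest built with the standard library only. It is not the code of make_manifest_v3.py: the function `build_manifest`, the canvas, page, and annotation identifiers, and the pixel dimensions are all hypothetical, and the real Manifests carry much more metadata.

```python
import json


def build_manifest(manifest_id, label, service_id, aggregation_id, width, height):
    """Return a minimal IIIF Presentation 3 Manifest with one canvas."""
    canvas_id = f"{manifest_id}/canvas/1"  # hypothetical identifier scheme
    image = {
        "id": f"{service_id}/full/full/0/default.jpg",
        "type": "Image",
        "format": "image/jpeg",
        "width": width,
        "height": height,
        # An Image API 2 service keeps the "@id"/"@type" keys inside a v3 Manifest.
        "service": [{
            "@id": service_id,
            "@type": "ImageService2",
            "profile": "http://iiif.io/api/image/2/level1.json",
        }],
    }
    annotation = {
        "id": f"{canvas_id}/annotation/1",
        "type": "Annotation",
        "motivation": "painting",
        "body": image,
        "target": canvas_id,
    }
    canvas = {
        "id": canvas_id,
        "type": "Canvas",
        "width": width,
        "height": height,
        "items": [{
            "id": f"{canvas_id}/page/1",
            "type": "AnnotationPage",
            "items": [annotation],
        }],
    }
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"nl": [label]},
        # Link from the Manifest to the structured RDF data (rdfs:seeAlso).
        "seeAlso": [{
            "id": aggregation_id,
            "type": "Dataset",
            "format": "application/ld+json",
        }],
        "items": [canvas],
    }


manifest = build_manifest(
    manifest_id="https://globalise-huygens.github.io/datasprint-amh/manifests/4.VEL/297.json",
    label="Platte grond van het Kasteel St. Jago en de Stadt Manilha.",
    service_id="https://service.archief.nl/iip/96/98/8a/b6/17/f4/42/0f/97/b9/eb/ce/9f/aa/28/65/7b0fecf8-26da-4bb6-8fd6-73cef3002bd5.jp2",
    aggregation_id="https://globalise-huygens.github.io/datasprint-amh/aggregations/4.VEL/297.json",
    width=1000,   # placeholder: real dimensions come from the image's info.json
    height=800,   # placeholder
)
print(json.dumps(manifest, indent=2, ensure_ascii=False))
```

In practice the canvas dimensions would be read from the image service rather than hard-coded, so that annotations made on the canvas line up with the image.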

About

The scripts and data in this repository were created for the GLOBALISE and CREATE datasprint in Spring 2023. A blog post about the sprint's results is available at https://globalise.huygens.knaw.nl/old-maps-new-discoveries-a-datasprints-digital-exploration/.

See the GLOBALISE website (https://globalise.huygens.knaw.nl/) for more information about the project.

A copy of the repository is archived on Zenodo:

  • Van Wissen, L., Schoonman, J., Wevers, M., Stapel, R., & GLOBALISE Project. (2023). GLOBALISE Datasprint: Mapping Places in the Indian Ocean World (v0.1). Zenodo. https://doi.org/10.5281/zenodo.13341556