This repository contains the code and data necessary to reproduce the results published in the article "Unlinked: Conflict Management and Structural Balance in the Early Modern Republic of Letters" (DOI forthcoming), authored by Ingeborg van Vugt (https://orcid.org/0000-0002-7703-1791). The data preparation steps for that article were carried out in collaboration with Liliana Melgar-Estrada (https://orcid.org/0000-0003-2003-4200) during the SKILLNET project; she created this Jupyter notebook to facilitate reproducibility of the results.
The SKILLNET project ("Sharing Knowledge in Learned and Literary Networks") is a European Research Council (ERC)-funded project led by Dirk van Miert at Utrecht University. The project investigates the ideals of knowledge sharing as the legacy of a bottom-up social network of scholars and scientists from the Early Modern period. These scholars transcended religious, political, and linguistic boundaries through their correspondence with one another. From about 1500 to around 1800, the ‘citizens’ of this European knowledge-based civil society referred to their own community as the ‘Republic of Letters’.
The article presents research by van Vugt, who adapts structural balance theory to understand crucial moments of change in the structure of the network around the extremely well-connected Florentine scholar Antonio Magliabechi (1633-1714), librarian to Cosimo III de’ Medici, Grand Duke of Tuscany from 1670 until his death in 1723.
The data and code used in the article are shared in this public GitHub repository in order to facilitate the reproduction of the results presented in the paper. In doing so, the authors adhere to the principles of Open Science, which emphasize the value of transparency, accountability, and reproducibility of research in the sciences as well as in the humanities.
To allow you to perform all the steps that we, the authors of this paper, followed to obtain the results analyzed in the article, we share all the data and code here, structured as follows:
- The \data folder contains the raw and processed data. The raw data folder contains four data sources: the two main letter datasets outlined in the "Data sources" section below, plus two files with annotated ties compiled by the main author of this article. The \data\processed folder contains individual files, one per year, of the "correspondent pairs" with the annotated ties (negative or positive). These files are generated by processing the raw data with the Python notebook.
- The \src folder contains a Jupyter notebook with the Python code and descriptions of the steps followed. This notebook contains all the code we used for the cleaning and preprocessing of the data.
- The \data\processed folder contains the files that were used for the analysis of the structural changes in Magliabechi's Dutch network over time. Because the data contain uncertainties (e.g., when the year in which a letter was written is uncertain), we created subsets with the "certain" data only, as well as with all the data (both certain and uncertain); a small illustrative sketch of such a split follows this list.
- Another notebook (in R), deposited in a separate repository (DOI to add), takes as input the files generated by the Python notebook (contained in the \data\processed folder). The R analysis notebook contains the script that performs the network analyses and generates the visualization(s) that served as a basis for the interpretations. In the results folder of that repository, the figures and outputs are divided into these two types, but in the article we only use the "certain" portion of the data (ca. 95% of it).
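As a hypothetical illustration of that certain/uncertain split (the file name and the "year_uncertain" column below are assumptions for illustration, not the repository's actual schema):

```python
import pandas as pd

# Illustrative only: the file name and the "year_uncertain" column are assumed.
letters = pd.read_csv("data/raw/letters.csv")

all_data = letters                             # both certain and uncertain records
certain = letters[~letters["year_uncertain"]]  # records whose year is certain
```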
If you are only interested in viewing/reading the files here on GitHub to trace our steps, you only need to click through the different folders and files, preferably in this order:
- First take a look at the Python notebook, which contains the data preparation steps.
- Then visit the other repository to take a look at the R analysis notebook, which contains the analysis steps.
The notebooks contain the data outputs within them, so you do not need to execute the code to see what each step does.
If you want to reproduce (and execute) all the steps on your own computer, you need to either clone this repository with `git clone` or download it as a zip file. The difference between cloning and downloading is that cloning allows you to keep your files synchronized with the most recent versions (via Git), while downloading gives you a static copy (version). We do not explain how to do this here, but there are several tutorials that can serve as guidance, for example this one: (ToDo)
In order to execute a Jupyter notebook on your own computer, you need to follow certain steps; this tutorial can guide you through the process (https://programminghistorian.org/en/lessons/jupyter-notebooks).
The versions of the software we used are the following (see also the requirements.txt file; a minimal version check is sketched after this list):
- Jupyter Notebook: 3.2.1
- Anaconda: 2021.11
- Python: 3.9.7
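If you want to verify that your local environment matches, here is a minimal check (a convenience sketch, not part of the original notebooks):

```python
import sys

# We used Python 3.9.7; compare your interpreter version.
print(sys.version_info[:3])  # expected: (3, 9, 7)
```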
We do not include the steps to install Python or R on your computer, but if you prefer not to install this software, you can upload the Jupyter notebook(s) to a cloud environment (e.g., Google Colab) and run them there without the need for installation. If you use the notebooks in this way, you also need to upload the data to the cloud environment. In that case (a minimal upload sketch follows this list):
- download the .ipynb file(s) from this repository
- open a cloud environment that allows working with Jupyter notebooks (e.g., Google Colab)
- upload the file there
- upload the data files
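In Google Colab, for instance, the data files can be uploaded interactively from within the notebook itself; a minimal sketch (this only works when run inside Colab):

```python
# Only works inside Google Colab: opens a file picker and places the
# selected files in the notebook's working directory.
from google.colab import files

uploaded = files.upload()
print(list(uploaded))  # names of the uploaded files
```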
This article uses two main data sources. Their latest versions (as processed during the SKILLNET project) are deposited in Dataverse (https://dataverse.nl/dataverse/skillnet), where the cleaning process is documented in more detail. For the purpose of this repository, which is to facilitate the reproduction of our results, the versions of those files used for this article are "frozen" as copies stored in the \data\raw folder of this GitHub repository (these versions were frozen first on May 12, 2022, and again on November 15, 2022).
- Garfagnini's inventory: The printed inventory of Antonio Magliabechi's letters by Doni Garfagnini (1988). The inventory of over 22,000 letters is based on the collections of the National Library of Florence and records the names of Magliabechi’s correspondents as well as the dates and places of sending (more information and the dataset can be found here: https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/GTV2RN).
- The Catalogus Epistolarum Neerlandicarum (CEN): This is an aggregated dataset of letter metadata created at different archives and libraries in the Netherlands since the 1980s, containing more than five hundred thousand records. Since January 2020, the CEN can be consulted via WorldCat (https://picarta.on.worldcat.org, last accessed December 29, 2021). The CEN dataset was obtained by Ingeborg van Vugt in XML format from the Online Computer Library Center (OCLC) and the Royal Dutch Library (KB) in October 2019. The dataset was sliced (years 1200 to 1820) and cleaned during the SKILLNET project. Its complete description and the data itself are available here: https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/G8XQI0.
Data wrangling processes were applied to the original datasets mentioned above. We do not detail all of those processes here; they are described in the repositories where the data are deposited: https://dataverse.nl/dataverse/skillnet. The specific processes applied for this article, using those raw datasets as input, are detailed in this repository, in the Jupyter notebook named "script_dataPreparation_Python_unlinkedArticle.ipynb".
For the purpose of doing network analysis, we had to select, filter, and merge different slices of the datasets mentioned above and then generate slices per individual year. The Jupyter notebook "unlinkedArticle_dataPreparation_python" has a step-by-step explanation of these processes. The output of this notebook goes to two folders (a sketch of reading these files back follows the list):
- \data\processed\perYear_ALL: one CSV file per year, containing the positive and negative relations. This folder contains both certain and uncertain data (more explanation is given in the notebook itself).
- \data\processed\perYear_certain: one CSV file per year, containing the positive and negative relations. This folder contains only the certain data, which we use in the article (more explanation is given in the notebook itself).
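A minimal sketch of reading those per-year files with pandas (the folder layout follows the repository; the idea that each file name carries the year is an assumption for illustration):

```python
import glob
import os
import pandas as pd

# Read every per-year CSV of signed ties into a single table,
# keeping the year (assumed to be the file name) as a column.
frames = []
for path in sorted(glob.glob(os.path.join("data", "processed", "perYear_certain", "*.csv"))):
    df = pd.read_csv(path)
    df["year"] = os.path.splitext(os.path.basename(path))[0]
    frames.append(df)

edges = pd.concat(frames, ignore_index=True)
print(edges.head())
```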
The files in the second folder above (\data\processed\perYear_certain) are used as input for the Jupyter notebook named "unlinkedArticle_dataAnalysis_R". This notebook lives in a separate GitHub repository (https://github.com/inge1211/unlinked_historical_structural_balance) (ToDo: add DOI from repository). It includes a script that uses the R package signnet to create the four configurations of balance and imbalance from a signed network.
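The underlying idea can be illustrated outside R as well. Below is a small Python analogue (not the authors' signnet code): in structural balance theory, a triad is balanced when the product of its edge signs is positive, and imbalanced otherwise.

```python
import itertools
import networkx as nx

# Toy signed network: a single triangle with signs +, -, -.
G = nx.Graph()
G.add_edge("A", "B", sign=1)
G.add_edge("B", "C", sign=-1)
G.add_edge("A", "C", sign=-1)

balanced, imbalanced = 0, 0
for u, v, w in itertools.combinations(G.nodes, 3):
    if G.has_edge(u, v) and G.has_edge(v, w) and G.has_edge(u, w):
        product = G[u][v]["sign"] * G[v][w]["sign"] * G[u][w]["sign"]
        if product > 0:
            balanced += 1
        else:
            imbalanced += 1

print(balanced, imbalanced)  # 1 0: a (+, -, -) triangle is balanced
```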
You can submit GitHub issues to us. These are useful, for instance, if you detect errors in the data, or if you want to help fill gaps in the data (for example, if you find the year in which a letter was sent that is missing from our dataset).
We name the versions of our data and scripts with the date of their latest update. The version submitted to the editors of the journal is dated 20220512 (version 1). The second version was generated on November 15, 2022, when a new version of the raw datasets (CEN) was used.
- Doni Garfagnini, M. 1988. Lettere e carte Magliabechi. Inventario cronologico. Rome: Istituto Storico Italiano per l'Età Moderna e Contemporanea.
This project is licensed under the MIT License; see the LICENSE.md file for details.
- Ingeborg van Vugt - main researcher and author. Responsible for data analysis and interpretation, responsible for data collection, and co-responsible for data preparation.
- Liliana Melgar-Estrada - co-author of the methods section. Proofreading. Responsible for data preparation and depositing.
Contributors who participated in this project:
- Dirk van Miert (project funding, reading and feedback)
- Koen Scholten (reading and feedback)