Data and R Markdown Notebook for a conference paper titled "The lexicalisation of HAPPINESS in the Malayic varieties of Indonesia" to be presented at the International Seminar on Austronesian Languages and Literature IX, at the Faculty of Humanities, Udayana University, Indonesia


Supplementary materials, including data and R Markdown Notebook, for a paper titled The lexicalisation of HAPPINESS in the Malayic varieties of Indonesia

Gede Primahadi Wijaya Rajeg & I Made Rajeg
Universitas Udayana, Indonesia


This repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

If you use the data or other materials here in your research or teaching, please cite the OSF repository (Rajeg and Rajeg 2021) as follows (in the Unified Style Sheet for Linguistics):

Rajeg, Gede Primahadi Wijaya & I Made Rajeg. 2021. Supplementary materials for The lexicalisation of HAPPINESS in the Malayic varieties of Indonesia. Open Science Framework (OSF).

Or cite the Zenodo version of the repository:

Rajeg, Gede Primahadi Wijaya & I Made Rajeg. 2021. Supplementary materials for The lexicalisation of HAPPINESS in the Malayic varieties of Indonesia. Zenodo.


This repository provides supplementary materials for our paper, The lexicalisation of HAPPINESS in the Malayic varieties of Indonesia, presented at the International Seminar on Austronesian Languages and Literature IX (10 September 2021). The materials include (i) the data; (ii) the R Markdown Notebook, which interleaves the text of the paper with the R code used to write the paper, run the statistical analyses, and produce the visualisations; and (iii) the figures included in the paper (see the figures folder). The study is based on the large, open-access corpora of naturalistic colloquial Malay/Indonesian published by the Max Planck Institute for Evolutionary Anthropology (MPI EVA) Jakarta Field Station (JFS) (Gil et al. 2015).

Data description

The data folder holds the data used in this paper.

  • indo-prov-latlong.csv provides latitude and longitude data for all provinces of Indonesia
  • malayic_happy_freq_long_lat.tsv provides the original latitude and longitude data together with the coordinates manually culled from Google Maps
  • malayic_happy.tsv contains the original raw data for the HAPPINESS lexicalisation
  • malayic_LIKE_df.tsv contains the distribution of morphs glossed as ‘to like’ in all regions
  • malayic_LIKE_df_WK_ENT.tsv contains the distribution of morphs glossed as ‘to like’ in the West Kalimantan and East Nusa Tenggara regions
  • non_acquisition_malayic_sessions_dataset_project.tsv contains the metadata for the Malayic subset of the MPI EVA JFS corpora, including session names, regions, languoid, word count per session, genre, and mode, among others
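
For readers who want to inspect the files listed above outside the notebook, the following is a minimal sketch in base R. It assumes that the working directory is the repository root and that the files sit in a folder named data; the helper read_data_tsv is hypothetical and is not taken from the authors' notebook, and the quoting and encoding settings are assumptions about the file format.

```r
# Assumption: the working directory is the repository root and the
# tab-separated files live in a folder called "data".
data_dir <- "data"

# Hypothetical helper (not from the authors' notebook): read one
# tab-separated file from the data folder into a data frame.
read_data_tsv <- function(file_name, dir = data_dir) {
  utils::read.delim(
    file.path(dir, file_name),
    sep = "\t",
    quote = "",                # treat quotation marks as literal text
    stringsAsFactors = FALSE,  # keep text columns as character vectors
    encoding = "UTF-8"
  )
}

# Only attempt the read when the file is actually present.
happy_path <- file.path(data_dir, "malayic_happy.tsv")
if (file.exists(happy_path)) {
  malayic_happy <- read_data_tsv("malayic_happy.tsv")
  str(malayic_happy)  # inspect the column names and types
}
```

The same helper can be pointed at any of the other .tsv files; indo-prov-latlong.csv is comma-separated and would need read.csv instead.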

Required R packages

The following R packages are used for data processing, statistical analyses, visualisation, and knitting the content of the R Markdown Notebook file (austronesian-paper-2021-gpwrajeg.Rmd) into MS Word format. Please make sure that they are installed in R before running the code in the R Notebook to reproduce the results.
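
One way to check the installation and then knit the notebook is sketched below. The vector pkgs is a placeholder: only rmarkdown is listed, because knitting an .Rmd file to MS Word relies on it, and the remaining packages loaded by the notebook should be added by hand. The helper missing_packages is hypothetical, and the sketch assumes the working directory is the repository root.

```r
# Placeholder vector: extend it with the packages loaded in the notebook.
pkgs <- c("rmarkdown")

# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(x) {
  installed <- vapply(x, requireNamespace, logical(1), quietly = TRUE)
  x[!installed]
}

to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  message("Install first: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}

# Knit the notebook to MS Word only when everything needed is in place.
rmd_file <- "austronesian-paper-2021-gpwrajeg.Rmd"
if (length(to_install) == 0 && file.exists(rmd_file)) {
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```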

The R Session info sub-section below shows the R version (R Core Team 2021) and operating system used for this project.

R Session info

