update Docusaurus
sarnoult committed Jun 12, 2024
0 parents commit 70e182d
Showing 36 changed files with 16,797 additions and 0 deletions.
20 changes: 20 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
132 changes: 132 additions & 0 deletions README.md
@@ -0,0 +1,132 @@
This repository contains the source documents for the [cltl.github.io](https://cltl.github.io) pages.

**NEW** The site is now published through GitHub Actions when you push changes to the `main` branch. Source docs are now kept on the `main` branch.

## Goal of these GitHub pages

The goal of this repository is to document and organize **all** our repositories, regardless of their maturity or level of activity: all repositories are welcome, whether they accompany a publication or published software, are work in progress, or are even premature. These GitHub pages are thus intended not only for finished, presentable code, but also for repositories that might otherwise be forgotten and become deadwood.

If you create a CLTL repository, mention it in these pages!

#### README and ME

This repository contains two README files:

* This README contains information on the GitHub setup of this website, and instructions to modify the content and commit changes.
* The [README located in `src`](https://github.com/cltl/cltl.github.io/blob/main/src/README.md) was generated by Docusaurus. It provides additional information on content modification and pointers to the Docusaurus documentation.

## Contributing to this website
Instructions for adding and modifying these pages are provided below. Before pushing changes, you may want to
1. work on a separate branch to avoid merge conflicts (see below)
2. [deploy the website locally](#deploying-the-site-locally)

### Collaborative setup
Clone this repository

```sh
$ git clone https://github.com/cltl/cltl.github.io
$ cd cltl.github.io
```

Or update its contents to the latest version

```sh
cltl.github.io$ git checkout main
cltl.github.io$ git pull
```

You should make changes in a local branch to prevent clashes when pushing your commits. Create a local branch, e.g. `mybranch`:

```sh
git checkout -b mybranch # or git switch -c mybranch
```

You can now access and modify the content in `docs`. You will find below instructions on:

- [Modifying an existing page](#modifying-an-existing-page)
- [Adding a page](#adding-a-page-to-an-existing-category)
- [Adding a category](#adding-a-category-section)

When you are ready to commit, you can check the website locally following these [instructions](#deploying-the-site-locally).

Now you can commit your changes on `mybranch`, go back to the `main` branch, update its contents and merge your commit:

```sh
git add --all
git commit -m "message for your commit"
git checkout main
git pull # in case somebody else pushed new commits in the meantime...
git merge mybranch
```

If somebody else made a commit while you were busy, you may have to resolve conflicts when merging `mybranch`, and merge your changes manually.
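For those unfamiliar with conflict resolution, here is a self-contained sketch you can run in a throwaway directory; the file name `page.md`, the branch name `mybranch`, and all commit messages are purely illustrative:

```shell
# Throwaway demo of a merge conflict and its resolution.
tmp=$(mktemp -d) && cd "$tmp"
git init -q && git checkout -qb main
git config user.email "you@example.com"
git config user.name "You"
echo "original line" > page.md
git add page.md
git commit -qm "initial"
# One change on a feature branch...
git checkout -qb mybranch
echo "my change" > page.md
git commit -qam "edit on mybranch"
# ...and a conflicting change on main:
git checkout -q main
echo "their change" > page.md
git commit -qam "edit on main"
# Merging now stops with a conflict in page.md:
git merge mybranch || true
# Edit page.md: keep the lines you want, delete the
# <<<<<<< / ======= / >>>>>>> conflict markers, then conclude the merge:
echo "merged line" > page.md
git add page.md
git commit -qm "merge mybranch, conflict resolved"
```

After the final commit, `git log` shows both branch histories joined by the merge commit.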

Push your changes to the `main` branch when you are done.

```sh
git push
```

The website should be published automatically after a few minutes.

You can now delete `mybranch` by running:

```sh
git branch -d mybranch
```

### Modifying an existing page
#### Content pages

You will find content pages under `docs/<category>/`. These are Markdown files that you can edit.

### Adding a page to an existing category

You will find that category's folder under `docs/<category>`. Add a markdown file `<page-name>.md` for your page there. The file name itself is not important: it is the page id that is used to link documents through the site. The file should start with the following header:

```md
---
id: <page-id>
title: <page-title>
---
```

You should link this page by adding its category and id to the file `sidebars.json`. Subcategories for the sidebar can be specified directly in this file.
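As a sketch of such an entry (the `research` sidebar name, page ids, and subcategory label here are hypothetical — match the structure already present in `sidebars.json`):

```json
{
  "research": [
    "research/distrib-sem",
    {
      "type": "category",
      "label": "New subcategory",
      "items": ["research/new-page"]
    }
  ]
}
```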

### Adding a category section

We now have a number of categories: `research`, `projects`, `resources`, `teaching` and `CLTL`.

To create a new category:

* Create a new folder under `docs`, e.g. `new-category`. This is not required by the website, but keeps the project structure clean.
* Add a new page for this category, e.g. `new-page.md` with id `new-page`.
* Edit `docusaurus.config.js` to add a tab for that category in the navigation bar and link it to an index or a first document: look for `themeConfig`, and add a new item to the navigation bar.
* Edit `sidebars.json` to create a page sidebar for this category.
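The navigation-bar step can be sketched as follows; the `to` path and label are hypothetical, and the new item should be added to the `items` array already present under `themeConfig.navbar` in `docusaurus.config.js`:

```js
// docusaurus.config.js (fragment, illustrative names)
module.exports = {
  themeConfig: {
    navbar: {
      items: [
        // ...existing items...
        {
          to: 'docs/new-category/new-page', // first document of the category
          label: 'New Category',
          position: 'left',
        },
      ],
    },
  },
};
```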

### Changing the website
#### Homepage

The index page for the website is located in `src/pages/index.js`. This is a React/JavaScript file that you can also edit.

#### Footer
The footer of the website can be modified through `website/core/Footer.js`.
You can [initialize a (separate) Docusaurus project](https://docusaurus.io/docs/en/installation) to get a richer example for the index or the footer.

#### Styling
See `src/css/custom.css` for colors, and `src/pages/index.module.css` for the homepage.

## Deploying the site locally

#### Requirements

You will need [Node.js](https://nodejs.org/en/download/package-manager) version 18.0 or later.


#### Instructions

You can view the website locally (on <http://localhost:3000>) by running

```sh
npm start
```
3 changes: 3 additions & 0 deletions babel.config.js
@@ -0,0 +1,3 @@
module.exports = {
presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};
21 changes: 21 additions & 0 deletions docs/internal/internal.md
@@ -0,0 +1,21 @@
---
id: internal
title: Internal repositories
sidebar_label: Internal
---

## Writing

* [CLTL LaTeX bib](https://github.com/cltl/bibliography)
* [PhD Thesis tips](https://github.com/cltl/ThesisTips)

## Server and scripts

* [Kyoto scripts](https://github.com/cltl/cltl_kyoto_scripts)
* [Magic place](https://github.com/cltl/cltl-magicplace)
* [Kyoto quota](https://github.com/cltl/kyotoquota)

## Organizational websites

* [GitHub Pages](https://github.com/cltl/cltl.github.io)
* [web page with Jekyll](https://github.com/cltl/cltl.github.com) *(legacy)*
19 changes: 19 additions & 0 deletions docs/projects/clariah-plus.md
@@ -0,0 +1,19 @@
---
id: clariah-plus
title: CLARIAH+ VOC Use Case
---

The repositories listed here are all work in progress.

## Preprocessing for NER

The following repositories contain code to preprocess data for the Clariah WP6 VOC use case, for NER annotations in particular.

* [clariah-voc-scripts](https://github.com/cltl/clariah-voc-scripts): a first set of scripts for text and index extraction from VOC pre-TEI missives
* [TeiReader](https://github.com/cltl/teiReader): extraction of TEI texts to raw text for the *Chronicles* and *CLARIAH-PLUS VOC use case* projects
* [voc-missives](https://cltl.github.io/voc-missives): extraction of TEI-formatted missives to NAF and Conll, and integration of manually annotated entities
* [voc-missives-data](https://github.com/cltl/voc-missives-data): companion data repository to [voc-missives](https://cltl.github.io/voc-missives), containing the Generale Missiven corpus, and manual named-entity annotations

## Entity identification

* [entity-identification-from-scratch](https://github.com/cltl/entity-identification-from-scratch): entity identification by clustering
14 changes: 14 additions & 0 deletions docs/projects/dutch-framenet.md
@@ -0,0 +1,14 @@
---
id: dutch-framenet
title: Dutch FrameNet
---

The repositories listed here are all work in progress.

## Building and representing a FrameNet
* [Dutch FrameNet](https://cltl.github.io/DutchFrameNet/): we host Dutch FrameNet by means of this repository
* [FrameNetNLTK](https://github.com/cltl/FrameNetNLTK): build, load, and edit any FrameNet in the NLTK format, i.e., you can build a FrameNet in any language you prefer.
* [Frame Annotation tool](https://github.com/cltl/frame-annotation-tool): annotate documents that refer to the same incident, both conceptually and referentially.
* [Lexical data frame annotation tool](https://github.com/cltl/LexicalDataDTDAnnotationTool): facilitates creating the lexical data needed to annotate in the Frame Annotation tool
* [multilingual-wiki-event-pipeline](https://github.com/cltl/multilingual-wiki-event-pipeline): multilingual wiki event pipeline
* [MWEP on one incident](https://github.com/cltl/MWEP_on_one_incident): run the multilingual-wiki-event-pipeline on one incident
85 changes: 85 additions & 0 deletions docs/projects/spotter.md
@@ -0,0 +1,85 @@
---
id: spotter
title: The Spotter Framework
---

This page documents the code, data and development of the SPOTTER Framework,
a framework designed to investigate referring expressions and convention formation within an increasing common ground in
Human-Robot Interaction.

The framework is described in [this paper](https://aclanthology.org/2024.lrec-main.1322).

### Paper abstract

Linguistic conventions that arise in dialogue reflect common ground and can increase communicative efficiency. Social robots that can understand these conventions and the process by which they arise have the potential to become efficient communication partners.
Nevertheless, it is unclear how robots can engage in convention formation when presented with both familiar and new information.
We introduce an adaptable game framework, **SPOTTER**, to study the dynamics of convention formation for visually grounded referring expressions in both human-human and human-robot interaction. Specifically, we seek to elicit convention forming for members of an *inner circle* of well-known individuals in the common ground, as opposed to individuals from an *outer circle*, who are unfamiliar. We release an initial corpus of 5000 utterances from two exploratory pilot experiments in spoken Dutch.
Different from previous work focussing on human-human interaction, we find that referring expressions for both familiar and unfamiliar individuals maintain their length throughout human-robot interaction. Stable conventions are formed, although these conventions can be impacted by distracting outer circle individuals. With our distinction between familiar and unfamiliar, we create a contrastive operationalization of common ground, which aids research into convention formation.

## The SPOTTER game
The SPOTTER game is a two-person reference game. It consists of six rounds
in which the goal is to locate the position of characters in a visual scene.
The visual scene for each player contains the same characters, but they are in
a different order. Players must communicate to find the position of each character
in the other player's picture. The game is designed to support Human-Robot Interaction. However, it can also be used to
investigate Human-Human Interaction.

### Links
- [Source code for the game and the dataset](https://github.com/leolani/spotter)
- [Code for running the Wizard of Oz experiment](https://github.com/leolani/spot-woz/tree/spot)

## Data and experiments
The SPOTTER framework has been used in two Human-Robot Interaction pilot studies. The language used in the pilot studies was Dutch. Here,
the robot behaviour was 'faked' using the Wizard-of-Oz approach. The two pilot studies used
two different versions of the game:
- **Version 1**: The original version. This version uses cartoon-like figures, and players only had to select whether a character was in the *same* or a *different* position.
- **Version 2**: The latest, updated version. The cartoon-like faces have been replaced by more realistic faces, and players now have to select the exact position of a character in the other player's picture.

If you wish to use the framework for your experiments, we recommend using the latest version.

### Participants
The dataset contains interactions from 21 participants:
- 7 participants for Version 1
- 14 participants for Version 2

### Annotation

The dataset contains one **Utterance** per line. Utterances have been annotated with the following features:
- **Start**: The start time of an utterance in seconds
- **End**: The end time of an utterance in seconds
- **Text**: The text in the utterance
- **Speaker**: The source of the utterance, either *Human* or *Robot*
- **Mention**: The part of the utterance which contains the description of a character
- **Character**: The gold annotation for the referent of the mention
- **Round**: The round of the game. Any utterances that are not part of a round (i.e. before or in between rounds) are annotated as '0'
- **Transaction Unit**: A unit of the interaction which contains the utterances and turns needed to resolve the mention for one referent and identify them in the picture
- **Transaction Unit Relation**: The relation between subsequent utterances within the same Transaction Unit. For a full list of relations, we refer to Appendix C of our paper.
- **Dialog Act (DA)**: An automatically extracted Dialog Act for the utterance
- **Dialog Act Confidence (DA_conf)**: The confidence score for the automatically extracted Dialog Act

## Citation

If you use our framework or data, please cite our paper:

```bibtex
@inproceedings{kruijt-etal-2024-spotter-framework,
    title = "{SPOTTER}: A Framework for Investigating Convention Formation in a Visually Grounded Human-Robot Reference Task",
    author = "Kruijt, Jaap and
      van Minkelen, Peggy and
      Donatelli, Lucia and
      Vossen, Piek T.J.M. and
      Konijn, Elly and
      Baier, Thomas",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1322",
    pages = "15202--15215"
}
```
12 changes: 12 additions & 0 deletions docs/research/distributional-semantics.md
@@ -0,0 +1,12 @@
---
id: distrib-sem
title: Distributional Semantics
---

Most of our work on distributional semantic models is collected at the repository [Semantic Space Navigation](https://cltl.github.io/semantic_space_navigation/).

Other repositories:

* [meaning_space](https://github.com/cltl/meaning_space)
* [semantic_property_dataset](https://github.com/cltl/semantic_property_dataset)
* [variword](https://github.com/cltl/variword)
34 changes: 34 additions & 0 deletions docs/research/entity-detection-linking.md
@@ -0,0 +1,34 @@
---
id: entity-detect
title: Entity Detection and Linking
---

## Long-tail in Entity linking

* [EL-long-tail-phenomena](https://cltl.github.io/EL-long-tail-phenomena/): Systematic study of long tail phenomena in the task of entity linking.
* [LongTailAnnotation](https://github.com/cltl/LongTailAnnotation): Annotation tool for data2text approaches
* [LongTailIdentity](https://github.com/cltl/LongTailIdentity): Generating profiles of long tail identities from text

### SemEval2018
* [LongTailQATask](https://cltl.github.io/LongTailQATask/): code for SemEval-2018 task #5, "Counting events and participants in the long tail"
* [SemEval2018-5_Postprocessing](https://github.com/cltl/SemEval2018-5_Postprocessing): Postprocessing steps for the SemEval-2018 task 5: Counting events and participants in the long tail

## Human-Like EL

* A human-inspired Entity linking system in progress can be found in the repository [HumanLikeEL](https://cltl.github.io/HumanLikeEL/)
* [HELAnalysis](https://github.com/cltl/HELAnalysis): Analysis of the Human-like Entity Linking system

## More on Entity Linking

* [ELBaselines](https://github.com/cltl/ELBaselines): This repository aims at creating baseline results for Entity Linking, by running a text against the state-of-the-art systems for entity linking, using their most standard configuration.
* [entity-link-postprocess](https://github.com/cltl/entity-link-postprocess)

See also our [entity-linking systems](../resources/entity-linking.md).

## Entity Detection and Typing

* [multilingual-finegrained-entity-typing](https://github.com/cltl/multilingual-finegrained-entity-typing)
* [entity-identification-from-scratch](https://github.com/cltl/entity-identification-from-scratch): entity identification by clustering



22 changes: 22 additions & 0 deletions docs/research/event-detection-and-coreference.md
@@ -0,0 +1,22 @@
---
id: event-detect
title: Event Detection and Coreference
---

## Event detection

* The [TimeMLEventTrigger repository](https://cltl.github.io/TimeMLEventTrigger) contains conversion scripts and models to automatically extract event triggers following the TimeML Annotation Guidelines.
* [CatFrameNet](https://github.com/cltl/CatFrameNet)
* [ceopathfinder](https://github.com/cltl/ceopathfinder): Finds a path of circumstantial relations between events on the basis of the CircumstantialEventOntology
* [FrameNet-annotation-tool](https://github.com/cltl/FrameNet-annotation-tool): Python-based command-line tool for FrameNet annotation
* [nwr-semeval2018-5](https://github.com/cltl/nwr-semeval2018-5): NewsReader participation to task 5 of SemEval2018
* [OntoTagger](https://github.com/cltl/OntoTagger): Ontotagger inserts (semantic) labels into KAF representation on the basis of lemma or wordnet synset representations of text
* [TimeMLEventTrigger](https://github.com/cltl/TimeMLEventTrigger)
* [TripleEvaluation](https://github.com/cltl/TripleEvaluation): This program evaluates text mining output from text on the basis of a triple representation.

## Event coreference

* [coreference-evaluation](https://github.com/cltl/coreference-evaluation): Evaluation package for event coreference using the reference-scorer
* [EventCoreference](https://github.com/cltl/EventCoreference): Compares descriptions of events within and across documents to decide if they refer to the same events. Also converts NAF to GRASP-RDF and SEM-RDF.
* [reference-coreference-scorers](https://github.com/cltl/reference-coreference-scorers): This is the reference implementation of commonly used coreference metrics.
* [sem10scorer-stability](https://github.com/cltl/sem10scorer-stability): Testing the stability of sem10scorer
11 changes: 11 additions & 0 deletions docs/research/image-sound.md
@@ -0,0 +1,11 @@
---
id: image-sound
title: Language of Image and Sound
---

* [DutchDescriptions](https://github.com/cltl/DutchDescriptions): Dutch descriptions for the Flickr30K validation and test data, plus a cross-lingual comparison tool.
* [GroundedTranslation](https://github.com/cltl/GroundedTranslation): Multilingual image description
* [Image-Specificity](https://github.com/cltl/Image-Specificity): Reimplementation of Jas &amp; Parikh's (2015) image specificity metric, using word embeddings.
* [SoundBrowser](https://github.com/cltl/SoundBrowser): Interface for the VU Sound Corpus, made in Flask.
* [Spoken-versus-Written](https://github.com/cltl/Spoken-versus-Written): Code and data for our VarDial 2018 paper on spoken versus written image descriptions
* [VU-Sound-Corpus](https://github.com/cltl/VU-Sound-Corpus): Collection of crowd-sourced annotations for the Freesound database.
0 comments on commit 70e182d
