update Docusaurus
sarnoult committed Jun 12, 2024
0 parents commit 70e182d
Showing 36 changed files with 16,797 additions and 0 deletions.
20 changes: 20 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
132 changes: 132 additions & 0 deletions README.md
@@ -0,0 +1,132 @@
This repository contains the source documents for the [cltl.github.io](https://cltl.github.io) pages.

**NEW** The site is now published through GitHub Actions when you push changes to the `main` branch. Source docs are now kept on the `main` branch.

## Goal of these GitHub pages

The goal of this repository is to document and organize **all** our repositories, regardless of their maturity or level of activity: all repositories are welcome, whether they accompany a publication or published software, are work in progress, or are even premature. These GitHub pages are thus intended not only for finished, presentable code, but also for repositories that might otherwise be forgotten and become deadwood.

If you create a CLTL repository, mention it in these pages!

#### README and ME

This repository contains two README files:

* This README contains information on the GitHub setup of this website, and instructions to modify the content and commit changes.
* The [README located in `src`](https://github.com/cltl/cltl.github.io/blob/main/src/README.md) was generated by Docusaurus. It provides additional information on content modification and pointers to the Docusaurus documentation.

## Contributing to this website
Instructions for adding and modifying these pages are provided below. Before pushing changes, you may want to
1. work on a separate branch to avoid merge conflicts (see below)
2. [deploy the website locally](#deploying-the-site-locally)

### Collaborative setup
Clone this repository

```sh
$ git clone https://github.com/cltl/cltl.github.io
$ cd cltl.github.io
```

Or update its contents to the latest version

```sh
cltl.github.io$ git checkout main
cltl.github.io$ git pull
```

You should make changes in a local branch to prevent clashes when pushing your commits. Create a local branch, e.g. `mybranch`:

```sh
git checkout -b mybranch # or git switch -c mybranch
```

You can now access and modify the content in `docs`. You will find below instructions on:

- [Modifying an existing page](#modifying-an-existing-page)
- [Adding a page](#adding-a-page-to-an-existing-category)
- [Adding a category](#adding-a-category-section)

When you are ready to commit, you can check the website locally following these [instructions](#deploying-the-site-locally).

Now you can commit your changes on `mybranch`, go back to the `main` branch, update its contents and merge your commit:

```sh
git add --all
git commit -m "message for your commit"
git checkout main
git pull # in case somebody else pushed new commits in the meantime...
git merge mybranch
```

If somebody else made a commit while you were busy, you may have to resolve conflicts when merging `mybranch`, and merge your changes manually.
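For those unfamiliar with conflict resolution, here is a self-contained sketch you can run in a throwaway directory; the file name `page.md`, the branch name `mybranch`, and all commit messages are purely illustrative:

```shell
# Throwaway demo of a merge conflict and its resolution.
tmp=$(mktemp -d) && cd "$tmp"
git init -q && git checkout -qb main
git config user.email "you@example.com"
git config user.name "You"
echo "original line" > page.md
git add page.md
git commit -qm "initial"
# One change on a feature branch...
git checkout -qb mybranch
echo "my change" > page.md
git commit -qam "edit on mybranch"
# ...and a conflicting change on main:
git checkout -q main
echo "their change" > page.md
git commit -qam "edit on main"
# Merging now stops with a conflict in page.md:
git merge mybranch || true
# Edit page.md: keep the lines you want, delete the
# <<<<<<< / ======= / >>>>>>> conflict markers, then conclude the merge:
echo "merged line" > page.md
git add page.md
git commit -qm "merge mybranch, conflict resolved"
```

After the final commit, `git log` shows both branch histories joined by the merge commit.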

Push your changes to the `main` branch when you are done.

```sh
git push
```

The website should be published automatically after a few minutes.

You can now delete `mybranch` by running:

```sh
git branch -d mybranch
```

### Modifying an existing page
#### Content pages

You will find content pages under `docs/<category>/`. These are Markdown files that you can edit.

### Adding a page to an existing category

You will find that category's folder under `docs/<category>`. Add a markdown file `<page-name>.md` for your page there. The file name itself is not important: it is the page id that is used to link documents through the site. The file should start with the following header:

```md
---
id: <page-id>
title: <page-title>
---
```

You should link this page by adding its category and id to the file `sidebars.json`. Subcategories for the sidebar can be specified directly in this file.
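As a sketch of such an entry (the `research` sidebar name, page ids, and subcategory label here are hypothetical — match the structure already present in `sidebars.json`):

```json
{
  "research": [
    "research/distrib-sem",
    {
      "type": "category",
      "label": "New subcategory",
      "items": ["research/new-page"]
    }
  ]
}
```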

### Adding a category section

We now have a number of categories: `research`, `projects`, `resources`, `teaching` and `CLTL`.

To create a new category:

* Create a new folder under `docs`, e.g. `new-category`. This is not required by the website, but keeps the project structure clean.
* Add a new page for this category, e.g. `new-page.md` with id `new-page`.
* Edit `docusaurus.config.js` to add a tab for that category in the navigation bar and link it to an index or a first document: look for `themeConfig`, and add a new item to the navigation bar.
* Edit `sidebars.json` to create a page sidebar for this category.
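The navigation-bar step can be sketched as follows; the `to` path and label are hypothetical, and the new item should be added to the `items` array already present under `themeConfig.navbar` in `docusaurus.config.js`:

```js
// docusaurus.config.js (fragment, illustrative names)
module.exports = {
  themeConfig: {
    navbar: {
      items: [
        // ...existing items...
        {
          to: 'docs/new-category/new-page', // first document of the category
          label: 'New Category',
          position: 'left',
        },
      ],
    },
  },
};
```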

### Changing the website
#### Homepage

The index page for the website is located in `src/pages/index.js`. This is a React/JavaScript file that you can also edit.

#### Footer
The footer of the website can be modified through `website/core/Footer.js`.
You can [initialize a (separate) Docusaurus project](https://docusaurus.io/docs/en/installation) to get a richer example for the index or the footer.

#### Styling
See `src/css/custom.css` for colors, and `src/pages/index.module.css` for the homepage.

## Deploying the site locally

#### Requirements

You will need [Node.js](https://nodejs.org/en/download/package-manager) version 18.0 or later.


#### Instructions

You can view the website locally (on <http://localhost:3000>) by running

```sh
npm start
```
3 changes: 3 additions & 0 deletions babel.config.js
@@ -0,0 +1,3 @@
module.exports = {
presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};
21 changes: 21 additions & 0 deletions docs/internal/internal.md
@@ -0,0 +1,21 @@
---
id: internal
title: Internal repositories
sidebar_label: Internal
---

## Writing

* [CLTL LaTeX bib](https://github.com/cltl/bibliography)
* [PhD Thesis tips](https://github.com/cltl/ThesisTips)

## Server and scripts

* [Kyoto scripts](https://github.com/cltl/cltl_kyoto_scripts)
* [Magic place](https://github.com/cltl/cltl-magicplace)
* [Kyoto quota](https://github.com/cltl/kyotoquota)

## Organizational websites

* [GitHub Pages](https://github.com/cltl/cltl.github.io)
* [web page with Jekyll](https://github.com/cltl/cltl.github.com) *(legacy)*
19 changes: 19 additions & 0 deletions docs/projects/clariah-plus.md
@@ -0,0 +1,19 @@
---
id: clariah-plus
title: CLARIAH+ VOC Use Case
---

The repositories listed here are all work in progress.

## Preprocessing for NER

The following repositories contain code to preprocess data for the Clariah WP6 VOC use case, for NER annotations in particular.

* [clariah-voc-scripts](https://github.com/cltl/clariah-voc-scripts): a first set of scripts for text and index extraction from VOC pre-TEI missives
* [TeiReader](https://github.com/cltl/teiReader): extraction of TEI texts to raw text for the *Chronicles* and *CLARIAH-PLUS VOC use case* projects
* [voc-missives](https://cltl.github.io/voc-missives): extraction of TEI-formatted missives to NAF and Conll, and integration of manually annotated entities
* [voc-missives-data](https://github.com/cltl/voc-missives-data): companion data repository to [voc-missives](https://cltl.github.io/voc-missives), containing the Generale Missiven corpus, and manual named-entity annotations

## Entity identification

* [entity-identification-from-scratch](https://github.com/cltl/entity-identification-from-scratch): entity identification by clustering
14 changes: 14 additions & 0 deletions docs/projects/dutch-framenet.md
@@ -0,0 +1,14 @@
---
id: dutch-framenet
title: Dutch FrameNet
---

The repositories listed here are all work in progress.

## Building and representing a FrameNet
* [Dutch FrameNet](https://cltl.github.io/DutchFrameNet/): we host Dutch FrameNet by means of this repository
* [FrameNetNLTK](https://github.com/cltl/FrameNetNLTK): build, load, and edit any FrameNet in the NLTK format, i.e., you can build a FrameNet in any language you prefer.
* [Frame Annotation tool](https://github.com/cltl/frame-annotation-tool): annotate documents that refer to the same incident, both conceptually and referentially.
* [Lexical data frame annotation tool](https://github.com/cltl/LexicalDataDTDAnnotationTool): facilitates creating the lexical data needed to annotate in the Frame Annotation tool
* [multilingual-wiki-event-pipeline](https://github.com/cltl/multilingual-wiki-event-pipeline): multilingual wiki event pipeline
* [MWEP on one incident](https://github.com/cltl/MWEP_on_one_incident): run the multilingual-wiki-event-pipeline on one incident
85 changes: 85 additions & 0 deletions docs/projects/spotter.md
@@ -0,0 +1,85 @@
---
id: spotter
title: The Spotter Framework
---

This page documents the code, data and development of the SPOTTER Framework,
a framework designed to investigate referring expressions and convention formation within an increasing common ground in
Human-Robot Interaction.

The framework is described in [this paper](https://aclanthology.org/2024.lrec-main.1322).

### Paper abstract

Linguistic conventions that arise in dialogue reflect common ground and can increase communicative efficiency. Social robots that can understand these conventions and the process by which they arise have the potential to become efficient communication partners.
Nevertheless, it is unclear how robots can engage in convention formation when presented with both familiar and new information.
We introduce an adaptable game framework, **SPOTTER**, to study the dynamics of convention formation for visually grounded referring expressions in both human-human and human-robot interaction. Specifically, we seek to elicit convention forming for members of an *inner circle* of well-known individuals in the common ground, as opposed to individuals from an *outer circle*, who are unfamiliar. We release an initial corpus of 5000 utterances from two exploratory pilot experiments in spoken Dutch.
Different from previous work focussing on human-human interaction, we find that referring expressions for both familiar and unfamiliar individuals maintain their length throughout human-robot interaction. Stable conventions are formed, although these conventions can be impacted by distracting outer circle individuals. With our distinction between familiar and unfamiliar, we create a contrastive operationalization of common ground, which aids research into convention formation.

## The SPOTTER game
The SPOTTER game is a two-person reference game. It consists of six rounds
in which the goal is to locate the position of characters in a visual scene.
The visual scene for each player contains the same characters, but they are in
a different order. Players must communicate to find the position of each character
in the other player's picture. The game is designed to support Human-Robot Interaction. However, it can also be used to
investigate Human-Human Interaction.

### Links
- [Source code for the game and the dataset](https://github.com/leolani/spotter)
- [Code for running the Wizard of Oz experiment](https://github.com/leolani/spot-woz/tree/spot)

## Data and experiments
The SPOTTER framework has been used in two Human-Robot Interaction pilot studies. The language used in the pilot studies was Dutch. Here,
the robot behaviour was 'faked' using the Wizard-of-Oz approach. The two pilot studies used
two different versions of the game:
- **Version 1**: The original version. This version uses cartoon-like figures, and players only had to select whether a character was in the *same* or a *different* position.
- **Version 2**: The latest, updated version. The cartoon-like faces have been replaced by more realistic faces, and players now have to select the exact position of a character in the other player's picture.

If you wish to use the framework for your experiments, we recommend using the latest version.

### Participants
The dataset contains interactions from 21 participants:
- 7 participants for Version 1
- 14 participants for Version 2

### Annotation

The dataset contains one **Utterance** per line. Utterances have been annotated with the following features:
- **Start**: The start time of an utterance in seconds
- **End**: The end time of an utterance in seconds
- **Text**: The text in the utterance
- **Speaker**: The source of the utterance, either *Human* or *Robot*
- **Mention**: The part of the utterance which contains the description of a character
- **Character**: The gold annotation for the referent of the mention
- **Round**: The round of the game. Any utterances that are not part of a round (i.e. before or in between rounds) are annotated as '0'
- **Transaction Unit**: A unit of the interaction which contains the utterances and turns needed to resolve the mention for one referent and identify them in the picture
- **Transaction Unit Relation**: The relation between subsequent utterances within the same Transaction Unit. For a full list of relations, we refer to Appendix C of our paper.
- **Dialog Act (DA)**: An automatically extracted Dialog Act for the utterance
- **Dialog Act Confidence (DA_conf)**: The confidence score for the automatically extracted Dialog Act

## Citation

If you use our framework or data, please cite our paper:

```bibtex
@inproceedings{kruijt-etal-2024-spotter-framework,
    title = "{SPOTTER}: A Framework for Investigating Convention Formation in a Visually Grounded Human-Robot Reference Task",
    author = "Kruijt, Jaap and
      van Minkelen, Peggy and
      Donatelli, Lucia and
      Vossen, Piek T.J.M. and
      Konijn, Elly and
      Baier, Thomas",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1322",
    pages = "15202--15215"
}
```
12 changes: 12 additions & 0 deletions docs/research/distributional-semantics.md
@@ -0,0 +1,12 @@
---
id: distrib-sem
title: Distributional Semantics
---

Most of our work on distributional semantic models is collected at the repository [Semantic Space Navigation](https://cltl.github.io/semantic_space_navigation/).

Other repositories:

* [meaning_space](https://github.com/cltl/meaning_space)
* [semantic_property_dataset](https://github.com/cltl/semantic_property_dataset)
* [variword](https://github.com/cltl/variword)
34 changes: 34 additions & 0 deletions docs/research/entity-detection-linking.md
@@ -0,0 +1,34 @@
---
id: entity-detect
title: Entity Detection and Linking
---

## Long-tail in Entity linking

* [EL-long-tail-phenomena](https://cltl.github.io/EL-long-tail-phenomena/): Systematic study of long tail phenomena in the task of entity linking.
* [LongTailAnnotation](https://github.com/cltl/LongTailAnnotation): Annotation tool for data2text approaches
* [LongTailIdentity](https://github.com/cltl/LongTailIdentity): Generating profiles of long tail identities from text

### SemEval2018
* [LongTailQATask](https://cltl.github.io/LongTailQATask/): code for SemEval-2018 task #5, "Counting events and participants in the long tail"
* [SemEval2018-5_Postprocessing](https://github.com/cltl/SemEval2018-5_Postprocessing): Postprocessing steps for the SemEval-2018 task 5: Counting events and participants in the long tail

## Human-Like EL

* A human-inspired Entity linking system in progress can be found in the repository [HumanLikeEL](https://cltl.github.io/HumanLikeEL/)
* [HELAnalysis](https://github.com/cltl/HELAnalysis): Analysis of the Human-like Entity Linking system

## More on Entity Linking

* [ELBaselines](https://github.com/cltl/ELBaselines): This repository aims at creating baseline results for Entity Linking, by running a text against the state-of-the-art systems for entity linking, using their most standard configuration.
* [entity-link-postprocess](https://github.com/cltl/entity-link-postprocess)

See also our [entity-linking systems](../resources/entity-linking.md).

## Entity Detection and Typing

* [multilingual-finegrained-entity-typing](https://github.com/cltl/multilingual-finegrained-entity-typing)
* [entity-identification-from-scratch](https://github.com/cltl/entity-identification-from-scratch): entity identification by clustering



22 changes: 22 additions & 0 deletions docs/research/event-detection-and-coreference.md
@@ -0,0 +1,22 @@
---
id: event-detect
title: Event Detection and Coreference
---

## Event detection

* The [TimeMLEventTrigger repository](https://cltl.github.io/TimeMLEventTrigger) contains conversion scripts and models to automatically extract event triggers following the TimeML Annotation Guidelines.
* [CatFrameNet](https://github.com/cltl/CatFrameNet)
* [ceopathfinder](https://github.com/cltl/ceopathfinder): Finds a path of circumstantial relations between events on the basis of the CircumstantialEventOntology
* [FrameNet-annotation-tool](https://github.com/cltl/FrameNet-annotation-tool): Python-based command-line tool for FrameNet annotation
* [nwr-semeval2018-5](https://github.com/cltl/nwr-semeval2018-5): NewsReader participation to task 5 of SemEval2018
* [OntoTagger](https://github.com/cltl/OntoTagger): Ontotagger inserts (semantic) labels into KAF representation on the basis of lemma or wordnet synset representations of text
* [TimeMLEventTrigger](https://github.com/cltl/TimeMLEventTrigger)
* [TripleEvaluation](https://github.com/cltl/TripleEvaluation): This program evaluates text mining output from text on the basis of a triple representation.

## Event coreference

* [coreference-evaluation](https://github.com/cltl/coreference-evaluation): Evaluation package for event coreference using the reference-scorer
* [EventCoreference](https://github.com/cltl/EventCoreference): Compares descriptions of events within and across documents to decide if they refer to the same events. Also converts NAF to GRASP-RDF and SEM-RDF.
* [reference-coreference-scorers](https://github.com/cltl/reference-coreference-scorers): This is the reference implementation of commonly used coreference metrics.
* [sem10scorer-stability](https://github.com/cltl/sem10scorer-stability): Testing the stability of sem10scorer
11 changes: 11 additions & 0 deletions docs/research/image-sound.md
@@ -0,0 +1,11 @@
---
id: image-sound
title: Language of Image and Sound
---

* [DutchDescriptions](https://github.com/cltl/DutchDescriptions): Dutch descriptions for the Flickr30K validation and test data, plus a cross-lingual comparison tool.
* [GroundedTranslation](https://github.com/cltl/GroundedTranslation): Multilingual image description
* [Image-Specificity](https://github.com/cltl/Image-Specificity): Reimplementation of Jas &amp; Parikh's (2015) image specificity metric, using word embeddings.
* [SoundBrowser](https://github.com/cltl/SoundBrowser): Interface for the VU Sound Corpus, made in Flask.
* [Spoken-versus-Written](https://github.com/cltl/Spoken-versus-Written): Code and data for our VarDial 2018 paper on spoken versus written image descriptions
* [VU-Sound-Corpus](https://github.com/cltl/VU-Sound-Corpus): Collection of crowd-sourced annotations for the Freesound database.
0 comments on commit 70e182d
