- Status: Completed
- Type: Specific
- Work Package: WP3
- Research Coordinators: Merijn Beeksma
- Coordinators for CLARIAH: Maarten van Gompel
- Participating Institutes: Radboud University, Nijmegen
- End-users: The CLIN28 shared-task organisers
- Developers: The CLIN28 shared-task organisers
- Interest Groups: Text
- Task IDs: T062 (FLAT), T108 (FoLiA)
For the CLIN28 Shared Task (2018), a gold-standard corpus of spelling errors and their corrections had to be established.
This corpus was to be used to assess the efficacy of the spelling correction systems submitted by shared task participants. Annotators needed an annotation environment in which to create the gold standard.
Data was extracted from Wikipedia and stored in the FoLiA format.
We needed an annotation environment that supports spelling correction in many forms, including complexities such as run-on errors, split errors, missing words and redundant words. FLAT was chosen as the solution because both it and the underlying FoLiA format have extensive spelling correction features.
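The correction types listed above can be pictured as replacements of a span of written tokens by a span of corrected tokens. The following is a minimal sketch of that idea only; the `Correction` class and the functions `classify` and `apply_corrections` are hypothetical names for illustration and are not part of FoLiA or FLAT. It assumes that a run-on error means words wrongly written together and a split error means one word wrongly written apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """Hypothetical record of one correction (not the FoLiA data model)."""
    start: int        # index of the first affected token in the original text
    original: tuple   # tokens as written (empty for a missing word)
    corrected: tuple  # tokens after correction (empty for a redundant word)


def classify(c):
    """Name the error type from the shape of the correction (assumed definitions)."""
    if not c.original and c.corrected:
        return "missing word"
    if c.original and not c.corrected:
        return "redundant word"
    if len(c.original) == 1 and len(c.corrected) > 1:
        return "run-on error"   # one written token should have been several words
    if len(c.original) > 1 and len(c.corrected) == 1:
        return "split error"    # several written tokens should have been one word
    return "replacement"


def apply_corrections(tokens, corrections):
    """Return the corrected token list; assumes corrections do not overlap."""
    out = list(tokens)
    # Apply from right to left so that earlier start indices stay valid.
    for c in sorted(corrections, key=lambda c: c.start, reverse=True):
        out[c.start:c.start + len(c.original)] = c.corrected
    return out


tokens = ["Dit", "is", "een", "een", "spel", "fout", "inde", "tekst"]
corrections = [
    Correction(3, ("een",), ()),                   # redundant word
    Correction(4, ("spel", "fout"), ("spelfout",)),  # split error
    Correction(6, ("inde",), ("in", "de")),        # run-on error
]
corrected = apply_corrections(tokens, corrections)
```

In this sketch a single record shape covers all four cases, which is one reason an annotation format needs corrections that can span zero, one or several tokens on either side.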
References to related resources and publications and especially links to related use-cases:
- CLIN28 Shared Task: Spelling Correction
- Beeksma et al. (2018) - Detecting and correcting spelling errors in high-quality Dutch Wikipedia text. CLIN Journal