This corpus contains line-by-line aligned parallel text in several Uralic languages. The organisation of the materials is still ongoing: in particular, the Erzya data still needs to be converted to CoNLL-U format, and the language tags still need to be imported into the CoNLL-U files of the other languages.
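As a rough illustration of what line-by-line alignment means in practice, the sketch below pairs the n-th line of each language version. It is not part of the corpus tooling: the function name `align_lines`, the language keys, and the placeholder strings are all hypothetical, and it assumes each language version has already been read into a list of lines.

```python
def align_lines(texts):
    """Pair the n-th line of every language version.

    `texts` maps a language name to the list of lines of that version.
    Line-by-line alignment only makes sense if every version has the
    same number of lines, so a mismatch raises ValueError.
    """
    lengths = {lang: len(lines) for lang, lines in texts.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"line counts differ: {lengths}")
    languages = list(texts)
    columns = [texts[lang] for lang in languages]
    # zip(*columns) yields one tuple per line number, one item per language.
    return [dict(zip(languages, row)) for row in zip(*columns)]


# Placeholder strings stand in for real corpus lines (hypothetical data).
example = {
    "udmurt": ["<udmurt line 1>", "<udmurt line 2>"],
    "komi-zyrian": ["<komi-zyrian line 1>", "<komi-zyrian line 2>"],
}
for number, row in enumerate(align_lines(example), start=1):
    print(number, row)
```

Checking the line counts before pairing is a deliberate choice here: a single missing or extra line would otherwise silently shift every later pair.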
The corpus has been used in the following works:
Bradley Jeremy, Kellner Alexandra & Partanen Niko 2018: Variation in word order in Permic and Mari varieties: a corpus-based investigation. Proceedings of the symposium "Language contacts of the nations of Volga-Ural region", Cheboksary, 21–24.5.2018.
Janurik Boglarka, Kantele Simo & Partanen Niko 2017: Three Uralic languages walk into a bar. Presentation at SLE 2017, Zurich.
The links to the original data in the National Library of Finland's Fenno-Ugrica collection are as follows:
- Erzya: http://urn.fi/URN:NBN:fi-fe2014082633380
- Hill Mari: http://urn.fi/URN:NBN:fi-fe2014100345029
- Komi-Permyak: http://urn.fi/URN:NBN:fi-fe2014101045137
- Komi-Zyrian: http://urn.fi/URN:NBN:fi-fe2014102045428
- Udmurt: http://urn.fi/URN:NBN:fi-fe2014092444879
The materials in Fenno-Ugrica are in the public domain.
Some of the Komi annotations are also included in the Universal Dependencies Komi-Zyrian Lattice treebank. Those annotations are under the CC BY-SA license. However, the texts themselves are entirely free of copyright.
The Russian translations are available from the publ.lib.ru archive, where they are released under a non-commercial license.