Polish test datsets for grammatical error correction.
dataset nam | lines | errors fleks | errors orth | errors punct | errors syntax | errors lex |
---|---|---|---|---|---|---|
human_expert_gec_dataset.jsonl | 1804 | 299 | 599 | 300 | 300 | 299 |
human_annotators_common_errors_10k.jsonl | 10000 | 2442 | 4925 | 2423 | 2459 | |
dataset name | lines | errors fleks | errors orth | errors punct | errors syntax | errors lex |
---|---|---|---|---|---|---|
synthetic_syntax_10k.jsonl | 10000 | 10336 | ||||
synthetic_nie_bez_jednostkowe_10k.json | 10000 | 10831 | ||||
synthetic_simple_fleks_10k.json | 10000 | 10000 | ||||
synthetic_ fleks_morf_regex_10K.jsonl | 10000 | 15158 | ||||
synthetic_orth_phonteic_10k.jsonl | 10000 | 10000 |
This work was supported by NCBiR grant number POIR.01.01.01-00-1128/18-00 “Technologia kontekstowego rozumienia języka pisanego na potrzeby poprawy błędów oraz automatycznej oceny zrozumiałości tekstu.”