From 03aee0c3dd43d2559f8467d510fce2689290afb5 Mon Sep 17 00:00:00 2001
From: Eddie Antonio Santos
Date: Mon, 22 Jul 2019 12:31:22 -0600
Subject: [PATCH] Add rudimentary documentation on using hfstol in bulk. Closes #3.

---
 README.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/README.md b/README.md
index bacb1b1..0e04899 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,46 @@ Using [Foma](https://fomafst.github.io/):
 PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO
 ê-wâpamât
 
+Bulk lookups
+------------
+
+If you want to generate a large number of word forms all at once, use the
+`hfst-optimized-lookup` command, as this is the fastest way to perform
+lookups.
+Provide the analyses, one per line. For example, say you want to
+conjugate `mîcisow`, and you have a file of analyses called `conjugations.txt`:
+
+```
+mîcisow+V+AI+Ind+Prs+1Sg
+mîcisow+V+AI+Ind+Prs+2Sg
+mîcisow+V+AI+Ind+Prs+3Sg
+PV/e+mîcisow+V+AI+Cnj+Prs+1Sg
+PV/e+mîcisow+V+AI+Cnj+Prs+2Sg
+PV/e+mîcisow+V+AI+Cnj+Prs+3Sg
+```
+
+You can pipe this into `hfst-optimized-lookup`:
+
+```sh
+$ cat conjugations.txt | hfst-optimized-lookup crk-normative-generator.hfstol
+mîcisow+V+AI+Ind+Prs+1Sg nimîcison
+
+mîcisow+V+AI+Ind+Prs+2Sg kimîcison
+
+mîcisow+V+AI+Ind+Prs+3Sg mîcisow
+
+PV/e+mîcisow+V+AI+Cnj+Prs+1Sg ê-mîcisoyân
+
+PV/e+mîcisow+V+AI+Cnj+Prs+2Sg ê-mîcisoyan
+
+PV/e+mîcisow+V+AI+Cnj+Prs+3Sg ê-mîcisot
+```
+
+The two-column output maps each input analysis to its generated word
+form. This is useful because some analyses have multiple possible word
+forms (e.g., `cactus+Pl` in English can be "cactuses" or "cacti").
+
+
 Working on the FSTs
 -------------------
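
The two-column output documented in this patch can be collected into a mapping from each analysis to its word forms. The following is a minimal Python sketch, not part of the patch: the function name `parse_lookup_output` is hypothetical, and it assumes the columns are separated by whitespace and that neither column contains spaces, as in the sample output. Any columns after the second are ignored.

```python
from collections import defaultdict


def parse_lookup_output(text):
    """Map each analysis to the list of word forms printed for it."""
    forms = defaultdict(list)
    for line in text.splitlines():
        columns = line.split()
        # Blank lines separate the results for each input; skip them,
        # along with any line that does not have at least two columns.
        if len(columns) < 2:
            continue
        # Assumption: first column is the analysis, second is the word form.
        analysis, word_form = columns[0], columns[1]
        forms[analysis].append(word_form)
    return dict(forms)


# Sample text copied from the output shown in the patch.
sample = """\
mîcisow+V+AI+Ind+Prs+1Sg nimîcison

mîcisow+V+AI+Ind+Prs+2Sg kimîcison

PV/e+mîcisow+V+AI+Cnj+Prs+3Sg ê-mîcisot
"""

result = parse_lookup_output(sample)
print(result["mîcisow+V+AI+Ind+Prs+1Sg"])  # ['nimîcison']
```

Storing a list per analysis keeps every form when an analysis has more than one, which is the case the `cactus+Pl` example describes.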