Commit

Mentioned morpheme IDs in the documentation.
timarkh committed Sep 10, 2021
1 parent 601dd32 commit 8bdabaf
Showing 7 changed files with 16 additions and 2 deletions.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/paradigms.doctree
Binary file not shown.
5 changes: 5 additions & 0 deletions docs/_build/html/_sources/paradigms.rst.txt
@@ -188,6 +188,11 @@ Sometimes it is convenient to put certain stem characters into the paradigm. For
gramm: gen,sg
gloss: GEN.SG

+Morpheme IDs
+^^^^^^^^^^^^
+
+You can add an ``id`` field to morphemes and/or lexemes. IDs do not need to be unique and do not need to be assigned to each and every item. An analyzed word form will contain an ``id`` attribute if any of its parts had an ID. The value will contain the IDs of all its parts separated by a comma. Duplicate IDs will be truncated.
+
Incorporated words
^^^^^^^^^^^^^^^^^^

4 changes: 4 additions & 0 deletions docs/_build/html/paradigms.html
@@ -212,6 +212,10 @@ <h3>Stem parts<a class="headerlink" href="#stem-parts" title="Permalink to this
</pre></div>
</div>
</div>
+<div class="section" id="morpheme-ids">
+<h3>Morpheme IDs<a class="headerlink" href="#morpheme-ids" title="Permalink to this headline"></a></h3>
+<p>You can add an <code class="docutils literal notranslate"><span class="pre">id</span></code> field to morphemes and/or lexemes. IDs do not need to be unique and do not need to be assigned to each and every item. An analyzed word form will contain an <code class="docutils literal notranslate"><span class="pre">id</span></code> attribute if any of its parts had an ID. The value will contain the IDs of all its parts separated by a comma. Duplicate IDs will be truncated.</p>
+</div>
<div class="section" id="incorporated-words">
<h3>Incorporated words<a class="headerlink" href="#incorporated-words" title="Permalink to this headline"></a></h3>
<p>There are no tools for handling productive incorporation yet in <code class="docutils literal notranslate"><span class="pre">uniparser-morph</span></code>. Nevertheless, some incorporation can be accounted for in the paradigms. That can work if you have a limited number of words, e.g. pronominal clitics, that can be incorporated or orthographically fused with other words (hosts). Such words can be described as morphemes with a special <code class="docutils literal notranslate"><span class="pre">LEX</span></code> tag. Units with a <code class="docutils literal notranslate"><span class="pre">LEX</span></code> tag are processed as ordinary morphemes during parsing, but a separate “subword” analysis is added for each of them as one of the postprocessing steps. A <code class="docutils literal notranslate"><span class="pre">LEX</span></code> tag should look like <code class="docutils literal notranslate"><span class="pre">LEX:xxx:yyy</span></code>, where <code class="docutils literal notranslate"><span class="pre">xxx</span></code> is the lemma and <code class="docutils literal notranslate"><span class="pre">yyy</span></code> contains grammatical tags separated by a semicolon. (A semicolon is used so that a morpheme can have both <code class="docutils literal notranslate"><span class="pre">LEX</span></code> tags and regular tags, which are separated by a comma.)</p>
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions docs/paradigms.rst
@@ -188,6 +188,11 @@ Sometimes it is convenient to put certain stem characters into the paradigm. For
gramm: gen,sg
gloss: GEN.SG

+Morpheme IDs
+^^^^^^^^^^^^
+
+You can add an ``id`` field to morphemes and/or lexemes. IDs do not need to be unique and do not need to be assigned to each and every item. An analyzed word form will contain an ``id`` attribute if any of its parts had an ID. The value will contain the IDs of all its parts separated by a comma. Duplicate IDs will be truncated.
+
Incorporated words
^^^^^^^^^^^^^^^^^^

2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = uniparser-morph
-version = 2.3.0
+version = 2.4.0
author = Timofey Arkhangelskiy
author_email = timarkh@gmail.com
description = Rule-based, linguist-friendly (and rather slow) morphological analysis
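
The "Morpheme IDs" paragraph added to docs/paradigms.rst in this commit says that an analyzed word form gets an ``id`` attribute listing the IDs of its parts, separated by a comma, with duplicates dropped. A minimal Python sketch of that rule follows. It is not the uniparser-morph implementation: the function name ``merge_ids``, the sample ID strings, and the choice to keep first-seen order are all assumptions made for illustration, as is reading "truncated" to mean that repeated IDs appear only once.

```python
def merge_ids(part_ids):
    """Combine the IDs of a word form's parts into one comma-separated string.

    Hypothetical helper, not part of uniparser-morph. Parts without an ID
    are passed as None and skipped; repeated IDs are kept only once.
    """
    merged = []
    for part_id in part_ids:
        # Skip parts that carry no ID and IDs that were already collected.
        if part_id and part_id not in merged:
            merged.append(part_id)
    # Assumption: a word form with no IDs at all gets no id attribute.
    return ",".join(merged) if merged else None


# A stem with an ID, a suffix without one, a suffix with an ID,
# and a second part that repeats the stem's ID.
print(merge_ids(["stem_12", None, "gen_sg", "stem_12"]))
```

Since IDs need not be unique or present on every item, the sketch treats both a missing ID and a repeated ID as normal input, not as errors.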