Weights are ignored in monolingual dictionary entries #44

marcriera · 2019-03-14T12:12:01Z

Given the following paradigms and entries:

<pardef n="liv/e__vblex">
  <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="inf"/></r></p></e>
  <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="imp"/></r></p></e>
  <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="pp"/></r></p></e>
  <e w="1"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="pprs"/></r></p></e>
  <e w="3"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="ger"/></r></p></e>
  <e w="2"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="subs"/></r></p></e>
  <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="pres"/></r></p></e>
  <e>       <p><l>es</l>        <r>e<s n="vblex"/><s n="pres"/><s n="p3"/><s n="sg"/></r></p></e>
  <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="past"/></r></p></e>
</pardef>
<pardef n="house__n">
  <e>       <p><l></l>          <r><s n="n"/><s n="sg"/></r></p></e>
  <e r="RL"><p><l>'s</l>        <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r></p></e>
  <e>       <p><l>s</l>         <r><s n="n"/><s n="pl"/></r></p></e>
  <e r="RL"><p><l>s'</l>        <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r></p></e>
</pardef>

<e lm="house" w="1">     <i>house</i><par n="house__n"/></e>
<e lm="house" w="2">     <i>hous</i><par n="liv/e__vblex"/></e>

lt-proc seems to ignore the weights for the entries:

$ echo "house" | lt-proc -wW eng-cat.automorf.bin
^house/house<n><sg><W:0.000000>/house<vblex><inf><W:0.000000>/house<vblex><pres><W:0.000000>/house<vblex><imp><W:0.000000>$

The expected result would be:

$ echo "house" | lt-proc -wW eng-cat.automorf.bin
^house/house<n><sg><W:1.000000>/house<vblex><inf><W:2.000000>/house<vblex><pres><W:2.000000>/house<vblex><imp><W:2.000000>$

However, the weights work fine when they are used inside a paradigm:

$ echo "housing" | lt-proc -wW eng-cat.automorf.bin
^housing/housing<n><sg><W:0.000000>/house<vblex><pprs><W:1.000000>/house<vblex><subs><W:2.000000>/house<vblex><ger><W:3.000000>$
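
For anyone who wants to check these weights in a script rather than by eye, here is a minimal sketch that splits one `lt-proc -W` output unit into (analysis, weight) pairs. The function name `parse_weighted_analyses` is hypothetical, and the parser assumes the format seen in the outputs above: a single `^...$` unit, analyses separated by `/`, each ending in a `<W:...>` tag, with no escaped slashes.

```python
import re

def parse_weighted_analyses(line):
    """Split one ^surface/analysis<W:x>/...$ unit into (analysis, weight) pairs.

    Assumption: no escaped '/' inside the unit and a non-negative decimal weight.
    """
    body = line.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("expected a single ^...$ lexical unit")
    surface, *analyses = body[1:-1].split("/")
    pairs = []
    for analysis in analyses:
        match = re.fullmatch(r"(.*)<W:([0-9.]+)>", analysis)
        if match is None:
            raise ValueError(f"no weight tag in {analysis!r}")
        pairs.append((match.group(1), float(match.group(2))))
    return surface, pairs

# The observed output for "house" from this report.
observed = ("^house/house<n><sg><W:0.000000>/house<vblex><inf><W:0.000000>"
            "/house<vblex><pres><W:0.000000>/house<vblex><imp><W:0.000000>$")
surface, pairs = parse_weighted_analyses(observed)
all_zero = all(weight == 0.0 for _, weight in pairs)
```

With the observed output, `all_zero` is true, which is the symptom described here: every analysis comes back with weight 0 even though the entries carry `w="1"` and `w="2"`.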

unhammer · 2019-03-14T12:37:43Z

@Techievena

Techievena · 2019-03-15T05:27:52Z

@unhammer I will definitely look into it.

AMR-KELEG · 2019-03-25T01:47:19Z

I might be facing the same problem.
I am using an input written in .att format to generate a weighted transducer.

0       1       c       c       0.000000
1       2       a       a       0.000000
2       3       t       t       0.000000
3       4       @0@     <n>     0.000000
3       5       s       <n>     0.000000
4       2.000000
5       6       @0@     <pl>    0.000000
6       1.000000

I generate the transducer using lt-comp lr in.att apert_model.
The output of lt-print apert_model is:

0       1       c       c       0.000000
1       2       a       a       0.000000
2       3       t       t       0.000000
3       4       ε       <n>     0.000000
3       5       s       <n>     0.000000
4       7       ε       ε       2.000000
5       6       ε       <pl>    0.000000
6       7       ε       ε       1.000000
7       0.000000

which seems to be correct.

However, the output of echo 'cat' | lt-proc apert_model -W seems to ignore the weights:
^cat/cat<n><W:0.000000>$
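
To make the expectation concrete, here is a minimal sketch of what I would expect the weight to be, assuming weights add up along a path and the final-state weight is added at the end (tropical semiring). This is not lttoolbox code. The helpers `load_att` and `analyses` are hypothetical names, the reader splits on whitespace (so it would not handle a literal space symbol), and the search assumes there are no epsilon loops.

```python
from collections import defaultdict

# The .att input from this comment.
ATT = """\
0 1 c c 0.000000
1 2 a a 0.000000
2 3 t t 0.000000
3 4 @0@ <n> 0.000000
3 5 s <n> 0.000000
4 2.000000
5 6 @0@ <pl> 0.000000
6 1.000000
"""

def load_att(text):
    """Read arcs and final-state weights from AT&T-style text."""
    arcs = defaultdict(list)  # state -> [(dst, input, output, weight)]
    finals = {}               # state -> final weight
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) <= 2:  # final state, optionally with a weight
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            src, dst, isym, osym = fields[:4]
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            arcs[int(src)].append((int(dst), isym, osym, weight))
    return arcs, finals

def analyses(arcs, finals, word, eps="@0@"):
    """Return (output, total weight) for every accepting path over word."""
    results = []
    stack = [(0, 0, "", 0.0)]  # (state, position in word, output so far, weight)
    while stack:
        state, pos, out, weight = stack.pop()
        if pos == len(word) and state in finals:
            # The final-state weight is part of the path weight.
            results.append((out, weight + finals[state]))
        for dst, isym, osym, arc_weight in arcs.get(state, []):
            emitted = "" if osym == eps else osym
            if isym == eps:
                stack.append((dst, pos, out + emitted, weight + arc_weight))
            elif pos < len(word) and isym == word[pos]:
                stack.append((dst, pos + 1, out + emitted, weight + arc_weight))
    return sorted(results, key=lambda pair: pair[1])

arcs, finals = load_att(ATT)
cat = analyses(arcs, finals, "cat")
cats = analyses(arcs, finals, "cats")
```

Under these assumptions `cat` gets weight 2.0 and `cats` gets weight 1.0, because the only non-zero weights in this transducer sit on the final states.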

AMR-KELEG · 2019-03-25T02:14:12Z

I think the bug might be related to this line and its following lines:

lttoolbox/lttoolbox/state.cc

Line 607 in f73c541

response = NFinals(response, max_analyses, max_weight_classes);

TinoDidriksen · 2019-04-03T12:17:14Z

I guess editing the comment on #49 to remove "Fix #44" was not enough to make GitHub understand that it was not a closing merge.

AMR-KELEG · 2019-04-03T13:15:05Z

@marcriera I think the bug is with the lt-comp command.
Is lt-comp used in the apertium-eng to compile the dictionary?

I have prepared a sample dictionary:

<dictionary>
  <alphabet>ÀÁÂÄÆÇÈÉÊËÌÍÎÏÑÒÓÔÖÙÚÛÜàáâäçèéêëìíîïñòóôöùúûüABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"   c="Noun"/>  
    <sdef n="vblex"   c="Verb"/> 
    <sdef n="p1"  c="First person"/> 
    <sdef n="p3"  c="Third person"/> 
    <sdef n="sg"  c="Singular"/> 
    <sdef n="pl"  c="Plural"/> 
    <sdef n="pres"  c="Present (tense)"/> 
    <sdef n="past"  c="Past"/> 
    <sdef n="imp"   c="Imperative"/> 
    <sdef n="inf"   c="Infinitive"/> 
    <sdef n="pp"  c="Past participle"/> 
    <sdef n="subs"  c="Verbal noun"/> 
    <sdef n="pprs"  c="Present participle"/> 
    <sdef n="ger"   c="Gerund"/> 
  </sdefs>
  <pardefs>
    <pardef n="liv/e__vblex">
      <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="inf"/></r></p></e>
      <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="imp"/></r></p></e>
      <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="pp"/></r></p></e>
      <e w="1"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="pprs"/></r></p></e>
      <e w="3"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="ger"/></r></p></e>
      <e w="2"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="subs"/></r></p></e>
      <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="pres"/></r></p></e>
      <e>       <p><l>es</l>        <r>e<s n="vblex"/><s n="pres"/><s n="p3"/><s n="sg"/></r></p></e>
      <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="past"/></r></p></e>
    </pardef>
    <pardef n="house__n">
      <e>       <p><l></l>          <r><s n="n"/><s n="sg"/></r></p></e>
      <e r="RL"><p><l>'s</l>        <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r></p></e>
      <e>       <p><l>s</l>         <r><s n="n"/><s n="pl"/></r></p></e>
      <e r="RL"><p><l>s'</l>        <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r></p></e>
    </pardef>
  </pardefs>
<section id="main" type="standard">
  <e lm="house" w="1">     <i>house</i><par n="house__n"/></e>
  <e lm="house" w="2">     <i>hous</i><par n="liv/e__vblex"/></e>
</section>
</dictionary>

And the output transducer isn't correct:

0	1	h	h	0.000000	
1	2	o	o	0.000000	
2	3	u	u	0.000000	
3	4	s	s	0.000000	
4	5	e	e	0.000000	# THIS EDGE SHOULD HAVE WEIGHT=2
4	6	e	e	1.000000	# THIS EDGE HAS THE CORRECT WEIGHT
4	7	i	e	0.000000	
5	8	ε	<vblex>	0.000000	
5	9	d	<vblex>	0.000000	
5	10	s	<vblex>	0.000000	
6	11	ε	<n>	0.000000	
6	12	s	<n>	0.000000	
7	13	n	<vblex>	0.000000	
8	14	ε	<inf>	0.000000	
8	14	ε	<imp>	0.000000	
8	14	ε	<pres>	0.000000	
9	14	ε	<pp>	0.000000	
9	14	ε	<past>	0.000000	
10	15	ε	<pres>	0.000000	
11	14	ε	<sg>	0.000000	
12	14	ε	<pl>	0.000000	
13	14	g	<pprs>	1.000000	
13	14	g	<ger>	3.000000	
13	14	g	<subs>	2.000000	
15	11	ε	<p3>	0.000000	
14	0.000000

When I use the command echo "house" | lt-proc house.bin -W, I get correct weights only for the noun analysis:

^house/house<vblex><inf><W:0.000000>/house<vblex><imp><W:0.000000>/house<vblex><pres><W:0.000000>/house<n><sg><W:1.000000>$

flammie · 2019-04-03T14:20:54Z

The correct weighting here is not trivial (so there seems to be something wrong in the compilation part too). Keep in mind that the prefix "hous" is shared by both the verb and the noun, and that the verb needs its weight of 2 for "housing" as well, which does not go through the "4 5 e e" arc.

Here's the hfst + lexc equivalent for reference:

$ cat house.lexc
Multichar_Symbols
%<n%>
%<vblex%>
%<p1%>
%<p3%>
%<sg%>
%<pl%>
%<pres%>
%<past%>
%<imp%>
%<inf%>
%<pp%>
%<subs%>
%<pprs%>
%<ger%>
%<gen%>

LEXICON Root

house:house house__n "weight: 1" ;
hous:hous liv/e__vblex "weight: 2" ;

LEXICON liv/e__vblex

e%<vblex%>%<inf%>:e # ;
e%<vblex%>%<imp%>:e # ;
e%<vblex%>%<pp%>:ed # ;
e%<vblex%>%<pprs%>:ing # "weight: 1" ;
e%<vblex%>%<ger%>:ing  # "weight: 2" ;
e%<vblex%>%<subs%>:ing  # "weight: 3" ;
e%<vblex%>%<pres%>:e # ;
e%<vblex%>%<pres%>%<p3%>%<sg%>:es # ;
e%<vblex%>%<past%>:ed # ;

LEXICON house__n

%<n%>%<sg%>:0  # ;
%<n%>%<sg%>+'s%<gen%>:'s  # ;
%<n%>%<pl%>:s  # ;
%<n%>%<pl%>+'s%<gen%>:s'  # ;

$ hfst-lexc house.lexc | hfst-fst2txt
hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...2 liv/e__vblex...9 house__n...
0	1	h	h	1.000000
1	2	o	o	0.000000
2	3	u	u	0.000000
3	4	s	s	0.000000
4	5	e	i	2.000000
4	6	e	e	0.000000
5	7	<vblex>	n	0.000000
6	8	<n>	@0@	0.000000
6	9	<n>	s	0.000000
6	10	<n>	'	0.000000
6	11	<vblex>	@0@	1.000000
6	12	<vblex>	s	1.000000
6	13	<vblex>	d	1.000000
7	14	<subs>	g	2.000000
7	14	<ger>	g	1.000000
7	14	<pprs>	g	0.000000
8	14	<sg>	@0@	0.000000
9	14	<pl>	@0@	0.000000
9	15	<pl>	'	0.000000
10	15	<sg>	s	0.000000
11	14	<pres>	@0@	0.000000
11	14	<imp>	@0@	0.000000
11	14	<inf>	@0@	0.000000
12	16	<pres>	@0@	0.000000
13	14	<past>	@0@	0.000000
13	14	<pp>	@0@	0.000000
14	0.000000
15	17	+	@0@	0.000000
16	8	<p3>	@0@	0.000000
17	18	'	@0@	0.000000
18	19	s	@0@	0.000000
19	14	<gen>	@0@	0.000000

$ hfst-lexc house.lexc | hfst-fst2strings -w
hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...2 liv/e__vblex...9 house__n...
house<vblex><subs>:housing	5
house<vblex><ger>:housing	4
house<vblex><pprs>:housing	3
house<n><sg>:house	1
house<n><pl>:houses	1
house<n><pl>+'s<gen>:houses'	1
house<n><sg>+'s<gen>:house's	1
house<vblex><pres>:house	2
house<vblex><imp>:house	2
house<vblex><inf>:house	2
house<vblex><pres><p3><sg>:houses	2
house<vblex><past>:housed	2

nonetheless for the lt-proc part there should be at least a bit more of the weight accumulated :-/
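
The point about the shared prefix can be sketched without any FST at all: if weights simply add along the path, each analysis should get the entry weight plus the weight of the paradigm line it goes through. The snippet below is only an illustration under that assumption. It uses the weights from the XML dictionary in this issue (not the lexc file above, where ger and subs are swapped), treats a missing `w` as 0, and leaves out the `r="RL"` entries to keep it short. `expected_analyses` is a hypothetical helper, not part of lttoolbox or hfst.

```python
# (stem, paradigm name, entry weight), from the <section> of the dictionary.
ENTRIES = [
    ("house", "house__n", 1.0),
    ("hous", "liv/e__vblex", 2.0),
]

# paradigm name -> [(left side, right side, weight)]; missing w is taken as 0.
PARADIGMS = {
    "house__n": [
        ("", "<n><sg>", 0.0),
        ("s", "<n><pl>", 0.0),
    ],
    "liv/e__vblex": [
        ("e", "e<vblex><inf>", 0.0),
        ("e", "e<vblex><imp>", 0.0),
        ("ed", "e<vblex><pp>", 0.0),
        ("ing", "e<vblex><pprs>", 1.0),
        ("ing", "e<vblex><ger>", 3.0),
        ("ing", "e<vblex><subs>", 2.0),
        ("e", "e<vblex><pres>", 0.0),
        ("es", "e<vblex><pres><p3><sg>", 0.0),
        ("ed", "e<vblex><past>", 0.0),
    ],
}

def expected_analyses(surface):
    """Return (analysis, entry weight + paradigm weight), lowest weight first."""
    results = []
    for stem, paradigm, entry_weight in ENTRIES:
        for left, right, par_weight in PARADIGMS[paradigm]:
            if stem + left == surface:
                results.append((stem + right, entry_weight + par_weight))
    return sorted(results, key=lambda pair: pair[1])

house = expected_analyses("house")
housing = expected_analyses("housing")
```

Here the verb readings of "housing" carry the entry weight of 2 even though they never pass through the "e" arc, which is why putting the 2 on that single arc in the compiled transducer cannot be the whole fix.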

unhammer · 2019-04-03T15:17:32Z

> Is lt-comp used in the apertium-eng to compile the dictionary?

it is

mr-martian · 2022-07-06T18:18:47Z

I believe the issue here is that Transducer::closure() disregards weight and as a result determinize() and minimize() lose any weights which are on epsilon transitions.
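
As a rough illustration of what a weight-aware closure would need to return, here is a minimal sketch in Python. It is not the lttoolbox implementation and does not mirror the real signature of Transducer::closure(). The function `epsilon_closure` and the arc table are hypothetical, the arcs are the two ε:ε transitions from the lt-print output earlier in this thread, and the search assumes non-negative weights (it is Dijkstra over epsilon arcs, taking the minimum as in the tropical semiring).

```python
import heapq

# state -> [(dst, input, output, weight)], taken from the lt-print listing
# for the "cat" transducer: 4 -> 7 and 6 -> 7 are weighted epsilon arcs.
ARCS = {
    4: [(7, "ε", "ε", 2.0)],
    6: [(7, "ε", "ε", 1.0)],
}

def epsilon_closure(arcs, start, eps="ε"):
    """Map each state reachable by ε:ε arcs to the cheapest weight to get there.

    Assumption: all weights are non-negative.
    """
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        weight, state = heapq.heappop(heap)
        if weight > best.get(state, float("inf")):
            continue  # stale queue entry
        for dst, isym, osym, arc_weight in arcs.get(state, []):
            if isym == eps and osym == eps:
                candidate = weight + arc_weight
                if candidate < best.get(dst, float("inf")):
                    best[dst] = candidate
                    heapq.heappush(heap, (candidate, dst))
    return best

from_4 = epsilon_closure(ARCS, 4)
from_6 = epsilon_closure(ARCS, 6)
```

A closure that returns only the set of reachable states would report {4, 7} and {6, 7} here and drop the 2.0 and 1.0, which matches the kind of loss described above.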

* don't lose weights when minimizing
* don't lose weights while compiling
* don't duplicate weights while compiling
* ensure that final weight after joinFinals() is always 0
* version bump so lexd can depend on this

xavivars · 2023-11-01T19:24:26Z

@mr-martian, it seems at some point you attempted to fix it, but then had to revert. Any idea on what needs to be done?

mr-martian · 2023-11-01T19:30:04Z

The issue is that FST minimization was written for unweighted automata, and when weight support was added, closure() and/or minimize() were updated incorrectly. My first attempt at fixing it failed, so someone who knows FST algorithms better than me needs to go through that code.

unhammer added the bug Something isn't working label Mar 14, 2019

AMR-KELEG added a commit to AMR-KELEG/lttoolbox that referenced this issue Apr 2, 2019

Fix the analysis weight computation bug

49cb8d8

When computing the overall weight of an analysis, The weight of the final state was ignored. Fix apertium#44

AMR-KELEG mentioned this issue Apr 2, 2019

Fix the analysis weight computation bug #49

Merged

TinoDidriksen closed this as completed in 6703481 Apr 3, 2019

TinoDidriksen reopened this Apr 3, 2019

ftyers added the weighting label Jun 20, 2020

mr-martian closed this as completed in 19050b7 Jul 6, 2022

mr-martian added a commit that referenced this issue Jul 7, 2022

Revert "handle weights more correctly (closes #44)"

785882a

This reverts commit b040536. I will not push untested code. I will not push untested code. I will not push untested code.

mr-martian reopened this Jul 8, 2022

mr-martian mentioned this issue Sep 26, 2022

Compose #161

Merged
