Weights are ignored in monolingual dictionary entries #44

Open
marcriera opened this issue Mar 14, 2019 · 11 comments
Labels
bug Something isn't working weighting

Comments

@marcriera
Member

Given the following paradigms and entries:

<pardef n="liv/e__vblex">
  <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="inf"/></r></p></e>
  <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="imp"/></r></p></e>
  <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="pp"/></r></p></e>
  <e w="1"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="pprs"/></r></p></e>
  <e w="3"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="ger"/></r></p></e>
  <e w="2"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="subs"/></r></p></e>
  <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="pres"/></r></p></e>
  <e>       <p><l>es</l>        <r>e<s n="vblex"/><s n="pres"/><s n="p3"/><s n="sg"/></r></p></e>
  <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="past"/></r></p></e>
</pardef>
<pardef n="house__n">
  <e>       <p><l></l>          <r><s n="n"/><s n="sg"/></r></p></e>
  <e r="RL"><p><l>'s</l>        <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r></p></e>
  <e>       <p><l>s</l>         <r><s n="n"/><s n="pl"/></r></p></e>
  <e r="RL"><p><l>s'</l>        <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r></p></e>
</pardef>
<e lm="house" w="1">     <i>house</i><par n="house__n"/></e>
<e lm="house" w="2">     <i>hous</i><par n="liv/e__vblex"/></e>

lt-proc seems to ignore the weights for the entries:

$ echo "house" | lt-proc -wW eng-cat.automorf.bin
^house/house<n><sg><W:0.000000>/house<vblex><inf><W:0.000000>/house<vblex><pres><W:0.000000>/house<vblex><imp><W:0.000000>$

The expected result would be:

$ echo "house" | lt-proc -wW eng-cat.automorf.bin
^house/house<n><sg><W:1.000000>/house<vblex><inf><W:2.000000>/house<vblex><pres><W:2.000000>/house<vblex><imp><W:2.000000>$
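The expected figures above follow if an entry's weight is simply added to the weight of each paradigm line it expands to. A minimal sketch of that arithmetic, assuming additive (tropical-style) weights; the `PARADIGMS` and `ENTRIES` structures and the `analyses` helper are hypothetical illustrations, not part of lttoolbox, and only a few paradigm lines are modelled:

```python
# Hypothetical model of part of the dictionary above.
# Each paradigm line is (surface suffix, analysis suffix, weight);
# a missing w attribute is treated as weight 0.
PARADIGMS = {
    "house__n": [
        ("", "<n><sg>", 0.0),
        ("s", "<n><pl>", 0.0),
    ],
    "liv/e__vblex": [
        ("e", "e<vblex><inf>", 0.0),
        ("e", "e<vblex><imp>", 0.0),
        ("e", "e<vblex><pres>", 0.0),
    ],
}

# Each entry is (stem from <i>, paradigm name, weight from w="...").
ENTRIES = [
    ("house", "house__n", 1.0),
    ("hous", "liv/e__vblex", 2.0),
]


def analyses(surface):
    """Return (analysis, weight) pairs, adding entry and paradigm weights."""
    found = []
    for stem, paradigm, entry_weight in ENTRIES:
        for suffix, tags, line_weight in PARADIGMS[paradigm]:
            if stem + suffix == surface:
                found.append((stem + tags, entry_weight + line_weight))
    return found


result = analyses("house")
for analysis, weight in result:
    print(f"{analysis}\t{weight:.6f}")
```

Under that assumption the noun reading of "house" gets weight 1 and the three verb readings get weight 2, which is what the expected output shows.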

However, the weights work fine when they are used inside a paradigm:

$ echo "housing" | lt-proc -wW eng-cat.automorf.bin
^housing/housing<n><sg><W:0.000000>/house<vblex><pprs><W:1.000000>/house<vblex><subs><W:2.000000>/house<vblex><ger><W:3.000000>$
@unhammer unhammer added the bug Something isn't working label Mar 14, 2019
@unhammer
Member

@Techievena

@Techievena
Member

@unhammer I will definitely look into it.

@AMR-KELEG
Contributor

I might be facing the same problem.
I am using an input written in .att format to generate a weighted transducer.

0       1       c       c       0.000000
1       2       a       a       0.000000
2       3       t       t       0.000000
3       4       @0@     <n>     0.000000
3       5       s       <n>     0.000000
4       2.000000
5       6       @0@     <pl>    0.000000
6       1.000000

I generate the transducer using lt-comp lr in.att apert_model.
The output of lt-print apert_model is:

0       1       c       c       0.000000
1       2       a       a       0.000000
2       3       t       t       0.000000
3       4       ε       <n>     0.000000
3       5       s       <n>     0.000000
4       7       ε       ε       2.000000
5       6       ε       <pl>    0.000000
6       7       ε       ε       1.000000
7       0.000000

which seems to be correct.

However, the output of echo 'cat' | lt-proc apert_model -W seems to ignore the weights:
^cat/cat<n><W:0.000000>$
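For comparison, here is a minimal sketch of what the weight would be if arc weights and the final-state weight were summed along the path. It parses the .att text from this comment directly; `parse_att` and `analyse` are hypothetical helpers written for illustration (not lttoolbox functions), and the traversal assumes the transducer has no epsilon cycles:

```python
EPS = "@0@"

ATT = """\
0 1 c c 0.000000
1 2 a a 0.000000
2 3 t t 0.000000
3 4 @0@ <n> 0.000000
3 5 s <n> 0.000000
4 2.000000
5 6 @0@ <pl> 0.000000
6 1.000000
"""


def parse_att(text):
    """Split ATT lines into arcs (4-5 fields) and final states (1-2 fields)."""
    arcs, finals = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            src, dst, isym, osym = fields[:4]
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            arcs.setdefault(src, []).append((dst, isym, osym, weight))
        elif fields:
            finals[fields[0]] = float(fields[1]) if len(fields) > 1 else 0.0
    return arcs, finals


def analyse(word, arcs, finals, start="0"):
    """Return (analysis, weight) pairs; weight = arc weights + final weight."""
    results = []
    stack = [(start, 0, "", 0.0)]
    while stack:
        state, pos, out, weight = stack.pop()
        if pos == len(word) and state in finals:
            # The final-state weight is part of the path weight.
            results.append((out, weight + finals[state]))
        for dst, isym, osym, arc_weight in arcs.get(state, []):
            emitted = "" if osym == EPS else osym
            if isym == EPS:
                stack.append((dst, pos, out + emitted, weight + arc_weight))
            elif pos < len(word) and isym == word[pos]:
                stack.append((dst, pos + 1, out + emitted, weight + arc_weight))
    return results


arcs, finals = parse_att(ATT)
cat = analyse("cat", arcs, finals)
cats = analyse("cats", arcs, finals)
print(cat, cats)
```

With this reading of the format, "cat" would come out as cat<n> with weight 2 and "cats" as cat<n><pl> with weight 1, rather than 0.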

@AMR-KELEG
Contributor

I think the bug might be related to this line and its following lines:

response = NFinals(response, max_analyses, max_weight_classes);

AMR-KELEG added a commit to AMR-KELEG/lttoolbox that referenced this issue Apr 2, 2019
When computing the overall weight of an analysis,
The weight of the final state was ignored.
Fix apertium#44
@TinoDidriksen
Member

I guess editing the comment on #49 to remove "Fix #44" was not enough to make GitHub understand that it was not a closing merge.

@TinoDidriksen TinoDidriksen reopened this Apr 3, 2019
@AMR-KELEG
Contributor

@marcriera I think the bug is in the lt-comp command.
Is lt-comp used in apertium-eng to compile the dictionary?

I have prepared a sample dictionary:

<dictionary>
  <alphabet>ÀÁÂÄÆÇÈÉÊËÌÍÎÏÑÒÓÔÖÙÚÛÜàáâäçèéêëìíîïñòóôöùúûüABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"   c="Noun"/>  
    <sdef n="vblex"   c="Verb"/> 
    <sdef n="p1"  c="First person"/> 
    <sdef n="p3"  c="Third person"/> 
    <sdef n="sg"  c="Singular"/> 
    <sdef n="pl"  c="Plural"/> 
    <sdef n="pres"  c="Present (tense)"/> 
    <sdef n="past"  c="Past"/> 
    <sdef n="imp"   c="Imperative"/> 
    <sdef n="inf"   c="Infinitive"/> 
    <sdef n="pp"  c="Past participle"/> 
    <sdef n="subs"  c="Verbal noun"/> 
    <sdef n="pprs"  c="Present participle"/> 
    <sdef n="ger"   c="Gerund"/> 
  </sdefs>
  <pardefs>
    <pardef n="liv/e__vblex">
      <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="inf"/></r></p></e>
      <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="imp"/></r></p></e>
      <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="pp"/></r></p></e>
      <e w="1"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="pprs"/></r></p></e>
      <e w="3"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="ger"/></r></p></e>
      <e w="2"> <p><l>ing</l>       <r>e<s n="vblex"/><s n="subs"/></r></p></e>
      <e>       <p><l>e</l>         <r>e<s n="vblex"/><s n="pres"/></r></p></e>
      <e>       <p><l>es</l>        <r>e<s n="vblex"/><s n="pres"/><s n="p3"/><s n="sg"/></r></p></e>
      <e>       <p><l>ed</l>        <r>e<s n="vblex"/><s n="past"/></r></p></e>
    </pardef>
    <pardef n="house__n">
      <e>       <p><l></l>          <r><s n="n"/><s n="sg"/></r></p></e>
      <e r="RL"><p><l>'s</l>        <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r></p></e>
      <e>       <p><l>s</l>         <r><s n="n"/><s n="pl"/></r></p></e>
      <e r="RL"><p><l>s'</l>        <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r></p></e>
    </pardef>
  </pardefs>
<section id="main" type="standard">
  <e lm="house" w="1">     <i>house</i><par n="house__n"/></e>
  <e lm="house" w="2">     <i>hous</i><par n="liv/e__vblex"/></e>
</section>
</dictionary>

And the output transducer isn't correct:

0	1	h	h	0.000000	
1	2	o	o	0.000000	
2	3	u	u	0.000000	
3	4	s	s	0.000000	
4	5	e	e	0.000000	 # THIS EDGE SHOULD HAVE WEIGHT=2
4	6	e	e	1.000000	 # THIS EDGE HAS THE CORRECT WEIGHT
4	7	i	e	0.000000	
5	8	ε	<vblex>	0.000000	
5	9	d	<vblex>	0.000000	
5	10	s	<vblex>	0.000000	
6	11	ε	<n>	0.000000	
6	12	s	<n>	0.000000	
7	13	n	<vblex>	0.000000	
8	14	ε	<inf>	0.000000	
8	14	ε	<imp>	0.000000	
8	14	ε	<pres>	0.000000	
9	14	ε	<pp>	0.000000	
9	14	ε	<past>	0.000000	
10	15	ε	<pres>	0.000000	
11	14	ε	<sg>	0.000000	
12	14	ε	<pl>	0.000000	
13	14	g	<pprs>	1.000000	
13	14	g	<ger>	3.000000	
13	14	g	<subs>	2.000000	
15	11	ε	<p3>	0.000000	
14	0.000000

When I use the command echo "house" | lt-proc house.bin -W, I get correct weights only for the noun analysis:

^house/house<vblex><inf><W:0.000000>/house<vblex><imp><W:0.000000>/house<vblex><pres><W:0.000000>/house<n><sg><W:1.000000>$

@flammie
Member

flammie commented Apr 3, 2019

The correct weighting here is not trivial (so there seems to be something wrong in the compilation part too). Keep in mind that the prefix "hous" is shared by both the verb and the noun, and the verb needs its weight of 2 for "housing" as well, which does not go through the "4 5 e e" arc.

Here's the hfst + lexc equivalent for reference:

$ cat house.lexc
Multichar_Symbols
%<n%>
%<vblex%>
%<p1%>
%<p3%>
%<sg%>
%<pl%>
%<pres%>
%<past%>
%<imp%>
%<inf%>
%<pp%>
%<subs%>
%<pprs%>
%<ger%>
%<gen%>

LEXICON Root

house:house house__n "weight: 1" ;
hous:hous liv/e__vblex "weight: 2" ;

LEXICON liv/e__vblex

e%<vblex%>%<inf%>:e # ;
e%<vblex%>%<imp%>:e # ;
e%<vblex%>%<pp%>:ed # ;
e%<vblex%>%<pprs%>:ing # "weight: 1" ;
e%<vblex%>%<ger%>:ing  # "weight: 2" ;
e%<vblex%>%<subs%>:ing  # "weight: 3" ;
e%<vblex%>%<pres%>:e # ;
e%<vblex%>%<pres%>%<p3%>%<sg%>:es # ;
e%<vblex%>%<past%>:ed # ;

LEXICON house__n

%<n%>%<sg%>:0  # ;
%<n%>%<sg%>+'s%<gen%>:'s  # ;
%<n%>%<pl%>:s  # ;
%<n%>%<pl%>+'s%<gen%>:s'  # ;

$ hfst-lexc house.lexc | hfst-fst2txt
hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...2 liv/e__vblex...9 house__n...
0	1	h	h	1.000000
1	2	o	o	0.000000
2	3	u	u	0.000000
3	4	s	s	0.000000
4	5	e	i	2.000000
4	6	e	e	0.000000
5	7	<vblex>	n	0.000000
6	8	<n>	@0@	0.000000
6	9	<n>	s	0.000000
6	10	<n>	'	0.000000
6	11	<vblex>	@0@	1.000000
6	12	<vblex>	s	1.000000
6	13	<vblex>	d	1.000000
7	14	<subs>	g	2.000000
7	14	<ger>	g	1.000000
7	14	<pprs>	g	0.000000
8	14	<sg>	@0@	0.000000
9	14	<pl>	@0@	0.000000
9	15	<pl>	'	0.000000
10	15	<sg>	s	0.000000
11	14	<pres>	@0@	0.000000
11	14	<imp>	@0@	0.000000
11	14	<inf>	@0@	0.000000
12	16	<pres>	@0@	0.000000
13	14	<past>	@0@	0.000000
13	14	<pp>	@0@	0.000000
14	0.000000
15	17	+	@0@	0.000000
16	8	<p3>	@0@	0.000000
17	18	'	@0@	0.000000
18	19	s	@0@	0.000000
19	14	<gen>	@0@	0.000000

$ hfst-lexc house.lexc | hfst-fst2strings -w
hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...2 liv/e__vblex...9 house__n...
house<vblex><subs>:housing	5
house<vblex><ger>:housing	4
house<vblex><pprs>:housing	3
house<n><sg>:house	1
house<n><pl>:houses	1
house<n><pl>+'s<gen>:houses'	1
house<n><sg>+'s<gen>:house's	1
house<vblex><pres>:house	2
house<vblex><imp>:house	2
house<vblex><inf>:house	2
house<vblex><pres><p3><sg>:houses	2
house<vblex><past>:housed	2

Nonetheless, for the lt-proc part there should be at least a bit more of the weight accumulated :-/
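The hfst-fst2strings weights can be cross-checked by summing arc weights along each path of the hfst-fst2txt output above. In that transducer the weights do not sit on the arcs where the entries were written (for example, 1 is on the shared "h" arc and the verb branches carry the remainder), yet each path total still equals entry weight plus paradigm weight. A minimal sketch, assuming an acyclic transducer and plain summation; `load` and `paths` are hypothetical helpers, not HFST or lttoolbox APIs:

```python
# Arcs copied from the hfst-fst2txt output above (analysis side first).
HFST_TXT = """\
0 1 h h 1.000000
1 2 o o 0.000000
2 3 u u 0.000000
3 4 s s 0.000000
4 5 e i 2.000000
4 6 e e 0.000000
5 7 <vblex> n 0.000000
6 8 <n> @0@ 0.000000
6 9 <n> s 0.000000
6 10 <n> ' 0.000000
6 11 <vblex> @0@ 1.000000
6 12 <vblex> s 1.000000
6 13 <vblex> d 1.000000
7 14 <subs> g 2.000000
7 14 <ger> g 1.000000
7 14 <pprs> g 0.000000
8 14 <sg> @0@ 0.000000
9 14 <pl> @0@ 0.000000
9 15 <pl> ' 0.000000
10 15 <sg> s 0.000000
11 14 <pres> @0@ 0.000000
11 14 <imp> @0@ 0.000000
11 14 <inf> @0@ 0.000000
12 16 <pres> @0@ 0.000000
13 14 <past> @0@ 0.000000
13 14 <pp> @0@ 0.000000
14 0.000000
15 17 + @0@ 0.000000
16 8 <p3> @0@ 0.000000
17 18 ' @0@ 0.000000
18 19 s @0@ 0.000000
19 14 <gen> @0@ 0.000000
"""


def load(text):
    """Read arcs (5 fields) and final states (2 fields) from ATT-style text."""
    arcs, finals = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5:
            src, dst, isym, osym, weight = fields
            arcs.setdefault(src, []).append((dst, isym, osym, float(weight)))
        elif len(fields) == 2:
            finals[fields[0]] = float(fields[1])
    return arcs, finals


def paths(arcs, finals, state="0", ana="", surf="", weight=0.0):
    """Yield (analysis, surface, total weight) for every accepting path."""
    if state in finals:
        yield ana, surf, weight + finals[state]
    for dst, isym, osym, arc_weight in arcs.get(state, []):
        yield from paths(
            arcs, finals, dst,
            ana + ("" if isym == "@0@" else isym),
            surf + ("" if osym == "@0@" else osym),
            weight + arc_weight,
        )


weights = {(a, s): w for a, s, w in paths(*load(HFST_TXT))}
for (a, s), w in sorted(weights.items()):
    print(f"{a}:{s}\t{w:g}")
```

The point of the exercise is that only the path total matters, so a compiler is free to move weight along shared prefixes as long as every complete path keeps its sum.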

@unhammer
Member

unhammer commented Apr 3, 2019 via email

@mr-martian
Contributor

I believe the issue here is that Transducer::closure() disregards weight and as a result determinize() and minimize() lose any weights which are on epsilon transitions.
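To illustrate the kind of loss described here, the following sketch contrasts an epsilon closure that only records which states are reachable with one that also records the cheapest weight of reaching them. It is not a port of Transducer::closure(), whose actual signature and behaviour are not shown in this thread; the function names are hypothetical, the arcs are the two weighted ε:ε arcs from the lt-print output earlier in the thread, and non-negative weights are assumed (Dijkstra-style search):

```python
import heapq

# Epsilon arcs as {source: [(target, weight), ...]}, taken from the
# lt-print output above: 4 -> 7 with weight 2 and 6 -> 7 with weight 1.
EPS_ARCS = {4: [(7, 2.0)], 6: [(7, 1.0)]}


def closure_unweighted(eps_arcs, start):
    """Set of states reachable via epsilon arcs; weights are discarded."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for target, _weight in eps_arcs.get(state, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def closure_weighted(eps_arcs, start):
    """Map each reachable state to the minimum epsilon-path weight to it."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        weight, state = heapq.heappop(heap)
        if weight > best.get(state, float("inf")):
            continue  # stale heap entry
        for target, arc_weight in eps_arcs.get(state, []):
            candidate = weight + arc_weight
            if candidate < best.get(target, float("inf")):
                best[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return best


print(closure_unweighted(EPS_ARCS, 4))
print(closure_weighted(EPS_ARCS, 4))
print(closure_weighted(EPS_ARCS, 6))
```

If determinization or minimization merges states using only the first kind of closure, the 2 and the 1 on those epsilon arcs have nowhere to go, which would be consistent with the zero weights reported above.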

mr-martian added a commit that referenced this issue Jul 6, 2022
* don't lose weights when minimizing
* don't lose weights while compiling
* don't duplicate weights while compiling
* ensure that final weight after joinFinals() is always 0
* version bump so lexd can depend on this
mr-martian added a commit that referenced this issue Jul 7, 2022
This reverts commit b040536.

I will not push untested code.
I will not push untested code.
I will not push untested code.
@mr-martian mr-martian reopened this Jul 8, 2022
@mr-martian mr-martian mentioned this issue Sep 26, 2022
@xavivars
Member

xavivars commented Nov 1, 2023

@mr-martian, it seems that at some point you attempted to fix this but then had to revert. Any idea what needs to be done?

@mr-martian
Contributor

The issue is that FST minimization was written for unweighted automata, and when weight support was added, closure() and/or minimize() were updated incorrectly. My first attempt at fixing it failed, so someone who knows FST algorithms better than I do needs to go through that code.
