The line "June-July 2000" is tokenized into "June", "July", "2000". The hyphen disappears.
This is probably not a bug, but it becomes a problem when you need to distinguish "June July" from "June-July" at the rule level.
If I remember correctly, the tokenizer itself should leave the hyphen untouched and produce a single token for "June-July". It is then the role of the hyphen alternatives module either to keep the original hyphenated token if it is known from the dictionary, or otherwise to split it into two tokens (here "June" and "July").
But when the token is split, I think the module currently just removes the hyphen. In fact, I think a third token should be created for the hyphen. @romaricb, what do you think?
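To make the proposal concrete, here is a minimal Python sketch of the behaviour described above. It is not the project's actual code: the function names, the `keep_hyphen` flag, and the plain-set dictionary are all hypothetical, and the whitespace split only stands in for the real tokenizer.

```python
def hyphen_alternatives(token, dictionary, keep_hyphen=True):
    """Hypothetical stand-in for the hyphen alternatives step.

    A hyphenated token found in `dictionary` is kept whole. Otherwise it is
    split on hyphens; with keep_hyphen=True each hyphen becomes its own token
    (the proposed behaviour), with keep_hyphen=False it is dropped (the
    behaviour reported in this issue).
    """
    if "-" not in token or token in dictionary:
        return [token]
    result = []
    for index, part in enumerate(token.split("-")):
        if index > 0 and keep_hyphen:
            result.append("-")
        if part:  # skip empty pieces from leading or trailing hyphens
            result.append(part)
    return result


def tokenize(text, dictionary, keep_hyphen=True):
    """Whitespace split (assumed, simplified tokenizer) plus hyphen handling."""
    tokens = []
    for raw in text.split():
        tokens.extend(hyphen_alternatives(raw, dictionary, keep_hyphen))
    return tokens


print(tokenize("June-July 2000", set(), keep_hyphen=False))
print(tokenize("June-July 2000", set(), keep_hyphen=True))
```

With the hyphen kept as a separate token, a rule can tell "June-July" apart from "June July" simply by checking for the "-" token between the two month names.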