Improve typos analyzer quality #758

Open
irinakhismatullina opened this issue Apr 18, 2019 · 0 comments
Labels
large Large size typos

@irinakhismatullina
Contributor

This issue is mostly about improving the quality of the current TyposCorrector model:

  • Improve the data quality: improve TokenParser for identifiers containing abbreviations (ml#403) and integrate the neural token splitter (ml#402). This is the most important part, since several stages of the pipeline depend heavily on it.
  • Improve the vocabulary (this follows from the previous point). It is mostly fine right now, but with good splitting it will be much better.
  • Find the best fastText configuration. I'm already fairly happy with the current one: it is lightweight and gives some quality boost, so this has lower priority.
  • Tune the model training configuration. I haven't touched it yet, and I'm not sure there is a much better one than the current default: most of the mistakes I see now can be explained by bad splits, gaps in the vocabulary, or a lack of training data (the last one will be addressed by training on a bigger dataset, which is easy).
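To illustrate why abbreviation handling in splitting matters for both the data and the vocabulary, here is a minimal, hypothetical sketch. It is not the TokenParser from ml#403 or the neural splitter from ml#402; the function names and regex rules are assumptions chosen for illustration. A naive camelCase splitter turns "HTTPServer" into single letters, polluting the vocabulary with tokens like "h" and "t", while a splitter that keeps runs of capitals together produces "http" and "server".

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical sketch: these helpers are NOT the sourced.ml TokenParser API.

# Naive rule: an optional capital followed by lowercase letters/digits.
# A run of capitals such as "HTTP" degenerates into one token per letter.
NAIVE_PATTERN = re.compile(r"[A-Z]?[a-z0-9]*")

# Abbreviation-aware rule (assumed), alternatives tried left to right:
#   1. a run of capitals followed by Capital+lowercase ("HTTP" in "HTTPServer")
#   2. an optional capital followed by lowercase letters ("get", "Server")
#   3. any remaining run of capitals ("JSON" in "parseJSON")
#   4. a run of digits
AWARE_PATTERN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def naive_split(identifier: str) -> List[str]:
    """Split on case changes, treating every capital as a token start."""
    # findall also yields empty matches (e.g. at "_" or at the end); drop them.
    return [t.lower() for t in NAIVE_PATTERN.findall(identifier) if t]


def abbreviation_aware_split(identifier: str) -> List[str]:
    """Split on case changes, keeping runs of capitals together."""
    # Separators like "_" match no alternative, so findall skips them.
    return [t.lower() for t in AWARE_PATTERN.findall(identifier)]


def build_vocabulary(identifiers: Iterable[str],
                     splitter: Callable[[str], List[str]]) -> Counter:
    """Count how often each subtoken occurs across all identifiers."""
    vocab: Counter = Counter()
    for identifier in identifiers:
        vocab.update(splitter(identifier))
    return vocab


if __name__ == "__main__":
    identifiers = ["HTTPServer", "getHTTPResponseCode", "parseJSON", "jsonParser"]
    print(sorted(build_vocabulary(identifiers, naive_split)))
    print(sorted(build_vocabulary(identifiers, abbreviation_aware_split)))
```

On these four identifiers the naive vocabulary contains single-letter entries ("h", "t", "p", "j", "s", "o", "n"), while the abbreviation-aware one contains "http" and "json" instead, with "json" counted twice because "parseJSON" and "jsonParser" now share it.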