Standard handler for Numbers #334
Comments
I've been working on this somewhat at Numbers. Besides the basic number words, some words only count as numbers in context:
'A' can be considered a number word if it is directly followed by a number other than ONE.
'OH' can be considered a number word if it occurs next to another number.
'AND' can be considered a number word if it is both preceded and followed by a number word.
I think we need to create a special placemarker for numbers, then scan the transcription for number words and replace them with the placemarker before passing the transcription to the intent parser. After the intent parser does its work, the numbers are placed back into a NUMBERS match group that lists all of them.
I'm not sure how to handle number associations. In a command that counts from one number to another, for example, the order of the numbers and the order of the prepositions both matter and can change the meaning of the command. Unfortunately, I think the plugin author would have to analyze the exact phrase to determine the beginning and ending points of the count.
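A minimal Python sketch of the context rules above and the placemarker pass. It assumes a hypothetical basic vocabulary (the comment's own list was not preserved) and a hypothetical `[NUMBER]` token as the placemarker; `number_word_mask` and `extract_numbers` are illustrative names, not existing project functions. The marking loop repeats until nothing changes, so a context word such as 'OH' can be accepted next to another context word that was accepted on an earlier pass.

```python
# Hypothetical vocabulary: the comment's list of basic number words
# was not preserved, so this set is an assumption.
BASIC = set(
    "ZERO ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN TWELVE "
    "THIRTEEN FOURTEEN FIFTEEN SIXTEEN SEVENTEEN EIGHTEEN NINETEEN TWENTY "
    "THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY HUNDRED THOUSAND MILLION"
    .split()
)
PLACEHOLDER = "[NUMBER]"  # hypothetical placemarker token


def number_word_mask(words):
    """Flag each word that counts as a number word under the context rules."""
    mask = [w in BASIC for w in words]
    changed = True
    while changed:  # repeat until stable, so chains like "OH OH SEVEN" resolve
        changed = False
        for i, w in enumerate(words):
            if mask[i]:
                continue
            before = i > 0 and mask[i - 1]
            after = i + 1 < len(words) and mask[i + 1]
            if w == "A":
                # 'A' counts only when directly followed by a number other than ONE
                nxt = words[i + 1] if i + 1 < len(words) else None
                hit = nxt in BASIC and nxt != "ONE"
            elif w == "OH":
                hit = before or after  # next to another number
            elif w == "AND":
                hit = before and after  # between two number words
            else:
                hit = False
            if hit:
                mask[i] = True
                changed = True
    return mask


def extract_numbers(transcript):
    """Replace each run of number words with PLACEHOLDER; return (text, runs)."""
    words = transcript.upper().split()
    mask = number_word_mask(words)
    out, runs, current = [], [], []
    for word, is_number in zip(words, mask):
        if is_number:
            current.append(word)
            continue
        if current:  # a run of number words just ended
            runs.append(" ".join(current))
            out.append(PLACEHOLDER)
            current = []
        out.append(word)
    if current:  # the transcript ended inside a run
        runs.append(" ".join(current))
        out.append(PLACEHOLDER)
    return " ".join(out), runs
```

For example, `extract_numbers("count from one to a hundred")` should return `("COUNT FROM [NUMBER] TO [NUMBER]", ["ONE", "A HUNDRED"])`; the list is what would go into the NUMBERS match group, in spoken order.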
I'm planning to do slot types now. If you want a numeric value, you would create a template with a numeric slot for each value you need. A pre-parser will then use a regex to find any groups of number words, put them into the matches for the variant, and replace their original locations with a placeholder, so for a counting command the intent parser receives "Count from" and "to" with placeholders where the numbers were, which should match the correct template. Once the template is matched, the slot identities ("from" and "to") are looked up and matched to the numbers. It will be the responsibility of the template author to check that the returned numbers are in the correct range.
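One way the slot-type pre-parser could look, sketched in Python under stated assumptions: the vocabulary, the `{from:NUMBER}` slot syntax, and the `[NUMBER]` placeholder are hypothetical (the template in the comment was not preserved), and the exact string comparison in `match_template` only stands in for the real intent parser. Numbers are bound to slot names in the order they appear, which is the lookup step described above.

```python
import re

# Hypothetical vocabulary and slot syntax; "{from:NUMBER}" is only an illustration.
NUMBER_WORDS = (
    "ZERO ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN TWELVE "
    "THIRTEEN FOURTEEN FIFTEEN SIXTEEN SEVENTEEN EIGHTEEN NINETEEN TWENTY "
    "THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY HUNDRED THOUSAND MILLION"
).split()
# Longest alternatives first, each required to end at a word boundary.
WORD = r"(?:%s)\b" % "|".join(sorted(NUMBER_WORDS, key=len, reverse=True))
# A group is one number word followed by more, optionally joined by AND.
NUMBER_GROUP = re.compile(rf"\b{WORD}(?:\s+(?:AND\s+)?{WORD})*")
SLOT = re.compile(r"\{(\w+):NUMBER\}")
PLACEHOLDER = "[NUMBER]"


def pre_parse(transcript):
    """Swap each group of number words for PLACEHOLDER and collect the groups."""
    text = " ".join(transcript.upper().split())
    numbers = NUMBER_GROUP.findall(text)
    return NUMBER_GROUP.sub(PLACEHOLDER, text), numbers


def match_template(template, transcript):
    """Return {slot_name: number_words} if the transcript fits, else None."""
    names = SLOT.findall(template)
    expected = " ".join(SLOT.sub(PLACEHOLDER, template).upper().split())
    text, numbers = pre_parse(transcript)
    # Stand-in for the real intent parser: exact comparison after pre-parsing.
    if text != expected or len(numbers) != len(names):
        return None
    # Slot identities are matched to the numbers in spoken order.
    return dict(zip(names, numbers))
```

Here `match_template("Count from {from:NUMBER} to {to:NUMBER}", "count from one to twenty five")` gives `{"from": "ONE", "to": "TWENTY FIVE"}`. Converting those words to values and checking their range would still be left to the plugin.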
Detailed Description
We would like to provide a special keyword for numbers, as opposed to plugin-author-defined keywords like {ColorKeyword} or {DayKeyword}, because recognizing and parsing a number is both more complex and extremely common. I propose using either square brackets ("[NUMBER]") or colons ("{:NUMBER:}") to distinguish system keywords from plugin keywords. Eventually I would like to have system keywords for Number, Date, and Time, and I'm sure others will arise as we work on them.
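To compare the two proposed spellings, here is a small Python sketch that separates system keywords from plugin keywords with regular expressions. The patterns, the `classify_keywords` name, and the choice to reject unknown system keywords are assumptions for illustration; Number, Date, and Time are the only system types named in the issue.

```python
import re

PLUGIN_KEYWORD = re.compile(r"\{(\w+)\}")      # {ColorKeyword}, {DayKeyword}
SYSTEM_BRACKETS = re.compile(r"\[([A-Z]+)\]")  # proposal 1: [NUMBER]
SYSTEM_COLONS = re.compile(r"\{:([A-Z]+):\}")  # proposal 2: {:NUMBER:}
SYSTEM_KEYWORDS = {"NUMBER", "DATE", "TIME"}   # types named in the issue


def classify_keywords(template):
    """Split a template's keywords into plugin-defined and system keywords."""
    system = SYSTEM_BRACKETS.findall(template) + SYSTEM_COLONS.findall(template)
    unknown = sorted(set(system) - SYSTEM_KEYWORDS)
    if unknown:
        raise ValueError(f"unknown system keyword(s): {unknown}")
    return {"plugin": PLUGIN_KEYWORD.findall(template), "system": system}
```

Either form can coexist with `{Name}` plugin keywords, since `\w` does not match the colon and plugin keywords never use square brackets, so neither pattern picks up the other.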
Context
Right now, it would be difficult for a plugin author to simply ask for a number in a template. For instance, "WHAT IS {NumberKeyword} PLUS {NumberKeyword}" would require listing every possible number in the template itself:
NumberKeyword: [ONE, TWO, THREE, ...]
which would be incredibly time-consuming, and when parsed into an expanded form would make the template take up as many lines as you added. In addition, there are numerous ways to say each number: one person might say 'ONE NINE SIX FIVE', another 'ONE THOUSAND NINE HUNDRED SIXTY FIVE', and another 'NINETEEN SIXTY FIVE'. This quickly becomes overwhelming.
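A Python sketch of how the three spellings of 1965 could be reduced to one value. It relies on a heuristic that is an assumption, not something from the issue: the words are split into ordinary cardinals wherever a word cannot extend the previous one (SIXTY after NINETEEN, or any word after OH), and the pieces are then joined digit-wise. 'A' is not handled, and phrases such as 'FIVE TWENTY' remain ambiguous (this sketch reads them as 520).

```python
ZEROS = {"ZERO": 0, "OH": 0}  # assumption: "OH" is read as the digit zero
UNITS = {w: i + 1 for i, w in enumerate(
    "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE".split())}
TEENS = {w: i + 10 for i, w in enumerate(
    "TEN ELEVEN TWELVE THIRTEEN FOURTEEN FIFTEEN SIXTEEN SEVENTEEN "
    "EIGHTEEN NINETEEN".split())}
TENS = {w: (i + 2) * 10 for i, w in enumerate(
    "TWENTY THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY".split())}
SCALES = {"HUNDRED": 100, "THOUSAND": 1_000, "MILLION": 1_000_000}
VALUES = {**ZEROS, **UNITS, **TEENS, **TENS}


def category(word):
    for name, table in (("zero", ZEROS), ("unit", UNITS), ("teen", TEENS),
                        ("tens", TENS), ("scale", SCALES)):
        if word in table:
            return name
    return None


def starts_new_segment(cat, last):
    """Heuristic: a word that cannot extend the previous cardinal starts a new one."""
    if cat == "zero":
        return True  # OH/ZERO is its own digit: "TWENTY OH FIVE"
    if cat == "unit":
        return last in ("zero", "unit", "teen")  # "ONE NINE", "SEVENTEEN OH ONE"
    if cat in ("teen", "tens"):
        return last in ("zero", "unit", "teen", "tens")  # "NINETEEN SIXTY"
    return False  # HUNDRED/THOUSAND/MILLION always extend the current cardinal


def segment_value(words):
    """Evaluate one ordinary cardinal such as ONE THOUSAND NINE HUNDRED SIXTY FIVE."""
    total, current = 0, 0
    for w in words:
        if w == "HUNDRED":
            current = (current or 1) * 100
        elif w in SCALES:
            total += (current or 1) * SCALES[w]
            current = 0
        else:
            current += VALUES[w]
    return total + current


def words_to_number(phrase):
    segments, last = [], None
    for w in phrase.upper().split():
        if w == "AND":
            continue  # "ONE HUNDRED AND FIVE"
        cat = category(w)
        if cat is None:
            raise ValueError(f"not a number word: {w}")
        if not segments or starts_new_segment(cat, last):
            segments.append([])
        segments[-1].append(w)
        last = cat
    if not segments:
        raise ValueError("no number words in phrase")
    # Adjacent cardinals are joined digit-wise: NINETEEN + SIXTY FIVE -> "19" + "65".
    return int("".join(str(segment_value(s)) for s in segments))
```

With this heuristic all three spellings from the paragraph above evaluate to 1965, and year-style readings such as 'SEVENTEEN OH ONE' give 1701.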
Possible Implementation
There are rules for how numbers can be constructed. You might say, for instance, "ONE HUNDRED THOUSAND", but you wouldn't say "THOUSAND" on its own. Since most language models are based on trigrams, I should be able to generate a set of trigrams for spoken numbers (ONE, ONE HUNDRED, ONE HUNDRED THOUSAND, SEVENTEEN OH ONE, SEVENTEEN THOUSAND AND, etc.) and then insert into the basic template only the list of words that may appear first or last in a number. This should allow the language model to insert the full trigram model in its place.
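The generation step can be sketched in Python as follows. `spell` and `spell_year` are hypothetical generators: British-style 'AND' (ONE HUNDRED AND FIVE, SEVENTEEN THOUSAND AND ONE) plus year readings such as SEVENTEEN OH ONE; an American-style reading without 'AND' would need its own generator. The sketch collects every 1- to 3-gram along with the sets of words that can begin or end a number; how the language model would consume them is not shown.

```python
ONES = ("ZERO ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN TWELVE "
        "THIRTEEN FOURTEEN FIFTEEN SIXTEEN SEVENTEEN EIGHTEEN NINETEEN").split()
TENS = ["", ""] + "TWENTY THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY".split()


def spell(n):
    """Cardinal words for 0 <= n < 1,000,000, with British-style AND."""
    if n < 20:
        return [ONES[n]]
    if n < 100:
        return [TENS[n // 10]] + ([ONES[n % 10]] if n % 10 else [])
    if n < 1000:
        rest = n % 100
        return [ONES[n // 100], "HUNDRED"] + (["AND"] + spell(rest) if rest else [])
    head, rest = divmod(n, 1000)
    words = spell(head) + ["THOUSAND"]
    if rest == 0:
        return words
    return words + (["AND"] if rest < 100 else []) + spell(rest)


def spell_year(y):
    """Year-style reading for 1000 <= y <= 9999, e.g. 1701 -> SEVENTEEN OH ONE."""
    hi, lo = divmod(y, 100)
    if lo == 0:
        return spell(y) if hi % 10 == 0 else spell(hi) + ["HUNDRED"]
    if lo < 10:
        return spell(hi) + ["OH", ONES[lo]]
    return spell(hi) + spell(lo)


def number_ngrams(phrases, n_max=3):
    """Collect every 1..n_max-gram plus the words that can start or end a number."""
    grams, first, last = set(), set(), set()
    for words in phrases:
        first.add(words[0])
        last.add(words[-1])
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                grams.add(" ".join(words[i:i + n]))
    return grams, first, last


# Example range, chosen only so that ONE HUNDRED THOUSAND is included.
phrases = [spell(n) for n in range(101_000)]
phrases += [spell_year(y) for y in range(1000, 10_000)]
grams, first, last = number_ngrams(phrases)
```

With this range, "ONE HUNDRED THOUSAND", "SEVENTEEN OH ONE" and "SEVENTEEN THOUSAND AND" all end up in `grams`, while THOUSAND and AND never appear in `first`, which matches the rule that "THOUSAND" on its own is not a number. Only `first` and `last` would go into the basic template.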