Add some new rules to en-GB grammar.xml. #10596

Open
wants to merge 1 commit into
base: master
2 changes: 1 addition & 1 deletion languagetool-client-example/pom.xml
@@ -24,7 +24,7 @@
<dependency>
<groupId>org.languagetool</groupId>
<artifactId>language-all</artifactId>
-<version>5.9</version>
+<version>6.5-SNAPSHOT</version>
</dependency>
<!--
<dependency>
@@ -534,6 +534,55 @@ USA
<example type="triggers_error">Explore releases from Ait! at Discogs.</example>
</rule>

</category>
</category>

<rule id="YOUR_YOURE" name="Your/You're Usage">
Collaborator commented:

LT already has rules for {your=>you're}. Additionally, as currently written, this rule will flag every instance of your and suggest you're, which will result in a large number of false positives.
If you want to refine this rule, you should find instances where LT does not successfully correct {your=>you're} (i.e., false negatives) and refine our already-existing rules for those cases.

<pattern>
<token>your</token>
</pattern>
<message>Did you mean <suggestion>you're</suggestion>?</message>
<example correction="you're">I think <marker>your</marker> going to love it.</example>
</rule>

<rule id="WORD_COLLOCATION_1" name="Inappropriate Word Collocation">
Collaborator @evan-defran-lt commented on May 16, 2024:

We would probably want a more descriptive id for this rule than just WORD_COLLOCATION_1. Additionally, this wouldn't really be a collocation issue (the classic example of a collocation error would be powerful coffee instead of strong coffee); it's more along the lines of an adverb/adjective morphology error.

<pattern>
<token>strong</token>
Collaborator commented:

Is there a way to make this rule more robust so that it covers all (or at least more) instances where an adjective is incorrectly used to modify a verb? Right now, this rule would apply only to the bigram strong believe.
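
One possible direction, shown only as an untested sketch: match a small list of adjectives before a small list of verbs, and build the adverb in the suggestion with a regular-expression replacement. The rule id, the word lists, and the choice to require a preceding personal pronoun are all illustrative assumptions, not part of this PR or of LT's existing rules, and the pattern would need checking against a corpus for false positives.

```xml
<!-- Hypothetical sketch: the id and both word lists are assumptions chosen for illustration. -->
<rule id="ADJECTIVE_FOR_ADVERB_BEFORE_VERB" name="Adjective used instead of adverb before a verb">
  <pattern>
    <!-- Token 1: a personal pronoun, used here to narrow the context. -->
    <token postag="PRP"/>
    <marker>
      <!-- Token 2: adjectives whose adverb is formed by appending "ly". -->
      <token regexp="yes">strong|firm|sincere</token>
      <!-- Token 3: words that are verbs only, to avoid adjective + noun readings. -->
      <token regexp="yes">believe|recommend|agree|disagree</token>
    </marker>
  </pattern>
  <message>Did you mean <suggestion><match no="2" regexp_match="(.+)" regexp_replace="$1ly"/> \3</suggestion>? An adverb, not an adjective, is needed to modify the verb.</message>
  <example correction="strongly believe">I <marker>strong believe</marker> in your abilities.</example>
  <example>I strongly believe in your abilities.</example>
</rule>
```

The explicit word lists trade coverage for precision; a purely POS-based pattern (adjective followed by verb) would cover more cases but would likely need antipatterns to stay usable.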

<token>believe</token>
</pattern>
<message>Consider using <suggestion>strongly believe</suggestion> instead of 'strong believe'. "Strongly" is the adverb form of "strong" and is used to modify verbs like "believe."</message>
<example correction="strongly believe">I strong believe in your abilities.</example>
Collaborator commented:

You need to include <marker>...</marker> here.
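
A minimal sketch of what the corrected example line might look like, assuming the marker should wrap both tokens matched by the pattern (strong and believe), since the suggestion replaces the whole bigram:

```xml
<example correction="strongly believe">I <marker>strong believe</marker> in your abilities.</example>
```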

</rule>

<rule id="WORD_COLLOCATION_3" name="Inappropriate Word Collocation">
Collaborator commented:

Same comment as above. We'd want a more descriptive id, and this isn't really a collocation error.

<pattern>
<token>fast</token>
<token>drive</token>
</pattern>
<message>Consider using <suggestion>fast driving</suggestion> instead of 'fast drive'. "Driving" is the gerund form of the verb "drive" and is used after an adjective like "fast" to describe the act of driving.</message>
<example correction="fast driving">He was caught for fast drive.</example>
Collaborator commented:

You need to include <marker>...</marker> here.

</rule>

<rule id="WORD_COLLOCATION_2" name="Inappropriate Word Collocation">
Collaborator commented:

We'd want a more descriptive id. And this one is a collocation error! :-)

<pattern>
<token>fast</token>
<token>meal</token>
</pattern>
<message>Consider using <suggesiton>fast food</suggestion> instead of 'fast meal'. "Fast food" is a specific type of food that is prepared and served quickly, whereas "fast meal" is not commonly used.</message>
Collaborator commented:

There's a typo here: should be <suggestion>, not <suggesiton>.

<example correction="fast food">Let's grab some fast meal.</example>
</rule>

<rule id="Singular_Subject_Verb_Agreement" name="Singular Subject-Verb Agreement">
Collaborator commented:

LT already has rules for subject-verb agreement. Additionally, as currently written, this rule will result in some false positives, as in the sentence He let his dog run free in the field.
If you want to refine this rule, you should find instances where LT does not successfully correct the issue (i.e., false negatives) and refine our already-existing rules for those cases.

<pattern>
<token postag="DT" min="0" max="1"/>
<token postag="NN">dog</token>
Collaborator commented:

This rule will apply only when the subject is dog and the verb is run. Could you think of ways to make it more robust? (Just flagging this for you. As noted above, LT already has rules for subject-verb agreement issues, so it would be preferable to refine those rules rather than create new ones from scratch.)

<token postag="RB" min="0" max="1"/>
<token postag="VBP">run</token>
</pattern>
<message>The verb "run" does not agree with the singular subject "dog". Consider using <suggestion>runs</suggestion> to match the singular subject.</message>
<example correction="runs">The dog <marker>run</marker> in the park.</example>
Collaborator commented:

The <marker>...</marker> tags in all the examples in this rule are not placed correctly.
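
One way to read this, offered as an assumption rather than as the reviewer's stated fix: the pattern has no <marker>, so the match spans every token from the optional determiner through the verb, while the examples mark only run. A sketch of a pattern that restricts the match to the verb, so that the suggestion runs replaces only that token and the existing example markers line up:

```xml
<pattern>
  <token postag="DT" min="0" max="1"/>
  <token postag="NN">dog</token>
  <token postag="RB" min="0" max="1"/>
  <!-- Only the marked token is underlined and replaced by the suggestion. -->
  <marker>
    <token postag="VBP">run</token>
  </marker>
</pattern>
```

The alternative is to leave the pattern as it is and widen each example's marker and correction to cover the full matched span.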

<example correction="runs">The dog quickly <marker>run</marker> in the park.</example>
<example correction="runs">Dog <marker>run</marker> in the park.</example>
<example correction="runs">Dog quickly <marker>run</marker> in the park.</example>
</rule>

</rules>
</rules>
1 change: 1 addition & 0 deletions pom.xml
@@ -31,6 +31,7 @@
</developer>
</developers>
<modules>
<module>languagetool-client-example</module>
Collaborator commented:

Could you let me know why you added this here?

<module>languagetool-core</module>
<module>languagetool-language-modules/en</module>
<module>languagetool-language-modules/fa</module>