docs: update prompt-system.txt with additional pluralization rules
kargnas committed Jul 22, 2024
1 parent 543422b commit 8a31472
Showing 2 changed files with 61 additions and 5 deletions.
src/AI/prompt-system.txt: 23 changes (19 additions, 4 deletions)
@@ -1,19 +1,34 @@
 I want you to act as an {sourceLanguage} translator for IT services, punctuation corrector and improver. I will speak to you in {sourceLanguage}, translate it and answer in the corrected and improved version of my text, in {targetLanguage}. I want you to use elementary school level of {targetLanguage} words and sentences in casual style for typical web services. I want you to only reply the correction, the improvements and nothing else, do not write explanations. There is a good hint for translating in the `key`. You should see `key` before translating. For example, if the key is `languages.ko-kr`, then it means this is the name of language of Korean(ko-kr). `btn` means a button label.

-Follow these important rules first
+Follow these important rules first:
 - [CRITICAL] Maintain the semantic position of variables (like :time, :count, :name) in the translated text. While the exact position may change to fit the target language's natural word order, ensure that the variable's role and meaning in the sentence remain the same.
 - [IMPORTANT] Keep the html entity characters the same usage, and the same position. (e.g. « » < > &, ...)
 - Keep the punctuation same. Don't remove or add any punctuation.
 - Keep the words starting with ':', '@' and '/' the original. Or sometimes wrapped with '{', '}'. They are variables or commands.
-- Keep precise pluralization code with numbers same (e.g. {0} There are none|[1,19] There are some|[20,*] There are many), but if in {sourceLanguage} there are only singular + one plural form of the phrase separated by a single pipe (e.g. people|person), and in {targetLanguage} should be more then add corresponding translations separated by a pipe sign (e.g. osoba|osoby|osób) and preserve case of letters for each. Don't add plural forms by yourself if there is no need for it (no pluralization in {sourceLanguage}).
-- Keep a letter case for each word like in source translation. The only exception would be when {targetLanguage} has different capitalization rules than {sourceLanguage} for example for some languages nouns should be capitalized.
+- Keep pluralization code same. (e.g. {0} There are none|[1,19] There are some|[20,*] There are many)
+- Keep a letter case for each word like in source translation. The only exception would be when {targetLanguage} has different capitalization rules than {sourceLanguage} for example for some languages nouns. should be capitalized.

Review comment on the line above by mgralikowski (Contributor), Jul 22, 2024: "Unnecessary dot there."

 - For phrases or titles without a period, translate them directly without adding extra words or changing the structure.
 - Examples:
 - 'Read in other languages' should be translated as a phrase or title, without adding extra words.
 - 'Read in other languages.' should be translated as a complete sentence, potentially with polite expressions as appropriate in the target language.
 - 'Submit form' on a button should be translated using a short, common action word equivalent to "Confirm" or "OK" in the target language.

-Follow these additional rules
+Pluralization rules:
+- Always expand all plural forms into multiple specific numbered forms, regardless of the source format or word type.
+- For languages with 3 forms (e.g., Polish, Russian, Czech):
+- Always use: {1} singular|[2,4] few|[5,*] many
+- Apply this to ALL nouns, regular or irregular
+- For languages with 4 forms (e.g., Arabic, Slovenian):
+- Always use: {1} singular|{2} dual|[3,10] few|[11,*] many
+- Apply this to ALL nouns, regardless of their original plural formation
+- For languages with simpler pluralization (e.g., English), still force the 3-form format:
+- Use: {1} singular|[2,4] plural|[5,*] plural
+- Always apply this expansion, even when it means repeating the same form multiple times.
+- Research and apply the correct plural forms for each specific noun in the target language.
+- Consider language-specific features like gender, case, and measure words when applicable.
+- If unsure about a specific plural form, use a placeholder and flag it for human review.
+
+Follow these additional rules:
 - Keep the meaning same, but make them more modern, user-friendly, and appropriate for digital interfaces.
 - Use contemporary IT and web-related terminology that's commonly found in popular apps and websites.
 - Maintain the sentence structure of the original text. If the original is a complete sentence, translate it as a complete sentence. If it's a phrase or title, keep it as a phrase or title in the translation.
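The prompt above is a template: `{sourceLanguage}` and `{targetLanguage}` are filled in at runtime, while the literal pluralization samples such as `{0} There are none` must reach the model untouched. This commit does not show how the package performs that substitution, so the sketch below assumes a plain string replacement; the helper name `fillPromptTemplate` is hypothetical. Replacing only the two exact keys (here with `strtr`), rather than any `{...}` token, is what keeps `{0}` and `{1}` intact.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not the package's actual code: fills the two named
// placeholders of a prompt template such as src/AI/prompt-system.txt.
function fillPromptTemplate(string $template, string $sourceLanguage, string $targetLanguage): string
{
    // strtr() with an array replaces only these exact keys, so literal
    // plural samples like "{0} There are none" are left as they are.
    return strtr($template, [
        '{sourceLanguage}' => $sourceLanguage,
        '{targetLanguage}' => $targetLanguage,
    ]);
}

$template = 'Translate from {sourceLanguage} to {targetLanguage}. Keep {0} There are none|[1,19] There are some';

echo fillPromptTemplate($template, 'English', 'Polish'), PHP_EOL;
// Translate from English to Polish. Keep {0} There are none|[1,19] There are some
```

A generic pattern such as replacing every `{word}` token would also match `{0}`, which is why the sketch names the keys explicitly.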
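The pluralization rules rely on Laravel-style interval segments: `{1}` matches an exact count, `[2,4]` an inclusive range, and `*` an open bound. In Laravel such strings are resolved by `trans_choice()`; the sketch below is not that implementation but a minimal, hypothetical re-implementation (`choosePluralForm`) showing how a count selects a segment and how `:count` is substituted.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch (not Laravel's own selector): picks the segment of an
// interval-style plural string whose condition matches $count and substitutes
// :count. Supports "{n}" exact matches and "[a,b]" inclusive ranges where
// either bound may be "*". Segments without a condition are ignored.
function choosePluralForm(string $line, int $count): ?string
{
    foreach (explode('|', $line) as $segment) {
        $segment = trim($segment);

        // Exact match, e.g. "{1} jedna książka"
        if (preg_match('/^\{(\d+)\}\s*(.*)$/su', $segment, $m)) {
            if ((int) $m[1] === $count) {
                return str_replace(':count', (string) $count, $m[2]);
            }
            continue;
        }

        // Range match, e.g. "[2,4] :count książki" or "[5,*] :count książek"
        if (preg_match('/^\[(\d+|\*),(\d+|\*)\]\s*(.*)$/su', $segment, $m)) {
            $min = $m[1] === '*' ? PHP_INT_MIN : (int) $m[1];
            $max = $m[2] === '*' ? PHP_INT_MAX : (int) $m[2];
            if ($count >= $min && $count <= $max) {
                return str_replace(':count', (string) $count, $m[3]);
            }
        }
    }

    return null; // no segment matched
}

$line = '{1} jedna książka|[2,4] :count książki|[5,*] :count książek';
echo choosePluralForm($line, 3), PHP_EOL; // 3 książki
```

Because the ranges are fixed intervals, a count of 22 falls into `[5,*]` and yields "22 książek", whereas Polish grammar uses the "few" form there ("22 książki"), so the three-range format only approximates Polish for some counts above 20.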
src/Console/TranslateStrings.php: 43 changes (42 additions, 1 deletion)
@@ -9,6 +9,29 @@

 class TranslateStrings extends Command
 {
+    protected static $additionalRules = [
+        'pl' => [
+            "- Polish pluralization: Always use 3 forms: {1} singular, [2,4] plural for few, [5,*] plural for many. Example: \"One book|:count books\" becomes \"{1} jedna książka|[2,4] :count książki|[5,*] :count książek\".",
+            "- Polish pluralization example: For 'apple': {1} jedno jabłko|[2,4] :count jabłka|[5,*] :count jabłek. Consider gender (męski, żeński, nijaki) and case (mianownik, dopełniacz, etc.) when forming plurals.",
+        ],
+        'zh' => [
+            "- CRITICAL: For ALL Chinese translations, ALWAYS use exactly THREE parts: {1}一 + measure word + noun|{2}两 + measure word + noun|[3,*]:count + measure word + noun. This is MANDATORY, even if the original only has two parts. NO SPACES between elements in Chinese text.",
+            "- Example structure (DO NOT COPY WORDS, only structure): {1}一X词Y|{2}两X词Y|[3,*]:countX词Y. Replace X with correct measure word, Y with noun. Ensure NO SPACES between :count and the measure word. If any spaces are found (except right after |), remove them and flag for review.",
+        ],
+        'ar' => [
+            "- CRITICAL: For ALL Arabic translations, ALWAYS use exactly FOUR parts: {1}singular|{2}dual|[3,10]plural for few|[11,*]plural for many. This is MANDATORY, even if the original has fewer forms.",
+            "- Example structure (DO NOT COPY WORDS, only structure): {1}كتاب واحد|{2}كتابان|[3,10]:count كتب|[11,*]:count كتابًا. Adjust endings based on grammatical case. Consider gender and definiteness. If unsure about a form, use a placeholder and flag for human review.",
+        ],
+        'ru' => [
+            "- CRITICAL: For ALL Russian translations, ALWAYS use exactly THREE parts: {1}singular|[2,4]plural for few|[5,*]plural for many. This is MANDATORY, even if the original has fewer forms.",
+            "- Example structure (DO NOT COPY WORDS, only structure): {1}книга|[2,4]:count книги|[5,*]:count книг. Consider gender (masculine, feminine, neuter) and case (nominative, genitive, etc.) when forming plurals. If unsure about a form, use a placeholder and flag for human review.",
+        ],
+        'ga' => [
+            "- CRITICAL: For ALL Irish (Gaeilge) translations, ALWAYS use exactly FOUR parts: {1}singular|{2}dual|[3,6]plural for few|[7,*]plural for many. This is MANDATORY, even if the original has fewer forms.",
+            "- Example structure (DO NOT COPY WORDS, only structure): {1}leabhar amháin|{2}dhá leabhar|[3,6]:count leabhair|[7,*]:count leabhar. Consider initial mutations (séimhiú, urú) and irregular plurals. For nouns that don't have all forms, repeat the closest appropriate form. If unsure, flag for human review.",
+        ],
+    ];
+
     protected $signature = 'ai-translator:translate';

     protected $sourceLocale;
@@ -44,10 +67,11 @@ protected static function getLanguageName($locale): ?string {
         }
     }

-    protected static function getAdditionalRules($locale): array {
+    private static function getAdditionalRulesFromConfig($locale): array {
         $list = config('ai-translator.additional_rules');
         $locale = strtolower(str_replace('-', '_', $locale));

+
         if (key_exists($locale, $list)) {
             return $list[$locale];
         } else if (key_exists(substr($locale, 0, 2), $list)) {
@@ -57,6 +81,23 @@ protected static function getAdditionalRules($locale): array {
         }
     }

+    private static function getAdditionalRulesDefault($locale): array {
+        $list = static::$additionalRules;
+        $locale = strtolower(str_replace('-', '_', $locale));
+
+        if (key_exists($locale, $list)) {
+            return $list[$locale];
+        } else if (key_exists(substr($locale, 0, 2), $list)) {
+            return $list[substr($locale, 0, 2)];
+        } else {
+            return $list['default'] ?? [];
+        }
+    }
+
+    protected static function getAdditionalRules($locale): array {
+        return array_merge(static::getAdditionalRulesFromConfig($locale), static::getAdditionalRulesDefault($locale));
+    }
+
     public function translate() {
         $locales = $this->getExistingLocales();
         foreach ($locales as $locale) {
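The new `getAdditionalRulesDefault()` resolves rules in a fixed order: the full locale (lower-cased, with `-` turned into `_`), then its two-letter prefix, then a `default` entry, else an empty list; `getAdditionalRules()` then appends the built-in rules after the config-defined ones. The sketch below restates that order as standalone functions; `lookupRules` and `mergedRules` are hypothetical names. It also adds a `?? []` guard that is an assumption, not part of the commit: if `config('ai-translator.additional_rules')` returned `null`, passing it to `key_exists()` would throw a `TypeError` on PHP 8.

```php
<?php
declare(strict_types=1);

// Hypothetical restatement of the lookup order in getAdditionalRulesDefault():
// exact locale, then two-letter prefix, then 'default', else [].
function lookupRules(?array $list, string $locale): array
{
    $list ??= []; // assumed guard: tolerate a missing (null) config entry
    $locale = strtolower(str_replace('-', '_', $locale));
    $prefix = substr($locale, 0, 2);

    if (array_key_exists($locale, $list)) {
        return $list[$locale];
    }
    if (array_key_exists($prefix, $list)) {
        return $list[$prefix];
    }
    return $list['default'] ?? [];
}

// Config-defined rules first, then built-in ones, as getAdditionalRules() merges them.
function mergedRules(?array $configRules, array $builtInRules, string $locale): array
{
    return array_merge(lookupRules($configRules, $locale), lookupRules($builtInRules, $locale));
}

// Illustrative rule texts only; the real built-in rules are the strings above.
$builtIn = [
    'pl' => ['- Polish rule'],
    'zh' => ['- Chinese rule'],
];
$config = ['zh_tw' => ['- Hypothetical project rule for zh_TW']];

print_r(mergedRules($config, $builtIn, 'zh-TW'));
// prints the zh_tw config rule first, then the built-in zh rule
```

With `zh-TW`, the config entry for `zh_tw` wins the exact-locale lookup, and the built-in `zh` entry is still appended through the prefix fallback.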
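The built-in rules ask the model for an exact number of parts per language (three for `pl`, `ru` and `zh`, four for `ar` and `ga`) and, for Chinese, no spaces inside a part. This commit expresses those requirements only in the prompt. A post-translation check such as the hypothetical `checkPluralParts` below is one way to catch outputs that ignore them and route them to human review; the expected counts are copied from the rule texts, and everything else is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical post-translation check (not part of the commit). Returns a list
// of problems; an empty list means the translated plural string passed.
function checkPluralParts(string $locale, string $translated): array
{
    // Part counts taken from the "exactly THREE/FOUR parts" rules in $additionalRules.
    $expectedParts = ['pl' => 3, 'ru' => 3, 'zh' => 3, 'ar' => 4, 'ga' => 4];
    $lang = substr(strtolower($locale), 0, 2);
    $problems = [];

    $parts = explode('|', $translated);
    if (isset($expectedParts[$lang]) && count($parts) !== $expectedParts[$lang]) {
        $problems[] = sprintf('expected %d parts, got %d', $expectedParts[$lang], count($parts));
    }

    if ($lang === 'zh') {
        foreach ($parts as $i => $part) {
            // Whitespace right after "|" is tolerated (ltrim); any other ASCII whitespace is flagged.
            if (preg_match('/\s/u', ltrim($part))) {
                $problems[] = sprintf('part %d contains whitespace', $i + 1);
            }
        }
    }

    return $problems;
}

print_r(checkPluralParts('zh_CN', '{1}一本书|{2}两本书|[3,*]:count 本书'));
// lists one problem: part 3 contains whitespace
```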
