HPO Japanese Translation

Yasunori edited this page Oct 13, 2015 · 41 revisions

Participants: Hiroyuki Mishima, Terue Takatsuki, Yasunori Yamamoto, Soichi Ogishima, Toshiaki Katayama

Day 1

Survey of the online resources

HPO has 11,425 unique concepts arranged in a hierarchy defined by the rdfs:subClassOf predicate. Some concepts can be reached from the root by more than one path, so a concept can be counted more than once, which is why the counts at some depths exceed 11,425. The following statistics show the number of concepts (left column) at each depth (right column).

      4 1
     41 2
    236 3
    924 4
   2401 5
   4779 6
   6699 7
   9296 8
  15181 9
  17314 10
  17306 11
   8618 12
   2209 13
    219 14
     36 15
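
As an illustration of how per-depth counts like these can be produced, here is a minimal Python sketch. It assumes the hierarchy is available as a mapping from each concept to its direct parents (the `rdfs:subClassOf` targets) and that every distinct path from the root is counted separately. The function name `depth_counts` and the toy graph are hypothetical and are not taken from the hackathon code.

```python
from collections import Counter


def depth_counts(parents, root):
    """Count concepts per depth, once for every distinct path from the root.

    `parents` maps a concept to the list of its direct parents.
    The root itself is reported at depth 0.
    """
    # Invert the child -> parents mapping into parent -> children.
    children = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            children.setdefault(parent, []).append(child)

    counts = Counter()
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        counts[depth] += 1
        # A node with several parents is pushed once per parent,
        # so it is counted once per path.
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return dict(counts)


# Toy hierarchy: D has two parents (B and C), so it is reached by two paths.
toy = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(depth_counts(toy, "A"))
```

In the toy hierarchy, D is counted twice at depth 2 because it is reachable through both B and C.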

The Japanese edition of Elements of Morphology: Standard Terminology (EMST) has 397 concepts and 119 synonyms. It is based on the English edition, which was published in a special issue of the American Journal of Medical Genetics, Volume 149A, Issue 1 (2009).

Day 2

The Life Science Dictionary (LSD) includes an English-Japanese dictionary for the life sciences. We matched its RDF edition against HPO using Silk and obtained 2,061 matches. The following table shows how many concepts at each depth have a corresponding Japanese term, based on a case-insensitive exact match. The columns are: number of matched concepts, depth, matched/total concepts at that depth, and the same ratio as a percentage.

      0 1        0/4   0
     12 2      12/41  29
     44 3     44/236  19
    221 4    221/924  24
    643 5   643/2401  27
    903 6   903/4779  19
    830 7   830/6699  12
    621 8   621/9296 6.7
    358 9  358/15181 2.4
    135 10 135/17314 .78
     22 11  22/17306 .13
      2 12    2/8618 .02
      1 13    1/2209 .05
      0 14     0/219   0
      0 15      0/36   0
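
The matching itself was done with Silk. The Python sketch below only illustrates the idea of a case-insensitive exact match and how the percentage column can be derived. The names `exact_matches` and `percent` are hypothetical, and the small dictionary is a stand-in for LSD, not real LSD data.

```python
def exact_matches(hpo_labels, dictionary):
    """Return {hpo_label: japanese} for labels found in the dictionary, ignoring case."""
    lowered = {english.lower(): japanese for english, japanese in dictionary.items()}
    return {
        label: lowered[label.lower()]
        for label in hpo_labels
        if label.lower() in lowered
    }


def percent(matched, total):
    """Matched concepts as a percentage of all concepts at one depth."""
    return round(100 * matched / total, 1) if total else 0.0


# Stand-in dictionary and labels for illustration only.
toy_dictionary = {"skin test": "皮膚テスト", "Abnormal": "異常"}
print(exact_matches(["Skin test", "Abnormality of body height"], toy_dictionary))
print(percent(12, 41))
```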

In addition, we obtained 10,307 HPO words that partially match at least one LSD word. For example, the LSD word "juxtaglomerular cell" partially matches the HPO word "Renal juxtaglomerular cell hypertrophy/hyperplasia".
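
The exact definition of a partial match is not spelled out here. One plausible reading, sketched below in Python, is that the LSD word occurs inside the HPO word as a whole word or phrase, ignoring case. The function name `partially_matches` is hypothetical.

```python
import re


def partially_matches(lsd_word, hpo_label):
    """True if lsd_word occurs in hpo_label as a whole word or phrase (case-insensitive)."""
    # The lookarounds reject hits that start or end inside a longer word.
    pattern = r"(?<!\w)" + re.escape(lsd_word) + r"(?!\w)"
    return re.search(pattern, hpo_label, flags=re.IGNORECASE) is not None


print(partially_matches(
    "juxtaglomerular cell",
    "Renal juxtaglomerular cell hypertrophy/hyperplasia",
))
```

Requiring word boundaries keeps a short entry from matching in the middle of an unrelated word.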

Besides LSD, the Mammalian Phenotype (MP) ontology has a Japanese translation, which covers 9,085 concepts. Of these concepts, HPO has 588 case-insensitive exact matches and 1,665 partial matches.

      1 1
      8 3
     85 4
    199 5
    236 6
    287 7
    236 8
    124 9
     39 10
      7 11
      2 12

Day 3

Developing a Perl script for dictionary matching

To improve the partial matching of LSD words to HPO, we developed a Perl script. When no LSD word exactly matches an HPO word, the script combines as many LSD words as needed to cover it. For example, HPO has "Abnormality of body height" (HP:0000002), which LSD does not have. LSD does have "Abnormality" and "body height", so the script looks for those. Another example is "Abnormal delayed hypersensitivity skin test" (HP:0002963), which LSD does not have either. The script looks for LSD words and partially matches them to the HPO word in order of word count. LSD has the following words:

  • Abnormal
  • delayed
  • delayed hypersensitivity
  • hypersensitivity
  • skin
  • skin test
  • test

As a result, the script matches the following words.

  • Abnormal
  • delayed hypersensitivity
  • skin test

Their Japanese translations are, respectively, "異常", "遅延型過敏症", and "皮膚テスト".
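
The example above can be reproduced with a short Python sketch. This is not the Perl script from the hackathon. It assumes a left-to-right greedy strategy that always tries the longest phrase first, which is only one possible reading of "in order of the word count". The function name `compose` and the `max_len` parameter are hypothetical, and the single-word dictionary values are placeholders rather than LSD entries.

```python
def compose(label, dictionary, max_len=5):
    """Cover an HPO label with dictionary phrases, longest phrase first.

    Returns a list of (english_phrase, japanese) pairs; japanese is None
    for tokens that no dictionary entry covers.
    """
    lowered = {english.lower(): japanese for english, japanese in dictionary.items()}
    tokens = label.split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in lowered:
                result.append((phrase, lowered[phrase.lower()]))
                i += n
                break
        else:
            # No entry covers this token; record it as untranslated.
            result.append((tokens[i], None))
            i += 1
    return result


toy_lsd = {
    "Abnormal": "異常",
    "delayed hypersensitivity": "遅延型過敏症",
    "skin test": "皮膚テスト",
    # Placeholder values: the text lists these words but not their translations.
    "delayed": "(placeholder)",
    "hypersensitivity": "(placeholder)",
    "skin": "(placeholder)",
    "test": "(placeholder)",
}
for english, japanese in compose("Abnormal delayed hypersensitivity skin test", toy_lsd):
    print(english, "->", japanese)
```

Because longer phrases are tried first, "delayed hypersensitivity" and "skin test" win over their single-word components.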

Using this script, we got 2,034 exact matches and 9,181 partial/compositional matches. Since HPO has 11,516 words, about 97.4% ( =(2,034 + 9,181)/11,516 ) of them are matched in one way or the other.

Issues

  • We need to lemmatize words, but Yasunori failed to install a module.
    Because of this, the script could not match "infections" to "infection". -> Fixed by obtaining a portion of the source code of the module.
  • We have to ask domain experts whether the compositionally translated words are appropriate.
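
To show the kind of normalization the first issue refers to, here is a deliberately naive Python sketch that strips common plural endings. It is not the lemmatizer module mentioned above (which is not named here), and the function name `naive_singular` is hypothetical. A real lemmatizer handles many more cases.

```python
def naive_singular(word):
    """Very rough plural stripping; leaves words it cannot handle unchanged."""
    lower = word.lower()
    if len(lower) > 3 and lower.endswith("ies"):
        # abnormalities -> abnormality
        return word[:-3] + "y"
    if len(lower) > 3 and lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        # infections -> infection; "ptosis" and "hydrocephalus" are left alone
        return word[:-1]
    return word


for w in ["infections", "abnormalities", "ptosis", "hyperplasia"]:
    print(w, "->", naive_singular(w))
```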

Day 4

Generated a matching result using LSD

The following files are available in a GitHub repository.

  • A script to match LSD words to HPO terms
  • A script to print a matching result
  • A resultant file

https://github.com/yayamamo/BH15

Issues

  • Some words match inappropriately; for example, "or" is matched to "オッズ比" ("odds ratio" in English).
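
One possible mitigation, sketched below in Python, is to remove function words from the dictionary before matching. This is an assumption about how the problem could be handled, not something the repository is known to do. The stop-word list and the name `drop_stop_words` are hypothetical.

```python
# Hypothetical stop-word list; it would need review by the curators.
STOP_WORDS = frozenset({"or", "and", "of", "the", "in", "with"})


def drop_stop_words(dictionary):
    """Return a copy of the dictionary without entries whose key is a stop word."""
    return {
        english: japanese
        for english, japanese in dictionary.items()
        if english.lower() not in STOP_WORDS
    }


toy = {"or": "オッズ比", "skin test": "皮膚テスト"}
print(drop_stop_words(toy))
```

The trade-off is that a legitimate abbreviation spelled like a function word is also dropped, so such entries would have to be matched by other means.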

Day 5

Summary

Progress of development of HPO-Japanese

(11425 terms in total)

The Japanese Association of Medical Sciences Medical Term Dictionary is an authorized medical terminology in Japanese. This dictionary and the Elements of Morphology: Standard Terminology should be used for HPO-Japanese.

Evidence codes

The source of each translation will be distinguished by an evidence code: medical-society approved dictionary, expert curated, general dictionary, no corresponding concept (does one really exist?), and so on.
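
As a rough illustration of how such evidence codes could be attached to translations, here is a Python sketch. The class names, field names, and the example record are all hypothetical. Only the four code descriptions come from the text above, and the actual set of codes was still open ("and so on").

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    SOCIETY_DICTIONARY = "medical-society approved dictionary"
    EXPERT_CURATED = "expert curated"
    GENERAL_DICTIONARY = "general dictionary"
    NO_CORRESPONDING_CONCEPT = "no corresponding concept"


@dataclass(frozen=True)
class Translation:
    hpo_id: str
    english: str
    japanese: Optional[str]  # None when no Japanese term is available
    evidence: Evidence


# Made-up record with a placeholder ID, for illustration only.
example = Translation("HP:0000000", "example term", None, Evidence.NO_CORRESPONDING_CONCEPT)
print(example.evidence.value)
```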

Plan to release HPO-Japanese

The remaining terms are planned to be curated by experts using a social curation platform, e.g., the phenodisquss software (to be developed by Orphanet by the end of this November) (Tudor). We plan to hold a small-scale development jamboree for HPO-Japanese, focusing on the terms in the field targeted by the Japan IRUD project.

HPO-Japanese will be linked to Mammalian Phenotype (MP). The link to MP will contribute to the design of experiments using experimental animals. MP will also contribute to the translation of the remaining terms (Terue).

We plan to publish the first release of HPO-Japanese in March 2016. At the same time, we will publish a paper about the release of HPO-Japanese.

Applications using HPO

  • Patient Archive (Australia UDP)(Tudor)