HPO Japanese Translation
Participants: Hiroyuki Mishima, Terue Takatsuki, Yasunori Yamamoto, Soichi Ogishima, Toshiaki Katayama
HPO has 11,425 unique concepts, organized hierarchically by the rdfs:subClassOf predicate. Some concepts can be reached from the root by multiple paths. The following statistics show the number of concepts (left) at each depth (right).
4 1
41 2
236 3
924 4
2401 5
4779 6
6699 7
9296 8
15181 9
17314 10
17306 11
8618 12
2209 13
219 14
36 15
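How such per-depth counts can be derived from rdfs:subClassOf triples is sketched below in Python. This is only an illustration, not the procedure actually used for the table above: the function name, the (child, parent) input format, and the choice to count a concept once at every depth where it appears are all assumptions. The sketch also assumes the hierarchy has no cycles.

```python
from collections import defaultdict


def concepts_per_depth(subclass_of, root):
    """Count distinct concepts found at each depth below `root`.

    `subclass_of` is an iterable of (child, parent) pairs, one pair per
    rdfs:subClassOf triple. A concept that is reachable by paths of
    different lengths is counted once at every depth where it appears
    (an assumption of this sketch). The graph must be acyclic.
    """
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)

    counts = {}
    frontier = {root}
    depth = 0
    while True:
        # Move one level down: collect all children of the current level.
        frontier = {c for node in frontier for c in children.get(node, ())}
        if not frontier:
            break
        depth += 1
        counts[depth] = len(frontier)
    return counts


# Toy hierarchy (hypothetical identifiers): E is reachable at depths 2 and 3.
edges = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "D"), ("E", "B")]
print(concepts_per_depth(edges, "A"))  # {1: 2, 2: 2, 3: 1}
```

Because a concept with several paths contributes to more than one depth, the per-depth counts can sum to more than the number of unique concepts.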
The Japanese edition of Elements of Morphology: Standard Terminology (EMST) has 397 concepts and 119 synonyms, which are based on the English edition of EMST. The English edition was published in the special issue of American Journal of Medical Genetics Volume 149A, Issue 1 (2009).
The Life Science Dictionary (LSD) includes an English-Japanese dictionary for the life sciences. We matched its RDF edition against HPO using Silk and obtained 2,061 matches. The following shows how many concepts (first column) at each depth (second column) have a corresponding Japanese term, based on case-insensitive exact matching. The third and fourth columns show the ratio to the total number of concepts at each depth, as a fraction and as a percentage.
0 1 0/4 0
12 2 12/41 29
44 3 44/236 19
221 4 221/924 24
643 5 643/2401 27
903 6 903/4779 19
830 7 830/6699 12
621 8 621/9296 6.7
358 9 358/15181 2.4
135 10 135/17314 .78
22 11 22/17306 .13
2 12 2/8618 .02
1 13 1/2209 .05
0 14 0/219 0
0 15 0/36 0
In addition, we obtained 10,307 HPO words that partially match at least one LSD word. An example is that the LSD word "juxtaglomerular cell" matches the HPO word "Renal juxtaglomerular cell hypertrophy/hyperplasia".
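The difference between an exact and a partial match can be sketched as follows. This Python snippet is not the Silk configuration used above. The function name and the rule that a partial match is a contiguous run of words inside the HPO label are assumptions made for illustration.

```python
def classify_match(hpo_label, lsd_terms):
    """Classify one HPO label against a set of lower-cased LSD terms.

    Returns "exact" if the whole label equals an LSD term (ignoring case),
    "partial" if some contiguous run of its words equals an LSD term,
    and None otherwise.
    """
    label = hpo_label.lower()
    if label in lsd_terms:
        return "exact"
    words = label.split()
    # Try every contiguous word sequence inside the label.
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            if " ".join(words[start:end]) in lsd_terms:
                return "partial"
    return None


# The second entry is a hypothetical LSD term used only for this example.
lsd_terms = {"juxtaglomerular cell", "hypertension"}
print(classify_match("Renal juxtaglomerular cell hypertrophy/hyperplasia", lsd_terms))  # partial
print(classify_match("Hypertension", lsd_terms))  # exact
```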
Besides LSD, Mammalian Phenotype (MP) has a Japanese translation, which covers 9,085 concepts. Of these concepts, 588 are case-insensitive exact matches to HPO and 1,665 are partial matches.
1 1
8 3
85 4
199 5
236 6
287 7
236 8
124 9
39 10
7 11
2 12
To improve partial matching of LSD words to HPO, a Perl script has been developed. When no LSD word exactly matches an HPO word, the script covers as much of the HPO word as possible with LSD words. For example, HPO has "Abnormality of body height" (HP:0000002), which LSD does not have. LSD does have "Abnormality" and "body height", and the script looks for them. Another example is "Abnormal delayed hypersensitivity skin test" (HP:0002963), which LSD does not have either. The script looks for LSD words and partially matches them to the HPO word in order of word count. LSD has the following words:
- Abnormal
- delayed
- delayed hypersensitivity
- hypersensitivity
- skin
- skin test
- test
As a result, the script matches "Abnormal", "delayed hypersensitivity", and "skin test".
Their Japanese translations are "異常", "遅延型過敏症", and "皮膚テスト", respectively.
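One way to read the compositional matching described above is a greedy, longest-phrase-first scan over the HPO label. The Python sketch below follows that reading. It is not the Perl script itself, and the function name, the left-to-right strategy, and the placeholder translations are assumptions.

```python
def compose_translation(hpo_label, lsd):
    """Cover an HPO label with LSD entries, preferring longer phrases.

    `lsd` maps lower-cased English terms to Japanese translations.
    Returns a list of (english_piece, japanese_or_None) pairs, where
    None marks a word that no LSD entry covers.
    """
    words = hpo_label.lower().split()
    max_len = max((len(term.split()) for term in lsd), default=1)
    pieces = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lsd:
                pieces.append((phrase, lsd[phrase]))
                i += n
                break
        else:
            pieces.append((words[i], None))
            i += 1
    return pieces


# Only three translations are given in the text; the rest are placeholders.
PLACEHOLDER = "(translation not given here)"
lsd = {
    "abnormal": "異常",
    "delayed": PLACEHOLDER,
    "delayed hypersensitivity": "遅延型過敏症",
    "hypersensitivity": PLACEHOLDER,
    "skin": PLACEHOLDER,
    "skin test": "皮膚テスト",
    "test": PLACEHOLDER,
}
result = compose_translation("Abnormal delayed hypersensitivity skin test", lsd)
print([ja for _, ja in result])  # ['異常', '遅延型過敏症', '皮膚テスト']
```

Preferring longer phrases keeps "delayed hypersensitivity" and "skin test" together instead of translating them word by word.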
Using this script, we got 2,034 exact matches and 9,181 partial/compositional matches. Since HPO has 11,516 words, about 97.4% ( =(2,034 + 9,181)/11,516 ) of them are matched in some form.
Issues
- We need to lemmatize words, but Yasunori failed to install a module. Due to this issue, the script could not match "infections" to "infection". -> Fixed by obtaining a portion of the source code of the module.
- We have to ask domain experts to check whether the compositionally translated words are appropriate.
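The lemmatization issue above ("infections" vs. "infection") can be illustrated with a small fallback lookup. This Python sketch is not the module mentioned above, whose name is not given here. It only strips a few common English plural endings, and the function name is hypothetical.

```python
def lookup_with_plural_fallback(word, lsd_terms):
    """Look up `word` in `lsd_terms`, retrying with simple singular forms.

    `lsd_terms` is a set of lower-cased LSD terms. Returns the matching
    term, or None if neither the word nor a candidate singular is found.
    This is a naive rule set, not a real lemmatizer.
    """
    w = word.lower()
    candidates = [w]
    if w.endswith("ies") and len(w) > 3:
        candidates.append(w[:-3] + "y")   # anomalies -> anomaly
    if w.endswith("es") and len(w) > 2:
        candidates.append(w[:-2])         # masses -> mass
    if w.endswith("s") and len(w) > 1:
        candidates.append(w[:-1])         # infections -> infection
    for candidate in candidates:
        if candidate in lsd_terms:
            return candidate
    return None


print(lookup_with_plural_fallback("infections", {"infection"}))  # infection
```

Accepting a candidate only when it is present in the dictionary keeps over-stripped forms from producing false matches.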
The following files are available in a GitHub repository.
- A script to match LSD words to HPO terms
- A script to print a matching result
- A resultant file
https://github.com/yayamamo/BH15
Issues
- Some words match inappropriately; for example, "or" matches "オッズ比" ("odds ratio" in English).
Progress of development of HPO-Japanese
(11425 terms in total)
- Elements of Morphology: Standard Terminology (Hiroyuki) source ed: http://onlinelibrary.wiley.com/doi/10.1002/ajmg.a.v149a:1/issuetoc : ~400 terms (3.5%)
- The Japanese Association of Medical Sciences Medical Term Dictionary (Soichi) : 1807 terms (15.8%)
The Japanese Association of Medical Sciences Medical Term Dictionary is an authorized medical terminology in Japanese. This dictionary and the Elements of Morphology: Standard Terminology should be used for HPO-Japanese.
- Life Science Dictionary (LSD) (Yasunori) :
Evidence codes
The evidence for each translation will be distinguished by evidence codes: medical-society approved dictionary, expert curated, general dictionary, no corresponding concept (really exist?), and so on.
Plan to release HPO-Japanese
The remaining terms are planned to be curated by experts using a social curation platform, e.g., the phenodisquss software (to be developed by Orphanet by the end of this November)(Tudor). We plan to hold a small-scale development jamboree for HPO-Japanese, focusing on the terms in the fields targeted by the Japan IRUD project.
HPO-Japanese will be linked to Mammalian Phenotype (MP). The links to MP will help in designing experiments that use experimental animals. MP will also contribute to the translation of the remaining terms (Terue).
We plan to publish the first release of HPO-Japanese in March 2016. At the same time, we will publish a paper about the release of HPO-Japanese.
- Patient Archive (Australia UDP)(Tudor)