A tool for converting Chinese characters to pinyin (Python version)


Takes Chinese characters and converts them to pinyin, zhuyin, and Cyrillic.

Based on hotoo/pinyin

  • Finds the most fitting pinyin based on phrase occurrences.
  • Supports characters with two or more readings (heteronyms).
  • Supports simplified characters, traditional characters, and zhuyin (also known as bopomofo).
  • Supports multiple styles of pinyin and zhuyin (e.g. tone conventions).
$ pip install pypinyin

Python 3 (for Python 2, change '中心' to u'中心'):

>>> from pypinyin import pinyin, lazy_pinyin, Style
>>> pinyin('中心')
[['zhōng'], ['xīn']]
>>> pinyin('中心', heteronym=True)  # make use of heteronym mode
[['zhōng', 'zhòng'], ['xīn']]
>>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
[['z'], ['x']]
>>> pinyin('中心', style=Style.TONE2, heteronym=True)
[['zho1ng', 'zho4ng'], ['xi1n']]
>>> pinyin('中心', style=Style.TONE3, heteronym=True)
[['zhong1', 'zhong4'], ['xin1']]
>>> pinyin('中心', style=Style.BOPOMOFO)  # zhuyin mode
[['ㄓㄨㄥ'], ['ㄒㄧㄣ']]
>>> lazy_pinyin('中心')  # don't include tone information or heteronyms
['zhong', 'xin']

Please take note:

  • Pinyin results carry no indicator for neutral-tone syllables, neither a diacritic nor a number. (To use '5' for neutral tones, see article.)
  • Lazy pinyin results use 'v' for 'ü'. (To use 'ü', see article.)
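The relationship between the diacritic style and the number style, including the two notes above, can be illustrated with a small standard-library sketch. It does not use or reproduce pypinyin's internals: `diacritic_to_number` and its parameters `mark_neutral` and `u_umlaut_as_v` are hypothetical names chosen for this illustration.

```python
import unicodedata

# Combining marks used by pinyin diacritics, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": "1",  # macron, first tone
    "\u0301": "2",  # acute, second tone
    "\u030c": "3",  # caron, third tone
    "\u0300": "4",  # grave, fourth tone
}


def diacritic_to_number(syllable, mark_neutral=False, u_umlaut_as_v=False):
    """Hypothetical helper: turn 'zhōng' into 'zhong1'."""
    tone = ""
    letters = []
    # NFD splits each accented vowel into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that 'u' + diaeresis becomes a single 'ü' again.
    base = unicodedata.normalize("NFC", "".join(letters))
    if u_umlaut_as_v:
        base = base.replace("\u00fc", "v")
    # A syllable without a tone mark is neutral: no number unless requested.
    if not tone and mark_neutral:
        tone = "5"
    return base + tone
```

With this sketch, `diacritic_to_number('zhōng')` gives `'zhong1'`, `diacritic_to_number('shang')` stays `'shang'` unless `mark_neutral=True` is passed, and `diacritic_to_number('lǚ', u_umlaut_as_v=True)` gives `'lv3'`.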

Command line tools:

$ pypinyin 音乐
yīn yuè
$ pypinyin -h

For more details, see article

For questions about project development, please refer to the development documentation.

A database of pinyin phrases is used to resolve heteronyms. If a result turns out to be wrong, you can load custom pinyin phrases to correct the database:

>>> from pypinyin import Style, pinyin, load_phrases_dict
>>> pinyin('步履蹒跚')
[['bù'], ['lǚ'], ['mán'], ['shān']]
>>> load_phrases_dict({'步履蹒跚': [['bù'], ['lǚ'], ['pán'], ['shān']]})
>>> pinyin('步履蹒跚')
[['bù'], ['lǚ'], ['pán'], ['shān']]

For more details, see article.
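Why a phrase entry changes the result can be seen in a toy model of phrase-based heteronym resolution. This is only a sketch under simplifying assumptions (greedy longest match, tiny hand-written dictionaries); it is not pypinyin's implementation, and `toy_pinyin`, `SINGLE`, and `PHRASES` are hypothetical names.

```python
# Per-character readings; the first reading is the default.
SINGLE = {"中": ["zhōng", "zhòng"], "心": ["xīn"], "奖": ["jiǎng"]}

# Phrase entries fix the reading of each character in context.
PHRASES = {"中奖": [["zhòng"], ["jiǎng"]]}


def toy_pinyin(text, phrases=PHRASES, single=SINGLE):
    """Hypothetical sketch: prefer the longest known phrase, else per character."""
    result = []
    i = 0
    while i < len(text):
        # Try candidate phrases of length >= 2 starting at i, longest first.
        for j in range(len(text), i + 1, -1):
            entry = phrases.get(text[i:j])
            if entry is not None:
                result.extend(list(reading) for reading in entry)
                i = j
                break
        else:
            # No phrase matched: fall back to the character's default reading,
            # or keep the character itself when it is unknown.
            ch = text[i]
            result.append([single.get(ch, [ch])[0]])
            i += 1
    return result
```

In this model, '中心' falls back to the default reading of '中' (`[['zhōng'], ['xīn']]`), while '中奖' matches a phrase entry and yields `[['zhòng'], ['jiǎng']]`. Adding or replacing a key in `PHRASES` plays the role that `load_phrases_dict` plays in the example above.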

>>> from pypinyin import Style, pinyin
>>> pinyin('下雨天', style=Style.INITIALS)
[['x'], [''], ['t']]

The initial for '雨' ('yu') is empty because, according to the standard pinyin rules (《汉语拼音方案》, the Scheme for the Chinese Phonetic Alphabet), 'y', 'w', and 'ü' ('yu') are not counted as syllable initials.

If this causes you inconvenience, please also be aware that some characters have no initial at all, such as '啊' ('a'), '饿' ('e'), '按' ('an'), and '昂' ('ang'). In this case you might need 'FIRST_LETTER' mode.

—— @hotoo

reference: hotoo/pinyin#57, #22, #27, #44

If this is not the desired behaviour, that is, if you want 'y' to be counted as an initial, pass 'strict=False'.

>>> from pypinyin import Style, pinyin
>>> pinyin('下雨天', style=Style.INITIALS)
[['x'], [''], ['t']]
>>> pinyin('下雨天', style=Style.INITIALS, strict=False)
[['x'], ['y'], ['t']]
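The difference between the two modes can be sketched in plain Python. The function below works on toneless syllables and is an illustration only, not the library's code; `split_initial` and `STRICT_INITIALS` are hypothetical names, and treating non-strict mode as "the standard initials plus 'y' and 'w'" is an assumption made for this sketch.

```python
# The 21 initials of the standard scheme; two-letter initials come first
# so that 'zh' is matched before 'z'.
STRICT_INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_initial(syllable, strict=True):
    """Hypothetical helper: return the initial of a toneless syllable, or ''."""
    # Assumption: non-strict mode additionally accepts 'y' and 'w'.
    initials = STRICT_INITIALS if strict else STRICT_INITIALS + ("y", "w")
    for initial in initials:
        if syllable.startswith(initial):
            return initial
    # Syllables such as 'a', 'e', 'an', 'ang' have no initial in either mode.
    return ""
```

For the syllables 'xia', 'yu', 'tian', this returns 'x', '', 't' in strict mode and 'x', 'y', 't' with `strict=False`. A syllable such as 'an' yields an empty string either way, which is the case the quoted remark about 'FIRST_LETTER' mode refers to.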

If you don't care too much about the accuracy of the pinyin, you can set the environment variables 'PYPINYIN_NO_PHRASES' and 'PYPINYIN_NO_DICT_COPY' to reduce memory usage. For more details, see article.
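A minimal sketch of setting these variables from Python follows. Two points are assumptions rather than facts taken from this README: that the value 'true' is accepted, and that the variables must be set before pypinyin is first imported. Check the linked article for the authoritative details.

```python
import importlib.util
import os

# Assumption: the variables are read when pypinyin is imported,
# so they are set first. The value "true" is also an assumption.
os.environ["PYPINYIN_NO_PHRASES"] = "true"
os.environ["PYPINYIN_NO_DICT_COPY"] = "true"

# Guard so this sketch also runs where pypinyin is not installed.
if importlib.util.find_spec("pypinyin") is not None:
    from pypinyin import lazy_pinyin

    # Without the phrase database, heteronyms are resolved per character,
    # so results for some phrases may be less accurate.
    print(lazy_pinyin("中心"))
```

The same effect can be obtained by exporting the variables in the shell before starting Python, which avoids any dependence on import order inside the program.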

For more questions and answers, see the FAQ.