Diceware Chinese wordlists

The Diceware Wubi and Pinyin wordlists consist of words from the Lexicon of Common Words in Contemporary Chinese, encoded or transliterated into the English alphabet using Wubi and Pinyin respectively (the two most popular Chinese input methods). They can be used to generate complex, memorable Diceware passphrases.

Read the Chinese version

Introduction to Diceware

Diceware, created by Arnold Reinhold, is a method for generating passphrases. To use Diceware, you need a wordlist and at least one die. The wordlist should consist of 7776 (= 6^5) distinct words, each corresponding to a unique five-digit "dice index" (e.g. "46134"). Five dice throws then select one word at random, so 30 throws generate a complex six-word passphrase. Such passphrases are typically more memorable than passwords (built from random characters) of equivalent complexity.
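The mapping from five throws to a word can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of this project: the function names `rolls_to_index` and `index_to_position` are hypothetical, and the position calculation assumes the wordlist is sorted by dice index.

```python
import math


def rolls_to_index(rolls):
    """Join five die throws (each 1-6) into a dice index such as "46134"."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("expected five throws, each between 1 and 6")
    return "".join(str(r) for r in rolls)


def index_to_position(index):
    """Map a dice index to a zero-based position in a 7776-word list.

    Assumes the list is sorted by dice index, so "11111" comes first.
    """
    position = 0
    for digit in index:
        position = position * 6 + (int(digit) - 1)
    return position


print(rolls_to_index([4, 6, 1, 3, 4]))  # 46134
print(index_to_position("11111"))       # 0
print(index_to_position("66666"))       # 7775

# Entropy of a six-word passphrase: 6 * log2(6^5) bits.
print(round(6 * math.log2(6 ** 5), 1))  # 77.5
```

The last line shows where the strength of a six-word passphrase comes from: each word contributes log2(7776), about 12.9 bits.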

The original Diceware wordlist is in English and contains many obscure words, making it unsuitable for most native Chinese speakers. Many other versions of the Diceware wordlist have since been compiled in various languages, but as of June 2017 no complete Chinese wordlist could be found on the Internet. So I decided to compile one (or two) myself.

Details of Chinese wordlists

  • Each Diceware Chinese wordlist has three tab-separated columns: dice index, Wubi/Pinyin encoding, and Chinese characters. This is one column more than wordlists for most other languages have.
  • When using a Diceware Chinese wordlist, build your passphrase from the Wubi/Pinyin encodings. The Chinese characters serve only as mnemonics.
  • The Wubi wordlist consists only of two-character words, each encoded with exactly 4 English letters.
  • The Pinyin wordlist consists of words encoded with 3 to 6 English letters. The average encoding length is 5.54 letters.
  • The encodings in each wordlist form a prefix code (no encoding is a prefix of another), so there is no need to add a space or special character between words.
  • All Chinese wordlist files are encoded in UTF-8.
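As a rough sketch of how the three-column format and the prefix-code property can be used, the Python below parses tab-separated rows, checks that no encoding is a prefix of another, and splits an unseparated passphrase back into its words. The function names are hypothetical, and the three sample rows are invented for illustration; they are not taken from the actual wordlist files.

```python
def parse_wordlist(text):
    """Parse tab-separated rows: dice index, encoding, Chinese characters."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        index, encoding, word = line.split("\t")
        entries[encoding] = (index, word)
    return entries


def is_prefix_free(encodings):
    """Return True if no encoding is a prefix of (or equal to) another.

    After sorting, any prefix relation shows up between neighbours.
    """
    ordered = sorted(encodings)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def split_passphrase(passphrase, entries):
    """Split an unseparated passphrase into encodings, left to right."""
    result, start = [], 0
    max_len = max(len(e) for e in entries)
    while start < len(passphrase):
        limit = min(start + max_len, len(passphrase))
        for end in range(start + 1, limit + 1):
            if passphrase[start:end] in entries:
                result.append(passphrase[start:end])
                start = end
                break
        else:
            raise ValueError("passphrase is not built from this wordlist")
    return result


# Invented sample rows (hypothetical indices and encodings).
SAMPLE = "11111\tnihao\t你好\n11112\tshijie\t世界\n11113\txuexi\t学习\n"

entries = parse_wordlist(SAMPLE)
print(is_prefix_free(entries))                  # True
print(split_passphrase("xuexinihao", entries))  # ['xuexi', 'nihao']
```

Because the code is prefix-free, the first encoding that matches at each position is the only possible one, so the split is unambiguous without separators.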

8k wordlists and passphrase generator

Important, high-security passphrases should be generated with physical dice, but regular passphrases for average users can be generated on a secure computer with an appropriate app without serious problems. However, to ensure a uniform distribution of word choices, it is best to use wordlists whose length is a power of 2, because random bits then map onto words without bias. The Wubi8k and Pinyin8k wordlists are designed with this purpose in mind.
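The benefit of a power-of-2 length can be illustrated with a short Python sketch. It assumes that "8k" means 8192 (2^13) entries, which is not stated explicitly here, and it uses Python's `secrets` module rather than the libsodium-based approach of the actual generator. The helper names are hypothetical.

```python
import secrets

WORDLIST_SIZE = 8192  # assumption: "8k" means 2**13 entries


def pick_position():
    """Pick a uniformly random position using exactly 13 random bits."""
    return secrets.randbits(13)


def modulo_counts(bits, list_size):
    """Count how many raw values land on a position after `value % list_size`.

    Returns (fewest, most). Unequal numbers mean some words are more
    likely than others when random bits are reduced with a plain modulo.
    """
    base, extra = divmod(2 ** bits, list_size)
    return (base, base + 1 if extra else base)


print(modulo_counts(16, 7776))  # (8, 9)
print(modulo_counts(16, 8192))  # (8, 8)
print(0 <= pick_position() < WORDLIST_SIZE)  # True
```

With 7776 words, reducing a 16-bit value by modulo favours some positions (9 raw values versus 8), whereas 8192 divides 2^16 evenly, so every word is equally likely.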

Based on these two 8k wordlists, I wrote a cross-platform Chinese passphrase generator in C++, using Qt and libsodium. The randomness of the word choices is provided by libsodium's random data generator.
A Windows binary can be downloaded from the release page.

Sources

Licenses

The content of all wordlists (*.wordlist files) is licensed under the Creative Commons Attribution 4.0 License. The source code of the passphrase generator apps is licensed under the GNU General Public License.