Ajatt-Tools/yomichan-bccwj-frequency-dictionary


About

This repository contains the source code of a script that generates a frequency dictionary for use with Yomichan. It uses data from the Balanced Corpus of Contemporary Written Japanese (BCCWJ) and supports both short unit words (SUW) and long unit words (LUW). The generated dictionary file does not contain part-of-speech information, as Yomichan does not currently support this.

Links

Usage

Prerequisites

This script uses a component from Yomichan's implementation, specifically the JapaneseUtil class from japanese-util.js.

This file must be manually copied into the same directory as main.js in order for the script to work.

Running

A Node.js script is used to generate the dictionary data:

node main.js path/to/bccwj-data.tsv ./output [long-unit-words] [min-frequency]
  • [long-unit-words] (optional) - true if using the long unit words (LUW) list; false otherwise.
  • [min-frequency] (optional) - Integer representing the minimum number of occurrences. Default is 0.
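To make the argument order concrete, here is a sketch of how the four positional arguments might be interpreted. It is not the code from main.js, which may behave differently. The function name parseArgs is hypothetical, and treating an omitted [long-unit-words] as false is an assumption, since only the default for [min-frequency] is documented.

```javascript
// Hypothetical parser for the positional arguments described above.
// Assumption: an omitted [long-unit-words] is treated as false.
function parseArgs(argv) {
  const [inputPath, outputDir, longUnitWords = 'false', minFrequency = '0'] = argv;
  if (!inputPath || !outputDir) {
    throw new Error(
      'usage: node main.js <bccwj-data.tsv> <output-dir> [long-unit-words] [min-frequency]'
    );
  }
  const min = Number.parseInt(minFrequency, 10);
  if (!Number.isInteger(min) || min < 0) {
    throw new Error(`min-frequency must be a non-negative integer, got "${minFrequency}"`);
  }
  return {
    inputPath,
    outputDir,
    longUnitWords: longUnitWords === 'true',
    minFrequency: min,
  };
}

// In a script, the arguments would come from the command line:
// const options = parseArgs(process.argv.slice(2));
```

Under this reading, a call such as node main.js path/to/bccwj-data.tsv ./output true 10 would select the LUW list and skip words with fewer than 10 occurrences.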

The data can then be added to a .zip archive using any software. The example below uses the 7z command line executable to generate the archive:

7z a -tzip -mx=9 -mm=Deflate -mtc=off -mcu=on BCCWJ-SUW.zip ./output/*.json
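For orientation, the sketch below shows the general shape of the JSON files that a Yomichan frequency dictionary archive is expected to contain: an index.json with the dictionary metadata and one or more term meta bank files whose entries have the form [term, "freq", data]. The function names, the input row layout, the title string, and the choice to store the raw occurrence count together with a reading are all assumptions made for illustration; the files actually produced by main.js may differ.

```javascript
// Hypothetical helpers illustrating the Yomichan frequency dictionary layout.
// Assumed input: rows of [term, reading, count] already parsed from the TSV.
function buildTermMetaBank(rows, minFrequency = 0) {
  return rows
    .filter(([, , count]) => count >= minFrequency)
    .map(([term, reading, count]) => [term, 'freq', { reading, frequency: count }]);
}

// Minimal index.json contents; the title and revision values are placeholders.
function buildIndex(title) {
  return { title, format: 3, revision: '1' };
}

// Made-up sample rows, not real BCCWJ counts.
const sampleRows = [
  ['猫', 'ねこ', 120],
  ['犬', 'いぬ', 3],
];

const bank = buildTermMetaBank(sampleRows, 10);
console.log(JSON.stringify(buildIndex('BCCWJ-SUW (example)')));
console.log(JSON.stringify(bank));
```

In this layout the entries would be written to a file such as term_meta_bank_1.json next to index.json, and both would be placed at the root of the .zip archive.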

About

Script to create a frequency dictionary for Rikaitan
