Skip to content

A Go port of a fast converter between Japanese hankaku and zenkaku characters.

License

Notifications You must be signed in to change notification settings

rusq/gomojimoji

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

(go) mojimoji

Go Reference

This is a port of the excellent mojimoji library written in Python to Golang.

It provides two functions:

  • HanToZen - half-width to full-width character conversion.
  • ZenToHan - half-width to full-width character conversion.

Each of the functions allow the following options:

  • ASCII - enable or disable ASCII translation.
  • Digits - enable or disable Digits translation.
  • Kana - enable or disable Kana translation.

All options are enabled by default, see examples on their usage.

Logic is implemented as of commit aca2661.

Examples

HanToZen

fmt.Println(HanToZen("ニュージーランド"))
fmt.Println(HanToZen("ニュージーランド Auckland 6012", ASCII(true), Digits(false), Kana(false)))

// Output:
// ニュージーランド
// ニュージーランド Auckland 6012

ZenToHan

fmt.Println(ZenToHan("ニュージーランド"))
fmt.Println(ZenToHan("ニュージーランド Auckland 0123", Kana(false), Digits(true)))

// Output:
// ニュージーランド
// ニュージーランド Auckland 0123

Benchmark

Original library etc.

Original mojimoji, zenhan and unicodedata on my system, for comparison:

In [4]: s = u'ABCDEFG012345' * 10

In [5]: %time for n in range(1000000): mojimoji.zen_to_han(s)
CPU times: user 3.24 s, sys: 1.28 ms, total: 3.24 s
Wall time: 3.24 s

In [6]: %time for n in range(1000000): zenhan.z2h(s)
CPU times: user 26.2 s, sys: 16.3 ms, total: 26.2 s
Wall time: 26.2 s

In [7]: %time for n in range(1000000): unicodedata.normalize('NFKC', s)
CPU times: user 3.12 s, sys: 15.4 ms, total: 3.13 s
Wall time: 3.14 s

This library

ZenToHan and HanToZen use different approaches:

  • ZenToHan uses string.Builder, which is simpler to implement.
  • HanToZen uses direct slice operations to allow for seeking when needed.

ZenToHan:

mojimoji (master)> go test -bench=BenchmarkZenToHanConv
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkZenToHanConv-16               1        2880823810 ns/op
--- BENCH: BenchmarkZenToHanConv-16
    mojimoji_test.go:98: 2.88079814s
PASS
ok      github.com/rusq/gomojimoji      2.977s

HanToZen:

mojimoji (master)> go test -bench=BenchmarkHanToZen    
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkHanToZenConv-16               1        2712209539 ns/op
--- BENCH: BenchmarkHanToZenConv-16
    mojimoji_test.go:107: 2.712166151s
PASS
ok      github.com/rusq/gomojimoji      2.804s

About

A Go port of a fast converter between Japanese hankaku and zenkaku characters.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages