other: improve data build workflow
basically just merged package/data/ and the originally untracked tools/#data/ into data/
DGCK81LNN committed Mar 23, 2024
1 parent a854666 commit ee2dab9
Showing 20 changed files with 151,762 additions and 3 deletions.
10,687 changes: 10,687 additions & 0 deletions data/210430.tsv

Large diffs are not rendered by default.

10,878 changes: 10,878 additions & 0 deletions data/211205.tsv

Large diffs are not rendered by default.

10,884 changes: 10,884 additions & 0 deletions data/220708.tsv

Large diffs are not rendered by default.

10,888 changes: 10,888 additions & 0 deletions data/221206.tsv

Large diffs are not rendered by default.

10,887 changes: 10,887 additions & 0 deletions data/230529.tsv

Large diffs are not rendered by default.

11,010 changes: 11,010 additions & 0 deletions data/230823.tsv

Large diffs are not rendered by default.

11,010 changes: 11,010 additions & 0 deletions data/230901.tsv

Large diffs are not rendered by default.

11,082 changes: 11,082 additions & 0 deletions data/231021.tsv

Large diffs are not rendered by default.

11,082 changes: 11,082 additions & 0 deletions data/231116.tsv

Large diffs are not rendered by default.

11,055 changes: 11,055 additions & 0 deletions data/240204.tsv

Large diffs are not rendered by default.

11,058 changes: 11,058 additions & 0 deletions data/240205.tsv

Large diffs are not rendered by default.

11,058 changes: 11,058 additions & 0 deletions data/240206.tsv

Large diffs are not rendered by default.

11,072 changes: 11,072 additions & 0 deletions data/240319.tsv

Large diffs are not rendered by default.

File renamed without changes.
File renamed without changes.
9,074 changes: 9,074 additions & 0 deletions data/西丁.tsv

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions package/README.md
@@ -469,3 +469,23 @@ interface Transcriber {
~~~

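A type that has a `transcribe` method, which takes a string as its first argument and returns a `TranscribeResult`. Every transcriber class in this module implements this interface.

## Maintenance

When a new character table is released:

1. Save the table in TSV format (no header row; the first column is the Chinese character, the second is its spelling in the Shidinn (希顶) chat alphabet) to <code>data/<var>6-digit date</var>.tsv</code>.

2. Run <code>ruby tools/sort.rb data/<var>xxxxxx</var>.tsv</code> to normalize the ordering of the table data (Shidinn spellings are sorted by plain Shidinn alphabet order).

3. In the `data` directory, run `node ../tools/append.mjs dict.tsv`. This may remove the Hanzi/Shidinn hints (汉希提示) of some entries. The program prints a message whenever it does so, and the hints may need to be restored at the appropriate place in the next step.

4. Review the `git diff` of `data/dict.tsv` and fill in any missing notes and hints (for example, when a new polyphonic character is added, give the meaning of each reading or the corresponding Mandarin pronunciation).

   * `tools/dictgitdiff.sh data/dict.tsv` strips redundant context from the diff. When standard output is a terminal, the script automatically uses <code>python -m [pygments](https://pygments.org/)</code> or [`rougify`](https://rouge.jneen.net/), whichever is available in the environment, to colorize the diff.

5. If any notes write pinyin tones as digits (e.g. `pin1 yin1`), run `ruby tools/pinyin.rb data/dict.tsv data/dict.tsv` to convert them to tone marks.

6. Update the character table version and the changelog stated in the README and in `site/app.vue`.

7. Bump the package version in `package/package.json`, run `npm run build:package` or another suitable build command, create a Git tag for the new version number, and publish the package update.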
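The table format described in step 1 of the maintenance list could be checked before running the tools. The following is only a minimal sketch, assuming that the first column contains nothing but Han characters and that the second column is a non-empty spelling. `checkWordList` is a hypothetical helper, not part of this repository.

```javascript
// Hypothetical validator for a new word-list TSV (not part of the repository).
// Assumptions: exactly two tab-separated columns, no header row,
// column 1 consists only of Han characters, column 2 is a non-empty spelling.
function checkWordList(tsvText) {
  const problems = [];
  tsvText.split(/\r?\n/).forEach((line, index) => {
    if (line === "") return; // skip blank lines, e.g. after a trailing newline
    const cols = line.split("\t");
    if (cols.length !== 2) {
      problems.push(`line ${index + 1}: expected 2 tab-separated columns, got ${cols.length}`);
    } else if (!/^\p{Script=Han}+$/u.test(cols[0])) {
      // A header row such as "hanzi<TAB>spelling" is caught here.
      problems.push(`line ${index + 1}: first column is not a Chinese character`);
    } else if (cols[1] === "") {
      problems.push(`line ${index + 1}: second column is empty`);
    }
  });
  return problems;
}
```

An empty result means no problem was found under these assumptions. Characters outside the Unicode Han script would be reported, which may or may not suit the real data.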
4 changes: 2 additions & 2 deletions package/build/data.js
@@ -13,7 +13,7 @@ const startTime = Date.now()
const outStream = fs.createWriteStream(pr("../src/data.json"))

const convertPromise = new Promise(resolve => {
- const inStream = fs.createReadStream(pr("../data/dict.tsv"))
+ const inStream = fs.createReadStream(pr("../../data/dict.tsv"))
const rl = readline.createInterface({
input: inStream,
terminal: false,
@@ -51,7 +51,7 @@ const convertPromise = new Promise(resolve => {
})
})

- const miscPromise = fs.promises.readFile(pr("../data/misc.json")).then(b => {
+ const miscPromise = fs.promises.readFile(pr("../../data/misc.json")).then(b => {
return JSON.stringify(JSON.parse(b.toString()))
})

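The two path changes in `package/build/data.js` follow from moving the data files to the repository root. The sketch below illustrates the effect under the assumption that `pr` resolves its argument relative to the directory containing the build script. The definition of `pr` is not shown in this diff, so the version here is a hypothetical stand-in, and `/repo` is an illustrative location only.

```javascript
// Hypothetical stand-in for the build script's `pr` helper.
// Assumption: paths are resolved relative to package/build/, where data.js lives.
const BUILD_DIR = "file:///repo/package/build/"; // illustrative checkout location

function pr(relativePath) {
  // WHATWG URL resolution handles the ".." segments; no imports are needed.
  return new URL(relativePath, BUILD_DIR).pathname;
}

const oldDictPath = pr("../data/dict.tsv");    // one level up: inside package/
const newDictPath = pr("../../data/dict.tsv"); // two levels up: repository root

console.log(oldDictPath); // /repo/package/data/dict.tsv
console.log(newDictPath); // /repo/data/dict.tsv
```

The output path `pr("../src/data.json")` is unchanged by the commit, so the generated file would still land in `package/src/` under the same assumption.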
5 changes: 5 additions & 0 deletions tools/append.mjs
@@ -49,6 +49,11 @@ for (const [h, xs] of entries) {
e.xh = "-"
if (!e.n?.includes("旧拼写"))
e.n = (e.n || "") + (e.n ? "," : "") + "旧拼写"
} else if (table[h][latest]?.includes(x) && e.hh === "-" && e.xh === "-") {
console.warn(`Entry ${h}${x} is no longer deprecated, removing hints`)
e.hh = ""
e.xh = ""
e.n = e.n.replace(/,?旧拼写/, "")
}
}
dictTable[h].sort(
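The branch added to `tools/append.mjs` reverses an earlier deprecation: when a spelling marked as an old spelling (`旧拼写`) reappears in the latest table, both hints are cleared and the note is stripped. The following isolated sketch shows that behaviour. The function name `undeprecate` and the boolean parameter `inLatestTable` (standing in for `table[h][latest]?.includes(x)`) are hypothetical, and the meaning of the `hh`, `xh`, and `n` fields is inferred from the diff rather than documented here.

```javascript
// Sketch of the "no longer deprecated" branch, extracted into a function.
// Assumption: hh/xh equal to "-" marks an entry as deprecated in both directions,
// and `n` is a comma-separated note string.
function undeprecate(entry, inLatestTable) {
  if (inLatestTable && entry.hh === "-" && entry.xh === "-") {
    entry.hh = "";
    entry.xh = "";
    // Unlike the diff, this guards against a missing note field.
    entry.n = (entry.n || "").replace(/,?旧拼写/, "");
    return true; // the caller may want to log a warning, as the script does
  }
  return false;
}

const example = { hh: "-", xh: "-", n: "note,旧拼写" };
undeprecate(example, true);
console.log(example); // { hh: '', xh: '', n: 'note' }
```

Entries where only one of the two hints is `"-"` are left untouched, matching the condition in the diff.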
11 changes: 10 additions & 1 deletion tools/dictgitdiff.sh
@@ -1,5 +1,14 @@
#!/usr/bin/env bash

function highlight {
if [ -t 1 ]; then
if python -c "import pygments" &> /dev/null; then exec python -m pygments -l diff
elif rougify --version &> /dev/null; then exec rougify -l diff
fi
fi
exec cat
}

git diff "$@" > "$0.tmp"
- grep -P "^[ +-]($(grep -Po '(?<=^[+-])[^+-](?=\t)' "$0.tmp" | uniq | paste -sd'|'))" "$0.tmp"
+ grep -P "^[ +-]($(grep -Po '(?<=^[+-])[^+-](?=\t)' "$0.tmp" | sort -u | paste -sd'|'))" "$0.tmp" | highlight
rm "$0.tmp"
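The filtering done by the two nested `grep` calls can be hard to read. It collects the head character of every added or removed line, then keeps only the diff lines (context included) that begin with one of those characters. `uniq` removes only adjacent duplicates, whereas `sort -u` deduplicates the whole list, which a `Set` also does. The sketch below re-expresses that logic in JavaScript purely for illustration. `filterDiff` is a hypothetical name, the sample entries are made up, and the result is not guaranteed to match the shell pipeline in edge cases such as a diff with no changed entries.

```javascript
// Illustrative re-implementation of the dictgitdiff.sh filter (not the script itself).
function filterDiff(diffText) {
  const lines = diffText.split("\n");

  // Step 1: head characters of changed lines, i.e. one character after "+" or "-"
  // that is itself neither "+" nor "-", followed by a tab. A Set deduplicates them.
  const heads = new Set();
  for (const line of lines) {
    const match = /^[+-]([^+-])\t/u.exec(line);
    if (match) heads.add(match[1]);
  }

  // Step 2: keep context, added, and removed lines whose entry starts with such a character.
  return lines.filter(line => /^[ +-]/.test(line) && heads.has([...line][1]));
}

const sample = [
  "diff --git a/data/dict.tsv b/data/dict.tsv",
  "--- a/data/dict.tsv",
  "+++ b/data/dict.tsv",
  "@@ -1,4 +1,4 @@",
  " 甲\tA",
  " 乙\tB0",
  "-乙\tB1",
  "+乙\tB2",
  " 丙\tC",
].join("\n");

console.log(filterDiff(sample)); // the three lines for 乙 only
```

File headers such as `+++ b/data/dict.tsv` drop out because their second character is `+` or `-`, which is what the `[^+-]` class in the original pattern achieves.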
