Commit
initial release 4.8
huseinzol05 committed Jun 1, 2022
1 parent 68e0b23 commit 0cc9e73
Showing 18 changed files with 5,649 additions and 17 deletions.
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -63,6 +63,8 @@ Contents:
:caption: Convert Module

load-phoneme
load-rumi-jawi
load-jawi-rumi

.. toctree::
:maxdepth: 2
303 changes: 303 additions & 0 deletions docs/load-jawi-rumi.ipynb
@@ -0,0 +1,303 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Jawi-to-Rumi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"This tutorial is available as an IPython notebook at [Malaya/example/jawi-rumi](https://github.com/huseinzol05/Malaya/tree/master/example/jawi-rumi).\n",
" \n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.\n",
" \n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explanation\n",
"\n",
"The converter at https://www.ejawi.net/converterV2.php?go=rumi converts Rumi to Jawi using a heuristic method. Malaya takes the output of that heuristic converter and inverts the dataset, so a deep learning model learns the reverse mapping from Jawi to Rumi.\n",
"\n",
"`چوميل` -> `comel`"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5.95 s, sys: 1.15 s, total: 7.1 s\n",
"Wall time: 9.05 s\n"
]
}
],
"source": [
"%%time\n",
"import malaya"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use deep learning model\n",
"\n",
"Load the LSTM + Bahdanau Attention Jawi-to-Rumi model.\n",
"\n",
"If you are using TensorFlow 2, make sure TensorFlow Addons is installed:\n",
"\n",
"```bash\n",
"pip install tensorflow-addons -U\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"def deep_model(quantized: bool = False, **kwargs):\n",
" \"\"\"\n",
" Load LSTM + Bahdanau Attention Jawi to Rumi model.\n",
" Original size 11MB, quantized size 2.92MB.\n",
" CER on test set: 0.09239719040982326\n",
" WER on test set: 0.33811816744187656\n",
"\n",
" Parameters\n",
" ----------\n",
" quantized : bool, optional (default=False)\n",
" if True, will load 8-bit quantized model.\n",
" Quantized model is not necessarily faster, it depends entirely on the machine.\n",
"\n",
" Returns\n",
" -------\n",
" result: malaya.model.tf.Seq2SeqLSTM class\n",
" \"\"\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "530a47ea5c514ae9aa68c8a4e1e29d9c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=11034253.0, style=ProgressStyle(descrip…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"model = malaya.jawi_rumi.deep_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Quantized model\n",
"\n",
"To load the 8-bit quantized model, pass `quantized = True`; the default is `False`.\n",
"\n",
"Expect a slight accuracy drop from the quantized model. It is not necessarily faster than the normal 32-bit float model; that depends entirely on the machine."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Load quantized model will cause accuracy drop.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6d1d22a65abd48a28f9a1eb62f2d0c4d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=2926859.0, style=ProgressStyle(descript…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"quantized_model = malaya.jawi_rumi.deep_model(quantized = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Predict\n",
"\n",
"```python\n",
"def predict(self, strings: List[str], beam_search: bool = False):\n",
" \"\"\"\n",
" Convert to target string.\n",
"\n",
" Parameters\n",
" ----------\n",
" strings : List[str]\n",
" beam_search : bool, optional (default=False)\n",
" If True, use beam search decoder, else use greedy decoder.\n",
"\n",
" Returns\n",
" -------\n",
" result: List[str]\n",
" \"\"\"\n",
"```\n",
"\n",
"To speed up inference, keep `beam_search = False`, which uses the greedy decoder."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['saya suka makan im',\n",
" 'eak ack kotok',\n",
" 'aisuk berthday saya, jegan lupa bawak hadiah']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.predict(['ساي سوك ماكن ايم', 'اياق اچق كوتوق', 'ايسوق بيرثداي ساي، جڬن لوڤا باوق هديه'])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['saya suka makan im',\n",
" 'eak ack kotok',\n",
" 'aisuk berthday saya, jegan lopa bawak hadiah']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quantized_model.predict(['ساي سوك ماكن ايم', 'اياق اچق كوتوق', 'ايسوق بيرثداي ساي، جڬن لوڤا باوق هديه'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
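
The `deep_model` docstring in the notebook above quotes a character error rate (CER) and a word error rate (WER) on the test set, but the diff does not show how they were computed. A common definition, assumed here, is the Levenshtein edit distance between prediction and reference divided by the reference length, counted over characters for CER and over whitespace-separated tokens for WER. The helper names `edit_distance`, `cer`, and `wer` are hypothetical and not part of Malaya. The sketch compares the two model outputs shown in the notebook, treating the float model's output as the reference purely for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # delete r from ref
                            curr[j - 1] + 1,       # insert h into ref
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def cer(references: List[str], predictions: List[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    return edits / sum(len(r) for r in references)


def wer(references: List[str], predictions: List[str]) -> float:
    """Corpus-level WER: total token edits / total reference tokens."""
    edits = sum(edit_distance(r.split(), p.split())
                for r, p in zip(references, predictions))
    return edits / sum(len(r.split()) for r in references)


# Outputs copied from the notebook cells above; the float model's output
# stands in for the reference here (an assumption for illustration only).
float_out = ['aisuk berthday saya, jegan lupa bawak hadiah']
quantized_out = ['aisuk berthday saya, jegan lopa bawak hadiah']

print('CER:', cer(float_out, quantized_out))
print('WER:', wer(float_out, quantized_out))
```

The two outputs differ by one character (`lupa` vs `lopa`), so a single substitution counts as one character edit for CER and one whole-token edit for WER. This is why WER is typically much higher than CER for the same predictions. Both functions divide by the reference length, so they assume a non-empty reference list.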