🏅Khmer natural language processing toolkit🏅

🎯TODO

💪Installation

pip install khmer-nltk

🏹 Quick tour

[Blog]

To get the evaluation result of khmer-nltk's functionalities, please refer the sub-modules's readme

Sentence tokenization

>>> from khmernltk import sentence_tokenize
>>> raw_text = "ខួបឆ្នាំទី២៨! ២៣ តុលា ស្មារតីផ្សះផ្សាជាតិរវាងខ្មែរនិងខ្មែរ ឈានទៅបញ្ចប់សង្រ្គាម នាំពន្លឺសន្តិភាព និងការរួបរួមជាថ្មី"
>>> print(sentence_tokenize(raw_text))
['ខួបឆ្នាំទី២៨!', '២៣ តុលា ស្មារតីផ្សះផ្សាជាតិរវាងខ្មែរនិងខ្មែរ ឈានទៅបញ្ចប់សង្រ្គាម នាំពន្លឺសន្តិភាព និងការរួបរួមជាថ្មី']

Word tokenization

>>> from khmernltk import word_tokenize
>>> raw_text = "ខួបឆ្នាំទី២៨! ២៣ តុលា ស្មារតីផ្សះផ្សាជាតិរវាងខ្មែរនិងខ្មែរ ឈានទៅបញ្ចប់សង្រ្គាម នាំពន្លឺសន្តិភាព និងការរួបរួមជាថ្មី"
>>> print(word_tokenize(raw_text, return_tokens=True))
['ខួប', 'ឆ្នាំ', 'ទី', '២៨', '!', ' ', '២៣', ' ', 'តុលា', ' ', 'ស្មារតី', 'ផ្សះផ្សា', 'ជាតិ', 'រវាង', 'ខ្មែរ', 'និង', 'ខ្មែរ', ' ', 'ឈាន', 'ទៅ', 'បញ្ចប់', 'សង្រ្គាម', ' ', 'នាំ', 'ពន្លឺ', 'សន្តិភាព', ' ', 'និង', 'ការរួបរួម', 'ជាថ្មី']

POS Tagging

Usage

>>> from khmernltk import pos_tag
>>> raw_text = "ខួបឆ្នាំទី២៨! ២៣ តុលា ស្មារតីផ្សះផ្សាជាតិរវាងខ្មែរនិងខ្មែរ ឈានទៅបញ្ចប់សង្រ្គាម នាំពន្លឺសន្តិភាព និងការរួបរួមជាថ្មី"
>>> print(pos_tag(raw_text))
[('ខួប', 'n'), ('ឆ្នាំ', 'n'), ('ទី', 'n'), ('២៨', '1'), ('!', '.'), (' ', 'n'), ('២៣', '1'), (' ', 'n'), ('តុលា', 'n'), (' ', 'n'), ('ស្មារតី', 'n'), ('ផ្សះផ្សា', 'n'), ('ជាតិ', 'n'), ('រវាង', 'o'), ('ខ្មែរ', 'n'), ('និង', 'o'), ('ខ្មែរ', 'n'), (' ', 'n'), ('ឈាន', 'v'), ('ទៅ', 'v'), ('បញ្ចប់', 'v'), ('សង្រ្គាម', 'n'), (' ', 'n'), ('នាំ', 'v'), ('ពន្លឺ', 'n'), ('សន្តិភាព', 'n'), (' ', 'n'), ('និង', 'o'), ('ការរួបរួម', 'n'), ('ជាថ្មី', 'o')]

✍️ Citation

@misc{hoang-khmer-nltk,
  author = {Phan Viet Hoang},
  title = {Khmer Natural Language Processing Tookit},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/VietHoang1512/khmer-nltk}}
}

Used in:

👨‍🎓 References

📜 Advisor

Prof. Huong Le Thanh

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
.circleci		.circleci
bin		bin
docs		docs
khmernltk		khmernltk
samples		samples
scripts		scripts
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏅Khmer natural language processing toolkit🏅

🎯TODO

💪Installation

🏹 Quick tour

Sentence tokenization

Word tokenization

POS Tagging

Usage

✍️ Citation

Used in:

👨‍🎓 References

📜 Advisor

About

Releases 2

Packages

Contributors 2

Languages

License

VietHoang1512/khmer-nltk

Folders and files

Latest commit

History

Repository files navigation

🏅Khmer natural language processing toolkit🏅

🎯TODO

💪Installation

🏹 Quick tour

Sentence tokenization

Word tokenization

POS Tagging

Usage

✍️ Citation

Used in:

👨‍🎓 References

📜 Advisor

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Languages

Packages