Releases: mesolitica/malaya
Version 5.1
- Purged TensorFlow; it is no longer needed.
- Added Malay dictionary module, https://malaya.readthedocs.io/en/stable/dictionary-malay.html
- Syllable now uses PyTorch LSTM, https://malaya.readthedocs.io/en/stable/load-tokenizer-syllable.html
- Pretrained Transformer now uses PyTorch HuggingFace, https://malaya.readthedocs.io/en/stable/load-transformer.html
- Masked LM scorer now uses PyTorch HuggingFace, https://malaya.readthedocs.io/en/stable/load-mlm.html
- Causal LM scorer now uses PyTorch HuggingFace, https://malaya.readthedocs.io/en/stable/load-gpt2-lm.html
- Stemmer now uses PyTorch LSTM, https://malaya.readthedocs.io/en/stable/load-stemmer.html
- Jawi now uses T5 HuggingFace, supporting Rumi-to-Jawi and Jawi-to-Rumi, https://malaya.readthedocs.io/en/stable/load-jawi.html
- Kesalahan Tatabahasa now uses T5 HuggingFace, https://malaya.readthedocs.io/en/stable/load-tatabahasa-tagging.html
- Emotion Analysis now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-emotion.html
- Sentiment Analysis now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-sentiment.html
- Added Embedding module, https://malaya.readthedocs.io/en/stable/load-embedding.html
- Added Reranker module, https://malaya.readthedocs.io/en/stable/load-reranker.html
- Semantic Similarity now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-similarity-semantic.html
- Entities Recognition now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-entities.html
- Part-of-Speech now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-pos.html
- Dependency Parsing now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-dependency.html
- Constituency Parsing now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-constituency.html
- The Translation module now uses `from` and `to` parameters, https://malaya.readthedocs.io/en/stable/load-translation.html
- Zero-shot classification now uses T5 Encoder HuggingFace, https://malaya.readthedocs.io/en/stable/load-zeroshot-classification.html
- Text-to-KG now uses T5 HuggingFace, https://malaya.readthedocs.io/en/stable/text-to-kg.html
Version 5.0
- Started initial mixed language knowledge graph toolkit, https://malaya.readthedocs.io/en/latest/knowledge-graph-toolkit.html
- Released Abstractive Augmentation, able to convert standard structure to local / social media structure while maintaining the same polarity, standard EN -> local MS, standard MS -> local MS, https://malaya.readthedocs.io/en/latest/load-augmentation-abstractive.html
- Now Encoder based (WordVector, Encoder models) Augmentation will be under `malaya.augmentation.encoder`, https://malaya.readthedocs.io/en/latest/load-augmentation-encoder.html
- Now Rules based Augmentation will be under `malaya.augmentation.rules`, https://malaya.readthedocs.io/en/latest/load-augmentation-rules.html
- Released HuggingFace T5 models for True Case module, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-augmentation-rules.html
- Released HuggingFace T5 models for Segmentation module, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-segmentation-huggingface.html
- Released HuggingFace T5 models for Abstractive Normalizer, end-to-end Text Normalization, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-segmentation-huggingface.html
- Now Rules based Normalizer will be under `malaya.normalizer.rules`, https://malaya.readthedocs.io/en/latest/load-normalizer.html
- Released HuggingFace T5 models for Kesalahan Tatabahasa, https://malaya.readthedocs.io/en/latest/load-tatabahasa-tagging-huggingface.html
- Now Prefix Text Generator will be under `malaya.generator.prefix`, https://malaya.readthedocs.io/en/latest/load-prefix-generator.html
- Now Isi Penting Text Generator will be under `malaya.generator.isi_penting`, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator.html
- Released HuggingFace T5 models for Isi Penting Generator, with Article style and isi penting also can be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-article-style.html
- Released HuggingFace T5 models for Isi Penting Generator, with News Headline style and isi penting also can be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-headline-news-style.html
- Released HuggingFace T5 models for Isi Penting Generator, with Karangan style and isi penting also can be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-karangan-style.html
- Released HuggingFace T5 models for Isi Penting Generator, with News style and isi penting also can be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-news-style.html
- Released HuggingFace T5 models for Isi Penting Generator, with Product Description style and isi penting also can be mixed language, https://malaya.readthedocs.io/en/latest/load-isi-penting-generator-huggingface-product-description-style.html
- Released HuggingFace T5 models for Paraphrase module, https://malaya.readthedocs.io/en/latest/load-paraphrase-huggingface.html
- Now Doc2Vec based text similarity will be under `malaya.similarity.doc2vec`, https://malaya.readthedocs.io/en/latest/load-doc2vec-similarity.html
- Now Semantic text similarity will be under `malaya.similarity.semantic`, https://malaya.readthedocs.io/en/latest/load-semantic-similarity.html
- Released HuggingFace T5 models for Semantic Similarity, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-semantic-similarity-huggingface.html
- Released HuggingFace T5 models for Dependency Parsing, https://malaya.readthedocs.io/en/latest/load-dependency-huggingface.html
- Released HuggingFace T5 models for Abstractive Summarization, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-abstractive-huggingface.html
- Released HuggingFace T5 models for MS-EN Translation, https://malaya.readthedocs.io/en/latest/load-translation-ms-en-huggingface.html
- Released HuggingFace T5 models for noisy MS-EN Translation, end-to-end mixed language translation to EN, https://malaya.readthedocs.io/en/latest/load-translation-noisy-ms-en-huggingface.html
- Released HuggingFace T5 models for EN-MS Translation, https://malaya.readthedocs.io/en/latest/load-translation-en-ms-huggingface.html
- Released HuggingFace T5 models for noisy EN-MS Translation, end-to-end mixed language translation to MS, https://malaya.readthedocs.io/en/latest/load-translation-noisy-en-ms-huggingface.html
- Now Extractive QA will be under `malaya.qa.extractive`, https://malaya.readthedocs.io/en/latest/load-qa-extractive.html
- Released HuggingFace T5 models for Extractive QA, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-qa-extractive-huggingface.html
- Released HuggingFace T5 models for ZeroShot Classification, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-zeroshot-classification-huggingface.html
- Released HuggingFace T5 models for ZeroShot Entity Recognition, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/zeroshot-ner.html
- Now Decomposition based Topic Modeling will be under `malaya.topic_model.decomposition`, https://malaya.readthedocs.io/en/latest/load-topic-model-decomposition.html
- Now LDA2Vec based Topic Modeling will be under `malaya.topic_model.lda2vec`, https://malaya.readthedocs.io/en/latest/load-topic-model-lda2vec.html
- Now Transformer based Topic Modeling will be under `malaya.topic_model.transformer`, https://malaya.readthedocs.io/en/latest/load-topic-model-transformer.html
- Added BERTopic interface for Topic Modeling, https://malaya.readthedocs.io/en/latest/load-topic-model-bertopic.html
- Released HuggingFace T5 models for Abstractive Keyword, also able to infer mixed language, https://malaya.readthedocs.io/en/latest/load-abstractive-keyword-huggingface.html
- Now Extractive Keyword will be under `malaya.keyword.extractive`, https://malaya.readthedocs.io/en/latest/load-keyword-extractive.html
- Released HuggingFace interface for Transformer, https://malaya.readthedocs.io/en/latest/load-transformer-huggingface.html
Version 4.9.2
- Released Masked language model text scoring, https://malaya.readthedocs.io/en/latest/load-mlm.html
- Released GPT2 language model text scoring, https://malaya.readthedocs.io/en/latest/load-gpt2-lm.html
- Compare spelling correction results using KenLM, Masked LM and GPT2 LM, https://malaya.readthedocs.io/en/latest/load-gpt2-lm.html
- Added a deep learning based syllable tokenizer, with a WER of 4.3% versus 9.01% for the rules based tokenizer, https://malaya.readthedocs.io/en/latest/load-tokenizer-syllable.html
- Starting 4.9.2, `pytorch` and `transformers` are necessary libraries for Malaya.
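Since this release makes PyTorch and Transformers hard requirements, it can help to confirm both are present before upgrading. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of Malaya, and it assumes the usual import names `torch` and `transformers`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


# PyTorch is imported as `torch`; HuggingFace Transformers as `transformers`.
REQUIRED = ["torch", "transformers"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

The check only looks for an importable module; it does not verify versions.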
Version 4.9.1
- Added pretrained KenLM models, trained on https://github.com/huseinzol05/malay-dataset/tree/master/dumping/clean, https://malaya.readthedocs.io/en/latest/load-kenlm.html
- Improved spelling correction interface, under `malaya.spelling_correction.*`.
- Improved JamSpell spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-jamspell.html
- Improved speed and accuracy of Probability spelling correction, https://malaya.readthedocs.io/en/latest/load-spelling-correction-probability.html
- Added Probability LM, probability + KenLM spelling correction, a better scoring based on sentence context, https://malaya.readthedocs.io/en/latest/load-spelling-correction-probability-lm.html
- Improved Spylls spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-probability-lm.html
- Improved SymSpeller spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-symspell.html
- Improved Transformer Encoder spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-encoder-transformer.html
- Improved Seq2Seq Transformer spelling correction interface, https://malaya.readthedocs.io/en/latest/load-spelling-correction-transformer.html
- Added Syllable tokenizer, https://malaya.readthedocs.io/en/latest/load-tokenizer-syllable.html
- Added stemmer trained on noisy dataset to achieve better stemming for local language structure, https://malaya.readthedocs.io/en/latest/load-stemmer.html#Sensitive-towards-local-language-structure
- Improved normalizer, now able to add stemmer and add more parameters, https://malaya.readthedocs.io/en/latest/load-normalizer.html
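The Probability LM entry above describes scoring corrections by sentence context. The general idea can be sketched as follows: substitute each candidate into the sentence and keep the one the language model scores highest. This is an illustration only, not Malaya's implementation; `best_correction`, `toy_score`, and `KNOWN_BIGRAMS` are hypothetical names, and the toy bigram counter merely stands in for a real language model such as KenLM.

```python
def best_correction(tokens, index, candidates, score_fn):
    """Pick the candidate that yields the highest-scoring sentence.

    tokens     -- list of words in the sentence
    index      -- position of the word being corrected
    candidates -- possible replacements for tokens[index]
    score_fn   -- callable mapping a sentence string to a score (higher is better)
    """
    def sentence_with(candidate):
        return " ".join(tokens[:index] + [candidate] + tokens[index + 1:])

    return max(candidates, key=lambda c: score_fn(sentence_with(c)))


# Toy stand-in for a language model: counts known bigrams in the sentence.
KNOWN_BIGRAMS = {("saya", "makan"), ("makan", "nasi")}


def toy_score(sentence):
    words = sentence.split()
    return sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    print(best_correction(["saya", "mkn", "nasi"], 1, ["makan", "makin"], toy_score))
```

With a real model, `score_fn` would return something like a sentence log-probability, so the same selection logic applies unchanged.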
Version 4.9
- Released EN-MS translation alignment using Eflomal, https://malaya.readthedocs.io/en/latest/alignment-en-ms-eflomal.html
- Released EN-MS translation alignment using HuggingFace, https://malaya.readthedocs.io/en/latest/alignment-en-ms-huggingface.html
- Released MS-EN translation alignment using Eflomal, https://malaya.readthedocs.io/en/latest/alignment-ms-en-eflomal.html
- Released MS-EN translation alignment using HuggingFace, https://malaya.readthedocs.io/en/latest/alignment-ms-en-huggingface.html
- Preprocessing is now able to use NMT, https://malaya.readthedocs.io/en/latest/load-preprocessing.html#Load-translation
- Released Demoji module, https://malaya.readthedocs.io/en/latest/load-demoji.html
- Added transformer for Rumi-Jawi converter, https://malaya.readthedocs.io/en/latest/load-rumi-jawi.html, BASE size model WER 0.043%
- Added transformer for Jawi-Rumi converter, https://malaya.readthedocs.io/en/latest/load-rumi-jawi.html, BASE size model WER 0.3%
- Added substring language detection combining rules based and deep learning models, https://malaya.readthedocs.io/en/latest/language-detection-words.html
- Added EN-MS translation trained on noisy dataset to have better translation on local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-en-ms.html
- Added EN-MS translation using HuggingFace trained on noisy dataset to have better translation on local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-en-ms-huggingface.html
- Added MS-EN translation trained on noisy dataset to have better translation on local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-ms-en.html
- Added MS-EN translation using HuggingFace trained on noisy dataset to have better translation on local context, https://malaya.readthedocs.io/en/latest/load-translation-noisy-ms-en-huggingface.html
- The normalizer is now able to translate, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Use-translator
- The normalizer is now able to group similar subword languages and translate them to get better local context translation, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Problem-with-single-word-translation
- The normalizer is now able to segment words, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Use-segmenter
- The normalizer is now able to normalize emoji, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Normalize-emoji
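The Rumi-Jawi and Jawi-Rumi entries above quote WER (word error rate) figures. The release notes do not say how those numbers were computed, so the following is only a sketch of the standard definition: word-level edit distance between the reference and the model output, divided by the number of reference words. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

A result of 0.25 here corresponds to a WER of 25%; multiply by 100 to compare with percentages like those quoted above.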
Version 4.8
- Released Grapheme-to-Phoneme module, https://malaya.readthedocs.io/en/latest/load-phoneme.html, with phonetics taken from the https://prpm.dbp.gov.my/ Glosari Dialek.
- Released Rumi-to-Jawi module, https://malaya.readthedocs.io/en/latest/load-rumi-jawi.html
- Released Jawi-to-Rumi module, https://malaya.readthedocs.io/en/latest/load-jawi-rumi.html
Version 4.7.5
- Improved Word Tokenizer, https://malaya.readthedocs.io/en/latest/load-tokenizer.html
- Improved Normalizer for better speech synthesis, https://malaya.readthedocs.io/en/latest/load-normalizer.html
- Uses HuggingFace as the backend repository by default.
Version 4.7.4
- Full HuggingFace support for pretrained and finetuned models, check how to use HuggingFace as model repository, https://malaya.readthedocs.io/en/latest/huggingface-repository.html
- Added full unit tests for pretrained and finetuned models at https://github.com/huseinzol05/malaya/tree/master/tests
Version 4.7.3
- Improved regex for URLs.
- `predict_words` can now be used in Jupyter Notebook.
Version 4.7.2
- Improved sentiment module, the default labels are now `['negative', 'neutral', 'positive']`, and it uses a better dataset iterated using active learning, https://malaya.readthedocs.io/en/latest/load-sentiment.html
- The dataset is available at https://github.com/huseinzol05/malay-dataset/tree/master/sentiment/semisupervised-twitter-3class, Label Studio labelling for general tweets at https://label.malaysiaai.ml/projects/12/data, Label Studio labelling for political tweets at https://label.malaysiaai.ml/projects/16/data, get access at https://github.com/malaysia-ai/label-studio#how-to-get-access
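For code that consumes the three-class output, the new default label order matters. The sketch below shows one way to map a probability vector onto those labels. It assumes one probability per label, in the order listed above; `top_label` is a hypothetical helper, and the actual output format of the sentiment module is not specified in these notes.

```python
# Default label order stated in the 4.7.2 notes.
LABELS = ["negative", "neutral", "positive"]


def top_label(probabilities, labels=LABELS):
    """Return the label whose probability is highest (hypothetical helper)."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


if __name__ == "__main__":
    print(top_label([0.1, 0.2, 0.7]))
```

Code written for an earlier two-class output would need its index-to-label mapping updated accordingly.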