New Features
- Special handling of emoji, hashtag and mention tokens by word tokenizers. Refer to sadedegel config for details.
  - The same options are also available in Text2Doc, the text to sadedegel Document converter.
- [Incomplete] HashVectorizer (works far better than TfIdf or BM25 vectorization for the majority of the prebuilt models)
- unary option for idf
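To illustrate what special handling of hashtag, mention and emoji tokens can mean in practice, here is a minimal, self-contained sketch. It does not use sadedegel itself: the `tokenize` function and its `keep_social` flag are hypothetical names chosen for this example, and the regular expressions are an assumption, not the library's actual rules.

```python
import re

# Hypothetical patterns for illustration only; not sadedegel's implementation.
# With special handling, "#tag" and "@user" are matched as single tokens.
SOCIAL_AWARE = re.compile(r"[#@]\w+|\w+|[^\w\s]")
# Without it, "#" and "@" are split off as ordinary punctuation.
PLAIN = re.compile(r"\w+|[^\w\s]")


def tokenize(text, keep_social=True):
    """Split text into tokens, optionally keeping hashtags and mentions whole.

    Single code point emoji fall into the punctuation branch and come out
    as one token each in both modes.
    """
    pattern = SOCIAL_AWARE if keep_social else PLAIN
    return pattern.findall(text)


with_handling = tokenize("@sadedegel harika #nlp 😀!")
without_handling = tokenize("@sadedegel harika #nlp 😀!", keep_social=False)
```

In this sketch the flag only switches between two patterns; the real tokenizers are configured through sadedegel config as noted above.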
Datasets
We continue to add new datasets with this release. Refer to the Dataset ReadMe for details.
- Customer Review dataset
- Telco (Turkcell) Sentiment dataset
- Movie Sentiment dataset
- Hotel Sentiment dataset
- Categorized Product Sentiment dataset
Prebuilt Models
We continue to add new prebuilt models with this release. Refer to the Prebuilt Model ReadMe for details.
- Turkish Movie Review Sentiment Classification
- Telco Brand Tweet Sentiment Classification
- Turkish Customer Reviews Classification
Others
- Lazy evaluation of the word shape property
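As a rough illustration of lazy evaluation for a property like shape, the sketch below computes the value only on first access and caches it afterwards. The `Word` class, the `shape_calls` counter and the X/x/d shape encoding are all assumptions made for this example; they are not taken from sadedegel's code.

```python
from functools import cached_property


class Word:
    """Hypothetical stand-in for a token object; not sadedegel's Token class."""

    def __init__(self, text):
        self.text = text
        self.shape_calls = 0  # counts how often the shape is actually computed

    @cached_property
    def shape(self):
        # Assumed encoding: uppercase -> X, lowercase -> x, digit -> d,
        # any other character is kept as-is.
        self.shape_calls += 1
        return "".join(
            "X" if ch.isupper() else "x" if ch.islower() else "d" if ch.isdigit() else ch
            for ch in self.text
        )


w = Word("Ankara2021")
calls_before_access = w.shape_calls  # nothing computed at construction time
first = w.shape                      # computed here
second = w.shape                     # served from the cache
```

The benefit is that documents whose word shapes are never inspected pay no cost for computing them.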
Behavioural Changes
- Significant behavior change in the tokens property: it previously returned List[str] and now returns List[Token].
- Sentence Tokenizer is renamed to Sentence Boundary Detector to avoid confusion with the Word Tokenizer.
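Code that relied on the tokens property returning plain strings may need a small adaptation. The sketch below shows one possible approach using a stand-in `Token` class. The `word` attribute, the `__str__` behaviour and the `as_strings` helper are hypothetical and chosen for illustration; check the actual Token class before relying on them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in; the real Token class may expose different attributes."""

    word: str

    def __str__(self) -> str:
        return self.word


def as_strings(tokens: List[Token]) -> List[str]:
    """Convert token objects back to the plain strings older code expected."""
    return [str(t) for t in tokens]


old_style = ["Merhaba", "dünya"]                 # what tokens used to look like
new_style = [Token("Merhaba"), Token("dünya")]   # what tokens look like now
joined = " ".join(as_strings(new_style))
```

A thin conversion helper like this keeps string-based code such as joins and membership tests working while the rest of the codebase migrates to token objects.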
Contribution
- Welcome our new contributor @ertugruldemir