text-normalization
Here are 47 public repositories matching this topic...
Cryptocurrency Market Analysis and Question Answering System
Updated Sep 24, 2024 - Python
The translation engine behind the Irish Standardiser (Caighdeánaitheoir na Gaeilge), plus Scottish Gaelic/Manx→Irish translators
Updated Sep 14, 2024 - Perl
JS / Python 3 / PHP library for working with UTF-8 polytonic Greek and Latin text
Updated Sep 11, 2024 - JavaScript
A JavaScript library for accent-insensitive text processing, including accent folding and search term highlighting
Updated Oct 14, 2024 - JavaScript
An online text normalization tool for Chinese-English mixed text-to-speech systems
Updated Aug 23, 2024 - Python
Useful String extensions to save you time in production.
Updated Aug 22, 2024 - Dart
Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.
Updated Jul 31, 2024 - Python
📢 Tha (ថា) - A Khmer Text Normalization and Verbalization Toolkit
Updated Jul 26, 2024 - Python
Code, models, and data for "Exploiting Dialect Identification in Automatic Dialectal Text Normalization". ArabicNLP 2024, ACL.
Updated Jul 6, 2024 - Python
This repository provides a complete workflow for text processing using Hugging Face Transformers and NLTK. It includes modules for sentence normalization, spelling correction, word embedding generation, positional encoding computation, and English-to-French translation.
Updated Jun 18, 2024 - Jupyter Notebook
Simple tool to check if Unicode text files are Unicode-normalized
Updated May 31, 2024 - Python
Predict emotions (happiness, anger, sadness) from WhatsApp chat data using machine learning and deep learning models. Includes text normalization, vectorization (TF-IDF, BoW, Word2Vec, GloVe), and model evaluation.
Updated May 28, 2024 - Jupyter Notebook
Accurate categorization of eCommerce products improves user experience and boosts search engine visibility. The project goal is to classify products into 14 predefined categories using their descriptions sourced from an eCommerce platform.
Updated May 19, 2024 - Jupyter Notebook
Twitter Sentiment Analysis using Natural Language Processing (NLP)
Updated May 17, 2024 - Jupyter Notebook
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Updated May 7, 2024 - Python
Japanese text normalizer for mecab-neologd
Updated May 2, 2024 - Cython
Extract text content from an HTML page, process it, and extract the unique words from the processed text. This notebook uses several text processing techniques, including cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.
Updated Apr 5, 2024 - Jupyter Notebook
Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.
Updated Mar 13, 2024 - Jupyter Notebook