A repository of images of hand-written Cyrillic and Latin alphabet letters for machine learning applications.
The repository currently consists of 28,000+ 278x278 PNG images representing all 33 letters of the Russian alphabet and the 26 letters of the English alphabet. These images were hand-written on touch screens through crowd-sourcing.
The dataset will be regularly extended with more data as the collection progresses.
An API that reads words in images
CoMNIST also makes available a web service that reads a drawing and identifies the word or letter you have drawn. Along with an image, you can submit an expected word and get back the original image with mismatches highlighted (for educational purposes).
The API is available at http://35.187.34.5:5002/api/word and accepts a POST request with the following input:
{
'img': Mandatory b64 encoded image, with letters in black on a white background
'word': Optional string, the expected word to be read
'lang': Mandatory string, either 'en' or 'ru', for the Latin or Cyrillic (Russian) alphabet respectively
'nb_output': Mandatory integer, the "tolerance" of the engine
}
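A minimal Python sketch of building and sending such a request, using only the standard library. It assumes the service accepts a JSON body, which the description above does not state; the server may instead expect form-encoded fields. The helper names `build_payload` and `post_word` and the `nb_output` default of 1 are illustrative, not part of CoMNIST.

```python
import base64
import json
import urllib.request

API_URL = "http://35.187.34.5:5002/api/word"


def build_payload(image_bytes, lang, word=None, nb_output=1):
    """Assemble the request fields listed above (hypothetical helper)."""
    if lang not in ("en", "ru"):
        raise ValueError("lang must be 'en' or 'ru'")
    payload = {
        # The image is sent as base64 text rather than raw bytes.
        "img": base64.b64encode(image_bytes).decode("ascii"),
        "lang": lang,
        # Assumed default; the meaning of the "tolerance" value is not documented here.
        "nb_output": int(nb_output),
    }
    if word is not None:
        payload["word"] = word  # optional expected word
    return payload


def post_word(payload, url=API_URL, timeout=30):
    """POST the payload as JSON (an assumption) and return the decoded reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a local file named `drawing.png` (a placeholder), a call could look like `post_word(build_payload(open("drawing.png", "rb").read(), lang="en", word="hello"))`.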
The return information is the following:
{
'img': b64 encoded image; if a word was supplied as input, a modified version of that image highlighting mismatches
'word': string, the word that was read by the API
}
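As a rough illustration of handling the reply, the sketch below decodes the returned image and writes it to disk. It assumes the response has already been parsed into a Python dict with the two keys listed above; `save_result` and the output path are hypothetical names chosen for this example.

```python
import base64


def save_result(result, out_path):
    """Write the returned image to out_path and return the recognised word."""
    # 'img' is base64 text, so decode it back to raw image bytes first.
    image_bytes = base64.b64decode(result["img"])
    with open(out_path, "wb") as handle:
        handle.write(image_bytes)
    return result["word"]
```

If an expected word was sent with the request, the saved file should be the version of the image with mismatches highlighted.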
The objective is to gather at least 1000 images of each class, so your contribution is more than welcome! One minute of your time is enough, and don't hesitate to ask your friends and family to participate as well.
English version - Draw Latin only + common to Cyrillic and Latin
French version - Draw Latin only + common to Cyrillic and Latin
Russian version - Draw Cyrillic only
Find out more about CoMNIST on my blog
A big thanks to all the contributors!
These images have been crowd-sourced thanks to the great web design by Anna Migushina, available on her GitHub.
CoMNIST logo by Sophie Valenina
CoMNIST by Gregory Vial is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.