Integrating with HuggingFace Transformer #41

octalpixel · 2020-03-15T08:42:10Z

Hi,
Could you give me some insight into whether it is possible to plug iNLTK into the Hugging Face Transformers library?

parmarsuraj99 · 2020-04-09T05:37:40Z

I was looking for the same thing. Maybe we can use multilingual transformers. But the question is how to tokenize Indian languages, which have a different structure. Is there any way to break them into subword units for BPE?
I am eager to work on this and contribute.
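
To make the BPE question concrete, here is a minimal pure-Python sketch of how BPE merges could be learned over Devanagari text. It assumes words are split on whitespace and start as sequences of Unicode code points, so vowel signs (matras) begin as separate symbols and are merged back by frequency. The function names `merge_pair`, `learn_bpe`, and `segment` are hypothetical helpers for illustration, not part of iNLTK or any Hugging Face library, and the corpus is a toy one.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges, starting from single code points."""
    # Each distinct word becomes a tuple of code points with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (assumption): a real run would use a large Hindi corpus.
corpus = "भारत एक देश है भारत महान है".split()
merges = learn_bpe(corpus, num_merges=10)
print(segment("भारत", merges))
print(segment("भारती", merges))
```

Starting from code points is only one possible choice; starting from bytes or from grapheme clusters would change which units the merges can produce.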

goru001 · 2020-04-10T19:09:46Z

@octalpixel, @parmarsuraj99 Thanks for reaching out. Currently, it isn't straightforward, or even possible, to integrate it with the transformers library. I'll be happy to have contributions from the community to help with it.

parmarsuraj99 · 2020-04-11T05:38:05Z

So we just need a tokenizer trained separately on Indian languages, which we then plug directly into an LM? For example, a SentencePiece tokenizer trained on Hindi, attached to Hugging Face BERT. Should I go this way?

goru001 · 2020-04-11T07:17:43Z

@parmarsuraj99 Yes, you can use SentencePiece or Hugging Face's tokenizers library (https://github.com/huggingface/tokenizers).
I've been working on training a Hindi BERT model using the tokenizers and transformers libraries from Hugging Face.
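
As a rough illustration of the tokenizer step, the sketch below trains a small BPE tokenizer on Hindi text with the tokenizers library. It assumes a recent release of the package (the API has changed since early 2020), uses a toy in-memory corpus, and picks the vocabulary size and BERT-style special tokens arbitrarily; it is not the setup used for the model mentioned above.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import WhitespaceSplit
from tokenizers.trainers import BpeTrainer

# Toy corpus (assumption): a real run would stream a large Hindi corpus.
corpus = [
    "भारत एक देश है",
    "भारत महान है",
    "हिंदी एक भाषा है",
]

# BPE model with an explicit unknown token. No normalizer is set,
# so combining marks (matras) are left untouched.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))

# Split on whitespace only, so each word is handed to BPE intact.
tokenizer.pre_tokenizer = WhitespaceSplit()

# vocab_size and the special tokens are illustrative choices.
trainer = BpeTrainer(
    vocab_size=200,
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

encoding = tokenizer.encode("भारत महान है")
print(encoding.tokens)
print(encoding.ids)
```

To use the result with the transformers library, the trained object can be wrapped with `PreTrainedTokenizerFast(tokenizer_object=tokenizer)`; a model then has to be trained from scratch with a matching vocabulary size.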

parmarsuraj99 · 2020-04-12T12:30:11Z

@goru001 I am really excited to work on that. I believe a trained Hindi model would also pick up other regional languages efficiently, as most of them are similar.
Really looking forward to it.

goru001 added the enhancement label (New feature or request) on Apr 10, 2020
