# Decoder_model

Decoder model for language modelling

Put your text in `input.txt` and run `Main_loop.py` with Python 3.* and NumPy. Also remember to create a `parameters` folder beforehand.
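As a minimal pre-run sketch (assuming `Main_loop.py` reads `input.txt` from the working directory and saves weights into `parameters/`; the exact paths are defined in `Main_loop.py` itself):

```python
import os

# make sure the checkpoint folder exists before training starts
os.makedirs("parameters", exist_ok=True)

# quick sanity check that the training text is in place
with open("input.txt", encoding="utf-8") as f:
    text = f.read()
print(f"loaded {len(text)} characters of training text")
```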

Be careful with a high learning rate: overflow may occur in `exp()`.
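The overflow usually comes from the softmax. A common guard, shown here as a generic sketch rather than the exact code in `Main_loop.py`, is to subtract the row maximum before exponentiating:

```python
import numpy as np

def stable_softmax(logits, axis=-1):
    # shifting by the max makes every exponent <= 0, so np.exp cannot overflow
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)
```

With this shift the largest exponent is exactly 0, so `exp()` cannot overflow even when individual logits become large.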

I took some code snippets in `Main_loop.py` from Andrej Karpathy's `RNN_Char_Level.py`. If you haven't seen it or the original article yet, I highly recommend doing so: the article and the code have become very popular for good reason.

## Some notes

This architecture doesn't work efficiently at char level: it's unclear how attention should be distributed between individual letters. The model achieves much better results on word-level modelling (byte pair encoding can also improve performance).
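For reference, a minimal word-level tokenization sketch (purely illustrative, not code from this repo); byte pair encoding would go further and split rare words into subword units:

```python
def build_word_vocab(text):
    # naive whitespace tokenization; a real setup would also handle punctuation
    words = text.split()
    vocab = sorted(set(words))
    word_to_id = {w: i for i, w in enumerate(vocab)}
    return [word_to_id[w] for w in words], word_to_id
```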

## Some useful links

## Possible improvements

- a more efficient `MH_attention_mechanism` and `LayerNorm` for the evaluation phase (or STOP recalculating existing values for previous tokens; see the caching sketch after this list)
- a corresponding module for the eval phase in the `Decoder_model` class
- a multiprocessing feature for loop operations (e.g. computing each head)
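One way to stop recalculating values for previous tokens at evaluation time is to cache the per-token keys and values and only project the newest token. A rough single-head sketch, with purely illustrative names rather than this repo's API:

```python
import numpy as np

def cached_attention_step(x_new, W_q, W_k, W_v, cache):
    """Attend from the newest token over all tokens seen so far.

    x_new : (1, d_model) embedding of the newest token
    cache : dict with 'K' and 'V' arrays of shape (t, d_head), grown here
    """
    q = x_new @ W_q                                    # (1, d_head)
    cache["K"] = np.vstack([cache["K"], x_new @ W_k])  # append new key
    cache["V"] = np.vstack([cache["V"], x_new @ W_v])  # append new value
    scores = (q @ cache["K"].T) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())            # stable softmax
    weights /= weights.sum()
    return weights @ cache["V"]                        # (1, d_head)

# before the first generated token:
# cache = {"K": np.zeros((0, d_head)), "V": np.zeros((0, d_head))}
```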