NLP-Projects

Natural Language Processing projects, which include concepts and scripts about:

Concepts

1. Attention

  • Attention == weighted averages (a minimal sketch follows this list)
  • The attention reviews (review 1 and review 2) summarize the attention mechanism into several types:
    • Additive vs Multiplicative attention
    • Self attention
    • Soft vs Hard attention
    • Global vs Local attention
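
Since attention boils down to a weighted average, here is a minimal NumPy sketch of (scaled) dot-product attention; the shapes and variable names are illustrative assumptions, not code from this repo:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_attention(query, keys, values):
    """Scaled dot-product attention: the output is a weighted average of
    `values`, with weights from the softmax of query-key similarity scores."""
    d_k = keys.shape[-1]
    scores = keys @ query / np.sqrt(d_k)   # (seq_len,)
    weights = softmax(scores)              # attention weights, sum to 1
    return weights @ values, weights       # weighted average of the values

# Toy example: one query against 4 key/value vectors of dimension 8.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
context, attn = dot_product_attention(q, K, V)
print(attn.sum())  # ~1.0 -- the output is a convex combination of the values
```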

2. CNNs, RNNs and Transformer

  • Parallelization [1]

    • RNNs
      • Why not good?
      • The previous step's output is the input to the current step, so steps cannot run in parallel
    • Solutions
      • Simple Recurrent Units (SRU)
        • Parallelize the computation over each hidden-state dimension independently
      • Sliced RNNs
        • Split sequences into windows, run an RNN within each window, and run another RNN on top of the window outputs
        • Conceptually similar to CNNs
    • CNNs
      • Why good? (see the sketch below)
      • Different windows of the same filter are independent of each other
      • Different filters are independent of each other
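
To illustrate the parallelization contrast above, a rough NumPy sketch (hypothetical shapes and weights, not code from this repo): the RNN must loop over time because step t needs h[t-1], while every convolution window can be computed at once with a single matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_in, d_h, window = 6, 4, 4, 3
x = rng.normal(size=(seq_len, d_in))   # one toy input sequence

# RNN: inherently sequential -- step t cannot start before h[t-1] is known.
W_xh, W_hh = rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))
h = np.zeros(d_h)
for t in range(seq_len):
    h = np.tanh(x[t] @ W_xh + h @ W_hh)

# CNN: all windows of a filter are independent, so one matmul handles them all
# at once (different filters are just additional columns of W_conv).
W_conv = rng.normal(size=(window * d_in, d_h))
windows = np.stack([x[t:t + window].reshape(-1)
                    for t in range(seq_len - window + 1)])  # (n_windows, window*d_in)
conv_out = np.tanh(windows @ W_conv)                        # all windows in parallel
```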
  • Long-range dependency [1]

    • CNNs
      • Why not good?
      • A single convolution can only capture dependencies within its window
    • Solutions
      • Dilated CNNs
      • Deep CNNs
        • N * [Convolution + skip-connection]
        • For example, with window size 3 and stride 1, the second convolution covers 5 words (i.e., 1-2-3, 2-3-4, 3-4-5); see the sketch below
    • Transformer > RNNs > CNNs
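
To make the receptive-field arithmetic concrete, a small helper (illustrative, not from this repo) that computes how many input words a stack of stride-1 convolutions can see; with kernel size 3, two plain layers cover 5 words, matching the 1-2-3 / 2-3-4 / 3-4-5 example, while dilation widens coverage much faster.

```python
def receptive_field(num_layers, kernel_size=3, dilation=1):
    """Receptive field (in input positions) of `num_layers` stacked 1-D
    convolutions with stride 1; dilation > 1 corresponds to dilated CNNs."""
    rf = 1
    for _ in range(num_layers):
        rf += (kernel_size - 1) * dilation
    return rf

print(receptive_field(1))              # 3 -- a single kernel-size-3 convolution
print(receptive_field(2))              # 5 -- matches the example above
print(receptive_field(2, dilation=2))  # 9 -- dilated convolutions grow faster
```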
  • Position [1]

    • CNNs

      • Why not good?
      • Convolution preserves relative-order information, but max-pooling discards it
    • Solutions

      • Discard max-pooling, use deep CNNs with skip-connections instead
      • Add position embedding, just like in ConvS2S
    • Transformer

      • Why not good?
      • In self-attention, each word attends to the other words and generates a summary vector without any relative-position information (see the position-encoding sketch below)
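
The usual fix is to inject position information into the inputs. Below is a minimal NumPy sketch of the sinusoidal position encoding used by the Transformer (ConvS2S instead adds learned position embeddings in the same way); the function name is ours, not from this repo.

```python
import numpy as np

def sinusoidal_position_encoding(max_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    positions = np.arange(max_len)[:, None]        # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is simply added to the word embeddings before self-attention:
# inputs = word_embeddings + sinusoidal_position_encoding(seq_len, d_model)
print(sinusoidal_position_encoding(max_len=50, d_model=16).shape)  # (50, 16)
```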
  • Semantic features extraction [2]

    • Transformer > CNNs == RNNs

3. Pattern of DL in NLP models [3]

  • Data

    • Preprocess
    • Pre-training (e.g., ELMo, BERT)
    • Multi-task learning
    • Transfer learning, ref_1, ref_2 (a fine-tuning sketch follows this Data block)
      • Use a source task/domain S to improve a target task/domain T
      • If the target T has zero/one/a few labeled instances, we call it zero-shot, one-shot, or few-shot learning, respectively
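
As a concrete sketch of the pre-training / transfer-learning pattern (a hypothetical PyTorch example, not code from this repo): reuse a pre-trained encoder for the target task, freeze its weights, and train only a new task-specific head.

```python
import torch
import torch.nn as nn

class TargetTaskModel(nn.Module):
    """Pre-trained encoder (from source task S) + new classifier head (target task T)."""
    def __init__(self, encoder, hidden_dim, num_classes):
        super().__init__()
        self.encoder = encoder                     # e.g., an ELMo/BERT-style encoder
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, features):
        return self.classifier(self.encoder(features))

# Stand-in encoder; in practice you would load pre-trained weights here.
encoder = nn.Sequential(nn.Linear(300, 256), nn.ReLU())
model = TargetTaskModel(encoder, hidden_dim=256, num_classes=5)

# Freeze the transferred encoder and fine-tune only the new head on task T.
for param in model.encoder.parameters():
    param.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```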
  • Model

    • Encoder
      • CNNs, RNNs, Transformer
    • Structure
      • Sequential, Tree, Graph
  • Learning (changing the loss definition)

    • Adversarial learning (a loss-modification sketch follows this list)
    • Reinforcement learning
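
As a hedged sketch of what "changing the loss definition" can look like for adversarial learning on text (in the spirit of adding gradient-based perturbations to word embeddings; all names and shapes here are illustrative assumptions, not code from this repo):

```python
import torch
import torch.nn.functional as F

def adversarial_loss(model, embeddings, labels, epsilon=1.0):
    """Standard loss plus the loss on adversarially perturbed embeddings.
    The perturbation follows the gradient of the loss w.r.t. the embeddings."""
    embeddings = embeddings.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeddings), labels)

    # Perturb the embeddings in the direction that most increases the loss.
    grad, = torch.autograd.grad(clean_loss, embeddings, retain_graph=True)
    perturbation = epsilon * grad / (grad.norm() + 1e-12)
    adv_loss = F.cross_entropy(model(embeddings + perturbation.detach()), labels)

    return clean_loss + adv_loss  # the modified training objective
```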

References

Awesome public APIs

Awesome packages

Chinese

English

Future directions