Natural-Language-Processing

1- Introduction to language processing – tokens, sentences, paragraphs

1.1 Introduction
1.2 Probability and NLP
1.3 Vector Space Models
1.4 Sequence Learning
1.5 Machine Translation
1.6 Preprocessing
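
A minimal sketch of the preprocessing step in 1.6 above, using only the standard library; the sample text and the regex-based sentence split and tokenizer are illustrative choices, not the course's exact pipeline:

```python
import re

# A tiny text-preprocessing pipeline: naive sentence split, lowercasing,
# regex tokenization. The sample text is made up for illustration.
text = "NLP is fun. Preprocessing turns raw text into clean tokens!"

sentences = re.split(r"(?<=[.!?])\s+", text)            # split on sentence-final punctuation
tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]

print(sentences)   # ['NLP is fun.', 'Preprocessing turns raw text into clean tokens!']
print(tokens)      # [['nlp', 'is', 'fun'], ['preprocessing', 'turns', ...]]
```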

2- Statistical Properties of Words

2.1 Incidence Matrix
2.2 Term-Document Binary Incidence Matrix
2.3 IR Using Binary Incidence Matrix
2.4 Term Frequency
2.5 Multiple weightings of words - TF
2.6 Bag Of Words
2.7 Type Token Ratio
2.8 Inverse Document Frequency
2.9 TF-IDF
2.10 Zipf's Law
2.11 Heaps' Law
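
As a quick illustration of 2.4, 2.8 and 2.9 above, a plain-Python TF-IDF sketch over a made-up three-document corpus (the weighting shown is raw term frequency times log inverse document frequency; the other variants from 2.5 are not covered here):

```python
import math
from collections import Counter

# Toy corpus: three made-up documents, already tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs make good pets".split(),
]

n_docs = len(docs)
# Document frequency: in how many documents does each term occur?
df = Counter(term for doc in docs for term in set(doc))

for doc in docs:
    tf = Counter(doc)
    weights = {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    print(weights)
```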

3- Regular expressions - extraction of information using Regex

3.1 Vector Space Models for NLP
3.2 Document Similarity - Demo, Inverted Index, Exercise
3.3 Vector Representation of words
3.4 Contextual understanding of text
3.5 Co-occurrence matrix, n-grams
3.6 Collocations, Dense word Vectors
3.7 SVD, Dimensionality reduction, Demo
3.8 Query Processing
3.9 Topic Modeling
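
A small sketch of 3.1 and 3.2 above: documents represented as TF-IDF vectors and compared with cosine similarity. It assumes scikit-learn is installed; the documents are made up.

```python
# Documents as TF-IDF vectors, compared with cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "natural language processing with vector space models",
    "vector space models for information retrieval",
    "recurrent neural networks for machine translation",
]

vectors = TfidfVectorizer().fit_transform(docs)   # one TF-IDF vector per document
print(cosine_similarity(vectors).round(2))        # 3x3 matrix of pairwise similarities
```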

4- Document Similarity measures - Cosine and cluster measures

4.1 Examples for word prediction
4.2 Introduction to Probability in the context of NLP
4.3 Joint and conditional probabilities, independence with examples
4.4 The definition of probabilistic language model
4.5 Chain rule and Markov assumption
4.6 Generative Models
4.7 Bigram and Trigram Language Models - peeking inside the model building
4.8 Out of vocabulary words and curse of dimensionality
4.9 Naive-Bayes, classification
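
A toy illustration of 4.4 to 4.7 above: a bigram model with add-one smoothing, scoring a sentence via the chain rule under the Markov assumption. The training sentences are made up.

```python
from collections import Counter

# Three made-up training sentences; <s> and </s> mark sentence boundaries.
sentences = [["<s>", "i", "like", "nlp", "</s>"],
             ["<s>", "i", "like", "deep", "learning", "</s>"],
             ["<s>", "nlp", "is", "fun", "</s>"]]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1))
vocab_size = len(unigrams)

def p_bigram(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(sentence):
    """Chain rule under the Markov assumption: product of bigram probabilities."""
    prob = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        prob *= p_bigram(prev, word)
    return prob

print(sentence_prob(["<s>", "i", "like", "nlp", "</s>"]))
```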

5- Spelling correction - Edit distance

5.1 Machine learning, perceptron, linearly separable
5.2 Linear Models for Classification
5.3 Biological Neural Network
5.4 Perceptron
5.5 Perceptron Learning
5.6 Logical XOR
5.7 Activation Functions
5.8 Gradient Descent
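
A compact sketch of 5.4 and 5.5 above: the perceptron learning rule on a linearly separable toy problem (logical AND). XOR from 5.6 is not linearly separable, so the same loop would never converge on it.

```python
import numpy as np

# Perceptron learning rule on a linearly separable problem (logical AND).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                        # a few passes over the data
    for x_i, target in zip(X, y):
        pred = int(w @ x_i + b > 0)        # step activation
        w += lr * (target - pred) * x_i    # weights change only on mistakes
        b += lr * (target - pred)

print(w, b)                                # a separating hyperplane for AND
print([int(w @ x_i + b > 0) for x_i in X]) # predictions: [0, 0, 0, 1]
```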

6- Information retrieval, extraction

6.1 Feedforward and Backpropagation Neural Network
6.2 Why Word2Vec?
6.3 What are CBOW and Skip-Gram Models?
6.4 One word learning architecture
6.5 Forward pass for Word2Vec
6.6 Matrix Operations Explained
6.7 CBOW and Skip Gram Models
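
For 6.4 to 6.6 above, a NumPy sketch of the one-word-in, one-word-out forward pass: the one-hot input selects a row of the input weight matrix, and a softmax over the output scores gives a distribution over the vocabulary. The vocabulary size and embedding dimension are made-up toy values.

```python
import numpy as np

# Forward pass of the one-word-in, one-word-out word2vec architecture.
V, N = 10, 4                              # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, N))            # input -> hidden weights (the word vectors)
W_out = rng.normal(size=(N, V))           # hidden -> output weights

x = np.zeros(V)
x[3] = 1.0                                # one-hot vector for the center word (index 3)
h = x @ W_in                              # hidden layer = row 3 of W_in
scores = h @ W_out                        # one score per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()   # softmax

print(probs.round(3), probs.sum())        # distribution over context words, sums to 1
```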

7- Document Classification, Clustering, topic modeling techniques

7.1 Building Skip-gram model using Python
7.2 Reduction of complexity - sub-sampling, negative sampling
7.3 Binary tree, Hierarchical softmax
7.4 Mapping the output layer to Softmax
7.5 Updating the weights using hierarchical softmax
7.6 Discussion on the results obtained from word2vec
7.7 Recap and Introduction
7.8 ANN as an LM and its limitations
7.9 Sequence Learning and its applications

8- Vector Space Model - word vectors, GloVe/Word2Vec model, word embedding

8.1 Introduction to Recurrent Neural Networks
8.2 Unrolled RNN
8.3 RNN-Based Language Model
8.4 BPTT - Forward Pass
8.5 BPTT - Derivatives for W, V and U
8.6 BPTT - Exploding and vanishing gradient
8.7 LSTM
8.8 Truncated BPTT
8.9 GRU
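
A NumPy sketch of 8.1 and 8.2 above: one vanilla RNN cell, h_t = tanh(W_x x_t + W_h h_(t-1) + b), unrolled over a short made-up input sequence (all dimensions are toy values):

```python
import numpy as np

# One vanilla RNN cell unrolled over a toy input sequence.
rng = np.random.default_rng(1)
input_dim, hidden_dim, steps = 3, 5, 4
W_x = rng.normal(size=(hidden_dim, input_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                  # initial hidden state h_0
xs = rng.normal(size=(steps, input_dim))  # toy input sequence x_1 .. x_4
for t, x_t in enumerate(xs, start=1):     # "unrolling" the recurrence in time
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    print(f"h_{t} =", h.round(2))
```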

9- Text Classification, Clustering, and Summarization

 9.1 Introduction and Historical Approaches to Machine Translation
 9.2 What is SMT?
 9.3 Noisy Channel Model, Bayes Rule, Language Model
 9.4 Translation Model, Alignment Variables
 9.5 Alignments again!
 9.6 IBM Model 1
 9.7 IBM Model 2
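
For 9.6 above, a tiny expectation-maximisation loop that estimates IBM Model 1 lexical translation probabilities t(f | e). The parallel "corpus" below is made up; real SMT training uses far more data.

```python
from collections import defaultdict

# EM for IBM Model 1 lexical probabilities t(f | e) on a toy parallel corpus.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"]),
          (["ein", "buch"], ["a", "book"])]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}   # uniform start

for _ in range(10):                                   # EM iterations
    count, total = defaultdict(float), defaultdict(float)
    for fs, es in corpus:                             # E-step: expected alignment counts
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / norm
                count[(f, e)] += c
                total[e] += c
    t = {pair: count[pair] / total[pair[1]] for pair in t}   # M-step: re-estimate t(f | e)

print(round(t[("haus", "house")], 3))                 # converges towards 1.0
```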

10 - Machine Learning, Perceptron

 10.1 Introduction to Phrase-based translation
 10.2 Symmetrization of alignments
 10.3 Extraction of Phrases
 10.4 Learning/estimating the phrase probabilities using another Symmetrization example
 10.5 Introduction to evaluation of Machine Translation
 10.6 BLEU - "A short Discussion of the seminal paper"
 10.7 BLEU Demo using NLTK and other metrics
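
For 10.7 above, sentence-level BLEU with NLTK (assumes nltk is installed). The reference and candidate sentences are made up, and smoothing is used because short sentences otherwise produce zero higher-order n-gram counts.

```python
# Sentence-level BLEU with NLTK on a made-up reference/candidate pair.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of reference token lists
candidate = ["the", "cat", "sat", "on", "the", "mat"]

smooth = SmoothingFunction().method1
print(sentence_bleu(reference, candidate, smoothing_function=smooth))
```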

11 - Backpropagation, Recurrent Neural Networks relevant to NLP

 11.1 Encoder-Decoder model for Neural Machine Translation
 11.2 RNN Based Machine Translation
 11.3 Recap and Connecting Bloom Taxonomy with Machine Learning
 11.4 Introduction to Attention based Translation
 11.5 Neural machine translation by jointly learning to align and translate
 11.6 Typical NMT architecture and models for multi-language translation
 11.7 Beam Search
 11.8 Variants of Gradient Descent
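
For 11.7 above, a toy beam search over a hand-written next-token table; a real NMT decoder would score continuations with an encoder-decoder network (11.1) instead of a lookup.

```python
import heapq
import math

# Toy beam search over a hand-written next-token distribution.
next_token = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_size=2, max_len=4):
    beams = [(0.0, ["<s>"])]                          # (log-probability, sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":                     # keep finished hypotheses as they are
                candidates.append((score, seq))
                continue
            for tok, p in next_token[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beams

for score, seq in beam_search():
    print(round(score, 3), " ".join(seq))
```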

12 - Machine Translation, Language Generation

 12.1 Introduction to Conversation Modeling
 12.2 Few examples in Conversation Modeling
 12.3 Elements of IR-based Conversation Modeling
 12.4 Ideas on Question Answering
 12.5 Applications – Sentiment Analysis, Spam Detection, Resume Mining, AInstein 
 12.6 Hyperspace Analogue to Language - HAL
 12.7 Correlated Occurrence Analogue to Lexical Semantics - COALS
 12.8 Global Vectors - GloVe
 12.9 Evaluation of Word vectors
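
For 12.8 and 12.9 above, a quick intrinsic evaluation of pretrained GloVe vectors using gensim's downloader (an assumption of this example; it needs gensim installed and an internet connection to fetch the "glove-wiki-gigaword-50" vectors):

```python
# Intrinsic evaluation of pretrained GloVe vectors via gensim's downloader.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")            # 50-dimensional GloVe vectors

print(glove.similarity("car", "truck"))               # cosine similarity of two words
print(glove.most_similar(positive=["king", "woman"],  # the classic analogy test
                         negative=["man"], topn=3))
```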

Kaggle Work

  1. Preprocessing and Word2Vec :- https://www.kaggle.com/mvanshika/natural-language-processing

Resources

  1. https://towardsdatascience.com/your-guide-to-natural-language-processing-nlp-48ea2511f6e1
  2. https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa