S/N | Domain | Estimated Duration |
---|---|---|
1 | Lectures | 1 week |
2 | Tutorials | 1 week |
3 | Assessment | 2 weeks |
- Stanford Natural Language Processing Slides
- Stanford Natural Language Processing Lectures (YouTube)
- Introduction (1.1)
- Basic Text Processing (2.1 to 2.5)
- Minimum Edit Distance (3.1 to 3.3)
- Language Modeling (4.1 to 4.6)
- Text Classification (6.1 to 6.9)
- Sentiment Analysis (7.1 to 7.5)
- Information Extraction and Named Entity Recognition (9.1 to 9.3)
- Relation Extraction (10.1)
- Part-Of-Speech Tagging (12.1 and 12.2)
- Information Retrieval (18.1 to 18.3)
- Ranked Information Retrieval (19.1 to 19.5)
- Semantics (20.1 to 20.5)
- Question Answering (21.1 to 21.3)
- Summarization (22.1)
- You can find the backup to the slides and videos here
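
As a small companion to the Minimum Edit Distance lectures (3.1 to 3.3), the sketch below computes edit distance with dynamic programming. It is an illustrative version, not code from the course: the function name `edit_distance` and the `sub_cost` parameter are assumptions. Insertions and deletions cost 1; substitutions cost 1 by default, and passing `sub_cost=2` gives the variant in which a substitution counts as a deletion plus an insertion.

```python
def edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance between two strings via dynamic programming.

    Insertions and deletions cost 1; a substitution costs `sub_cost`
    (an illustrative parameter, not something defined by the course).
    """
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitute = previous[j - 1] + (0 if s_char == t_char else sub_cost)
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


print(edit_distance("intention", "execution"))
print(edit_distance("intention", "execution", sub_cost=2))
```

Only two rows of the table are kept at a time, which is enough for the distance itself; recovering the alignment would need the full table or backpointers.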
- The following uses NLTK, but other toolkits that cover these topics are acceptable
- Stemmers and Lemmatisation (section 3.6)
- Tokenising and Segmentation (sections 3.7 and 3.8)
- POS Tagging (sections 1 and 2)
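
To give a feel for the tokenising and stemming topics above, here is a minimal sketch that assumes NLTK is installed. It uses `RegexpTokenizer` and `PorterStemmer`, which do not need any extra NLTK data downloads; lemmatisation and POS tagging are left out because they depend on corpora and models that must be downloaded separately. The helper name `tokenize_and_stem` and the sample sentence are made up for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Split on runs of word characters; a deliberately simple tokeniser.
tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()


def tokenize_and_stem(text):
    """Tokenise `text` and reduce each token to its Porter stem."""
    return [stemmer.stem(token) for token in tokenizer.tokenize(text)]


print(tokenize_and_stem("Cats running, connected connections"))
```

Stems are not always dictionary words, which is the main practical difference from lemmatisation.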
- NLTK
- scikit-learn
- fastText
- Gensim
- spaCy
This section is a hands-on assessment that requires practitioners to attempt the Kaggle Avito Duplicate Ads Detection prediction competition. You are expected to write your own code from scratch, using concepts learnt from Kaggle Titanic and NLP methods (e.g. word2vec). Afterwards, you'll present your work to your mentor/supervisor.
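
One possible starting point for duplicate detection is a text-similarity feature between the two ads in a pair. The sketch below is a simple bag-of-words cosine similarity built on the standard library, used here as a stand-in for embedding-based methods such as word2vec; it is not the approach of any competition entry. The function names and the sample ad titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only tokens present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical ad titles, not taken from the competition data.
print(cosine_similarity("Red bike for sale", "Selling a red bike"))
```

A score like this can be one column among many in a feature table; swapping the count vectors for averaged word embeddings is a natural next step.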
- Avito Duplicate Ads Detection, Winners' Interview: 2nd Place, Team TheQuants
- You can find the backup to the article here