- You can access the slide deck that covers PyTorch here
- You can access the slide deck that covers various concepts related to Transformers here
- It is recommended to read the slide decks before using the following Colab notebooks
- Once you get a good grip on the first four modules, you can easily walk through the documentation or other code to build an application. I will keep updating this repository.
- Recorded videos
The Fuel: Tensors
- Difficulty Level: Easy if you have prior experience with NumPy or TensorFlow
- Understand the PyTorch architecture
- Create tensors of 0d, 1d, 2d, 3d, ... (analogous to a multidimensional array in NumPy)
- Understand the attributes: `storage`, `stride`, `offset`, `device`
- Manipulate tensor dimensions
- Operations on tensors
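A minimal sketch of these ideas, assuming nothing beyond a stock PyTorch install; the shapes and values are illustrative:

```python
import torch

# Tensors of different ranks (analogous to numpy ndarrays)
scalar = torch.tensor(3.0)               # 0d
vector = torch.tensor([1.0, 2.0, 3.0])   # 1d
matrix = torch.rand(2, 3)                # 2d

# The attributes discussed above
print(matrix.storage())          # flat block of memory backing the tensor (untyped_storage() in newer releases)
print(matrix.stride())           # steps (in elements) to move along each dimension
print(matrix.storage_offset())   # index of the first element inside the storage
print(matrix.device)             # cpu or cuda

# Manipulating dimensions: the transpose shares storage, only the strides change
transposed = matrix.t()
reshaped = vector.view(3, 1)

# A couple of operations on tensors
print(matrix.sum(), matrix.mean(dim=0), matrix @ reshaped)
```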
The Engine: Autograd
- Difficulty Level: Hard, requires a good understanding of the backpropagation algorithm. However, you can skip this and still follow the subsequent notebooks easily.
- A few more tensor attributes and methods: `requires_grad`, `grad`, `grad_fn`, `_saved_tensors`, `backward`, `retain_grad`, `zero_grad`
- Computation graph: Leaf node (parameters) vs non-leaf node (intermediate computation)
- Accumulate gradients and update parameters inside a context manager (`torch.no_grad`)
- Implementing a neural network from scratch
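The sketch below is an assumed single training step, not the notebook's exact code; it ties the pieces together: a leaf tensor with `requires_grad`, a non-leaf result with `grad_fn`, `backward`, an update under `torch.no_grad`, and zeroing the accumulated gradient.

```python
import torch

# Leaf tensors (parameters) track gradients; intermediate results get a grad_fn
w = torch.randn(3, requires_grad=True)   # leaf node
x = torch.tensor([1.0, 2.0, 3.0])
loss = ((w * x).sum() - 1.0) ** 2        # non-leaf node: has a grad_fn

loss.backward()          # populates w.grad by traversing the computation graph
print(w.grad, loss.grad_fn)

# Update the parameters without recording the update in the graph
with torch.no_grad():
    w -= 0.1 * w.grad
w.grad.zero_()           # gradients accumulate, so reset them every step
```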
The Factory: nn.Module, Data Utils
- Difficulty Level: Medium
- Brief tour into the source code of nn.Module
- Everything is a module (layer in other frameworks)
- Stack modules by subclassing nn.Module and build any neural network
- Managing data with the `Dataset` and `DataLoader` classes
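A compact sketch, using assumed toy data, of the pattern this module builds up: a `Dataset` subclass, a `DataLoader`, and a model assembled by subclassing `nn.Module`:

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Wraps in-memory tensors; __len__ and __getitem__ are all you need."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

class MLP(nn.Module):
    """Stack existing modules by subclassing nn.Module."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
    def forward(self, x):
        return self.net(x)

data = ToyDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
model = MLP()
for xb, yb in loader:        # each batch is collated automatically by the DataLoader
    logits = model(xb)
    break
```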
Convolutional Neural Network: Image Classification
- Difficulty Level: Medium
- Using torchvision for datasets
- Build a CNN and move it to the GPU
- Train and test
- Transfer learning
- Image segmentation
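A hedged sketch of the workflow; CIFAR-10 is used here as an illustrative dataset and the tiny architecture is a stand-in, not the notebook's exact model:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# torchvision handles downloading and transforming the dataset
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, 10),
).to(device)                      # move the parameters to the GPU if available

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for images, labels in loader:     # a single batch, for brevity
    images, labels = images.to(device), labels.to(device)
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    break
```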
Recurrent Neural Network: Sequence Classification
- Difficulty Level: Hard for the pre-processing part, Medium for the model-building part
- torchdata
- torchtext
- Embeddings for words
- Build an RNN
- Train, test, and infer
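Since the pre-processing half relies on the torchdata/torchtext pipelines covered in the notebook, the sketch below only shows the model-building half; the vocabulary size, padding index, and class count are illustrative assumptions:

```python
import torch
from torch import nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(embedded)         # hidden: (num_layers, batch, hidden_dim)
        return self.fc(hidden[-1])             # classify from the last hidden state

model = RNNClassifier()
fake_batch = torch.randint(1, 10_000, (4, 20))  # 4 sequences of 20 token ids
print(model(fake_batch).shape)                  # torch.Size([4, 2])
```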
Please take a look at the official tutorial series if you want to perform distributed training with a multi-GPU or multi-node setup in PyTorch (it requires only minimal modifications to the existing code). It covers various approaches, including:
- Distributed Data-Parallel (DDP)
- Fully Sharded Data Parallel (FSDP)
- Model, Tensor, and Pipeline parallelism
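As a taste of the first approach, here is a hedged DDP sketch (a generic pattern, not the tutorial's exact script); it assumes the script is launched with `torchrun`, which sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables:

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")       # torchrun supplies rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")

    model = nn.Linear(10, 10).to(device)
    model = DDP(model, device_ids=[local_rank])   # gradients are synced across ranks

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 10, device=device)
    loss = model(x).sum()
    loss.backward()                               # gradient all-reduce happens here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```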
Now, let's move on to the Hugging Face library, which further simplifies these training strategies.
- Using pre-trained models Notebook
- Difficulty Level: Easy
- AutoTokenizer
- AutoModel
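A minimal usage sketch; the checkpoint name is an illustrative choice, not necessarily the one used in the notebook:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "bert-base-uncased"                      # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("PyTorch makes research fun.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)                # (batch, seq_len, hidden_size)
```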
- Fine-Tuning Pre-Trained Models Notebook
- Difficulty Level: Medium
- datasets
- tokenizer
- data collator with padding
- Trainer
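A hedged sketch of how these pieces fit together; the dataset (GLUE/SST-2), checkpoint, and hyperparameters are illustrative assumptions, not the notebook's exact setup:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"                    # illustrative checkpoint
raw = load_dataset("glue", "sst2")                  # illustrative dataset
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

tokenized = raw.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)   # pads dynamically per batch

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
args = TrainingArguments(output_dir="sst2-finetune",
                         per_device_train_batch_size=16,
                         num_train_epochs=1)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"],
                  data_collator=collator)
trainer.train()
```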
- Loading Datasets Notebook
- Difficulty Level: Easy
- Dataset from local data files
- Dataset from Hub
- Preprocessing the dataset: slice, select, map, filter, flatten, interleave, concatenate
- Loading from external links
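A quick sketch of these operations; the local file name and the external URL are placeholders to replace with your own, and IMDB stands in for whichever Hub dataset the notebook uses:

```python
from datasets import load_dataset, interleave_datasets, concatenate_datasets

hub = load_dataset("imdb")                                    # dataset from the Hub
local = load_dataset("csv", data_files="my_data.csv")         # placeholder local file
remote = load_dataset("json",
                      data_files="https://example.com/data.jsonl")  # placeholder link

# Common preprocessing operations
train = hub["train"]
subset = train.select(range(1000))                            # select / slice
upper = subset.map(lambda ex: {"text": ex["text"].upper()})   # map
long_only = subset.filter(lambda ex: len(ex["text"]) > 500)   # filter
flat = train.flatten()                                        # flatten nested columns
mixed = interleave_datasets([hub["train"], hub["test"]])      # interleave
combined = concatenate_datasets([hub["train"], hub["test"]])  # concatenate
```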
- Build a Custom Tokenizer for translation task Notebook
- Difficulty Level: Medium
- Translation dataset as running example
- Building the tokenizer by composing the normalizer, pre-tokenizer, and tokenization algorithm (BPE)
- Saving and loading the tokenizer locally
- Using it in the Transformer module
- Exercise: Build a tokenizer with a shared vocabulary.
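A condensed sketch of the assembly steps with the `tokenizers` library; the tiny in-memory corpus, vocabulary size, and special tokens are illustrative stand-ins for the translation dataset used in the notebook:

```python
from tokenizers import Tokenizer, normalizers, pre_tokenizers, trainers
from tokenizers.models import BPE
from transformers import PreTrainedTokenizerFast

# Compose normalizer + pre-tokenizer + BPE model into one tokenizer
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Sequence([normalizers.NFKC(), normalizers.Lowercase()])
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=1000,
                              special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
corpus = ["a toy sentence", "another toy sentence"]            # stand-in corpus
tokenizer.train_from_iterator(corpus, trainer=trainer)

tokenizer.save("toy-tokenizer.json")                           # save locally
reloaded = Tokenizer.from_file("toy-tokenizer.json")           # load it back

# Wrap it so it plugs into transformers models and the Trainer API
hf_tokenizer = PreTrainedTokenizerFast(tokenizer_object=reloaded,
                                       unk_token="[UNK]", pad_token="[PAD]")
print(hf_tokenizer("a toy sentence").input_ids)
```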
- Training Custom Seq2Seq model using Vanilla Transformer Architecture Notebook
- Difficulty Level: Medium, if you know how to build models in PyTorch.
- Build the vanilla Transformer architecture in PyTorch
- Create a configuration for the model using the PretrainedConfig class
- Wrap it with the HF PreTrainedModel class
- Use the custom tokenizer built in the previous notebook
- Use Trainer API to train the model
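A skeletal sketch of the wrapping pattern (a config class plus a model class); the field names, sizes, and the `nn.Transformer`-based body are illustrative assumptions, not the notebook's exact architecture:

```python
import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel

class ToySeq2SeqConfig(PretrainedConfig):
    model_type = "toy_seq2seq"
    def __init__(self, vocab_size=8000, d_model=256, nhead=4, num_layers=2, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers

class ToySeq2SeqModel(PreTrainedModel):
    config_class = ToySeq2SeqConfig
    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        self.transformer = nn.Transformer(d_model=config.d_model, nhead=config.nhead,
                                          num_encoder_layers=config.num_layers,
                                          num_decoder_layers=config.num_layers,
                                          batch_first=True)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size)

    def forward(self, input_ids, decoder_input_ids, labels=None):
        hidden = self.transformer(self.embed(input_ids), self.embed(decoder_input_ids))
        logits = self.lm_head(hidden)
        loss = None
        if labels is not None:   # returning a loss lets the Trainer API train the model
            loss = nn.functional.cross_entropy(logits.view(-1, logits.size(-1)),
                                               labels.view(-1))
        return {"loss": loss, "logits": logits}

config = ToySeq2SeqConfig()
model = ToySeq2SeqModel(config)   # ready to pair with the custom tokenizer and Trainer
```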
- Gradient Accumulation - Continual Pre-training Notebook
- Difficulty Level: Easy
- Understand the memory requirements for training and inference
- Understand how gradient accumulation works around limited memory
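A minimal sketch of the accumulation loop, assuming a toy model and random micro-batches; dividing the loss by the number of accumulation steps keeps the effective gradient an average over the larger batch:

```python
import torch
from torch import nn

model = nn.Linear(100, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
accumulation_steps = 4            # effective batch = 4 micro-batches

optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 100)                                  # micro-batch
    y = torch.randint(0, 2, (8,))
    loss = criterion(model(x), y) / accumulation_steps       # scale so grads average
    loss.backward()                                          # gradients accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                                     # update once per effective batch
        optimizer.zero_grad()
```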