Transfer learning involves taking a pre-trained neural network and adapting it to a new, different data set.
The approach for using transfer learning will differ depending on both:

- the size of the new data set, and
- the similarity of the new data set to the original data set.

There are four main cases:
- New data set is small, new data is similar to original training data.
- New data set is small, new data is different from original training data.
- New data set is large, new data is similar to original training data.
- New data set is large, new data is different from original training data.
If the new data set is small and similar to the original training data:
- Slice off the end of the neural network
- Add a new fully connected layer that matches the number of classes in the new data set
- Randomize the weights of the new fully connected layer; freeze all the weights from the pre-trained network
- Train the network to update the weights of the new fully connected layer.
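The steps above can be sketched in PyTorch. The small `pretrained` network below is a hypothetical stand-in; in practice you would load a real pre-trained model (e.g. from torchvision) with its trained weights:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network; in practice, load
# real pre-trained weights instead of these random ones.
pretrained = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1000),  # original classifier head (1000 classes)
)

num_new_classes = 5  # assumed size of the new data set's label space

# Slice off the end of the network (the old classifier head) and
# freeze all remaining pre-trained weights.
feature_extractor = nn.Sequential(*list(pretrained.children())[:-1])
for p in feature_extractor.parameters():
    p.requires_grad = False

# New fully connected layer matching the new classes; nn.Linear
# randomizes its weights on construction.
new_head = nn.Linear(64, num_new_classes)
model = nn.Sequential(feature_extractor, new_head)

# Only the new head's parameters are given to the optimizer, so
# training updates just the new fully connected layer.
optimizer = torch.optim.SGD(new_head.parameters(), lr=0.01)
```

Passing only `new_head.parameters()` to the optimizer, combined with `requires_grad = False` on the frozen layers, ensures the pre-trained weights are untouched during training.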
If the new data set is small and different from the original training data:
- Slice off the later pre-trained layers, keeping only the layers near the beginning of the network (the early layers capture generic, low-level features that transfer even to different data)
- Add to the remaining pre-trained layers a new fully connected layer that matches the number of classes in the new data set
- Randomize the weights of the new fully connected layer; freeze all the weights from the pre-trained network
- Train the network to update the weights of the new fully connected layer
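These steps differ from the previous case only in where the network is cut. A sketch, again using a small hypothetical stand-in for the pre-trained model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network.
pretrained = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),    # early layers: generic features
    nn.Linear(64, 128), nn.ReLU(),   # later layers: original-task-specific
    nn.Linear(128, 1000),            # original classifier head
)

# Keep only the layers near the beginning of the network and
# freeze their pre-trained weights.
early_layers = nn.Sequential(*list(pretrained.children())[:2])
for p in early_layers.parameters():
    p.requires_grad = False

# Add a new, randomly initialized fully connected layer matching
# the number of classes in the new data set (3 is assumed here).
num_new_classes = 3
new_head = nn.Linear(64, num_new_classes)
model = nn.Sequential(early_layers, new_head)

# Train only the new fully connected layer.
optimizer = torch.optim.SGD(new_head.parameters(), lr=0.01)
```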
If the new data set is large and similar to the original training data:
- Remove the last fully connected layer and replace it with a layer matching the number of classes in the new data set
- Randomly initialize the weights in the new fully connected layer
- Initialize the rest of the weights using the pre-trained weights
- Re-train the entire neural network
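A sketch of this case: only the head is replaced, every other weight keeps its pre-trained value, and all parameters are passed to the optimizer so the entire network is re-trained. The stand-in network and class count are assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network; its current weights
# play the role of the pre-trained initialization.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1000),  # original classifier head
)

num_new_classes = 10  # assumed for the new data set

# Replace only the last fully connected layer; nn.Linear randomly
# initializes it, while the rest of the weights stay pre-trained.
model[-1] = nn.Linear(64, num_new_classes)

# Re-train the entire network: every parameter goes to the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
```

With a large data set there is little risk of overfitting, which is why fine-tuning all of the weights (rather than freezing them) is safe here.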
If the new data set is large and different from the original training data:
- Remove the last fully connected layer and replace it with a layer matching the number of classes in the new data set
- Retrain the network from scratch with randomly initialized weights
- Alternatively, you could just use the same strategy as in the "large and similar" data case, fine-tuning the whole network starting from the pre-trained weights
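Training from scratch amounts to reusing the architecture but discarding the pre-trained weights. One way to sketch this in PyTorch (the network is again a hypothetical stand-in) is to re-initialize every layer that supports it:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 1000),  # original classifier head
)

# Replace the classifier head to match the new number of classes
# (4 is assumed here)...
model[-1] = nn.Linear(64, 4)

# ...then re-initialize every layer, discarding the pre-trained
# weights so the network trains from scratch.
def reset_weights(module):
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

model.apply(reset_weights)

# Train the whole network with randomly initialized weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```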