- An Introduction and exploration of Meta learning architucture specifically few-shot-learning.
- Our goal is to be able to classify new objects never seen it in the training data with very few examples.
- Motivation for Meta learning
- General idea about few-shot-learning
- Defining common terms for few-shot-learning
- How the model learn ??
- Contacts
- Rsources
- License
Learning visual models of object categories requires thousands of training examples; this is due to the diversity and richness of object appearance which requires models containing hundreds of parameters.
- Informal observation tells us that Humans can learn a new category of animals both fast and easy
- Human can learn to recognize an animal that they have never seen before by seeing 2 or 3 images of it.
- In The other side: Computer requires thousands if not tens of thousands of training images to be able to reconize a new category.
- Could computer vision algorithms be similarly efficient?
- One possible explanation of human efficiency is that when learning a new category we take advantage of prior experience.
- The appearance of the categories we know and, more importantly, the variability in their appearance, gives us important information on what to expect in a new category. This may allow us to learn new categories from few(er) training examples.
- And here we will try to explain how we can some how give the copmuter the ability to mimic what Humans do.
Defination: Few-shot learning is the problem of making predictions based on a limited number of samples.
- Few-shot learning is different from standard supervised learning.
- The goal of Here is not to let the model recognize the images in the training set and then generalize to the test set.
- Instead, the goal is to learn. Learn to learn
- In supervised learnig the goal is to build a model around class recognition.
- So the goal isn't to recognize the cat or dog the goal is to know what is the diffrences between them
- In other simple words:
- The goal is to learn how to measure the similarity and differences between 2 classes, here we accomplish the learn to learn.
- That's how we can recognize unseen categories in the training set.
- (e.g) Training into (Dogs, Cats, and Tigers) classes and then use the same model to recognize a new class like Elephant by providing it with additional information a support set (few shot).
- We will get to the technical details of how the model can do this later.
-
What is a support set ?
- After training the similarity function of our model we want to classify new image (Query img), this image as we know from above doesn't belong to any of our classes so the support set will contain a set of images and at least one of them belong to the same class as the query img, so the model could compute a similarity_score for every img in the support set and pick the highest score as the class of the new img.
(e.g) our model has three classes (Cats, dogs, and tigers) and we want to query it using an image of elephant, so we'll need to provide a support set as our additional info for the model (hence the name few shot, the small dataset) them the model will compare the query img with every single image and will provide similarity score, and we will choose the highest sim_score as our result the below image shows an example of what i did explain.
- After training the similarity function of our model we want to classify new image (Query img), this image as we know from above doesn't belong to any of our classes so the support set will contain a set of images and at least one of them belong to the same class as the query img, so the model could compute a similarity_score for every img in the support set and pick the highest score as the class of the new img.
-
What is this mean K-way, n-shot, support set (4way-3shots support set)
- k-way: are the number of classes in the support set, 4way means we have four classes (cat, dog, pegion, elephant).
- N-shot: are the number of samples per class, 3shots means we have a 3 imgs per class in the support set.
- We'll be using the Siamase learning network to train our model, and there is two methods we can use to train the model.
- You can read more about Siamase network by refering to the Rsources section.
- The below mind map explains the learning process.
- The algorithm explaination
-
Preprocessing step:
- We divide the images into a set of pairs to feed them to the network e.g
(img_1, image_2, 1)
this means we have two images (positive pair) belong to the same class assigned label 1 for them,(image_a, image_b, 0)
those two images are (negative pair) and we assigned 0 to them.
- We divide the images into a set of pairs to feed them to the network e.g
-
Then feeding the set of pairs to The same ConveNet for extracting our feature vector
-
Then we get the absolute diff between the two feature vectors
-
Feeding the result to dense layers getting a scaler
-
The scaler goes into a sigmoide getting a value between 0 and 1
-
this value represnets the prediction of similarity between the two images
-
Calculating our loss and gradient
-
Then back-propagate the results to update the params of the dense layers and the ConvNet
-
Repeat Until we converge.
-
- In predictions
- Using the support set and the query image I can compare the query image with every image in the support set
- Getting a value between 0 and 1
- Then the maximum value is our predicted class
note that it may be all the images in the support set don't belong to any of the classes that we trained the model on.
-
Preprocessing step:
- We divide the images into a set of three pairs to feed them to the network e.g
(anchor_img, positive_img, negative_img)
Anchor_img
a randomly selected img that can belong to any of our classesPositive_img
a randomly selected img that must have the same class as anchorNegative_img
a randomly selected img that belongs to any class Excluding the anchor class
- We divide the images into a set of three pairs to feed them to the network e.g
-
Algorithm Explanation:
-
Here's a graph that gives a quick idea:
-
We'll be using the same idea as in pairwise For feature extraction, but here for three images to get a feature vector for every image
-
Then, will calculate the D+ as the squared absolute difference between the
anchor feature vector and the positive feature vector
, andD- as the squared absolute difference between the anchor feature vector and the negative feature vector
-
We need
D+
to be as small as possible andD-
to be as large as possible -
Then we feed those results to our loss function
-
Triplet Loss Function:
- As we can extract from the above we need D+ to be smaller than D-
D+ < D-
- So we will try to make our whole NN to work in this, starting by the loss function
- But by how much we need this difference ->
Alpha
our most important hyper-parameter here Such that - If the above equation is true so the loss will be zero.
- If not:
- So our final Loss_function will be
- The feature space of the triplet loss will look like the below Image
- As we can extract from the above we need D+ to be smaller than D-
-
Then we will use gradient descent and back-propagate the results to update our parameters.
- First we train our siamase network on a large data set.
- Dividing the data into a two or three paris based on the method we choose for learning
- In predictions we give the model a Query_Img and a support_set (k-way n-shot)
- Note that the Query_Img not belongs to the classes we trained our network on.
- k-way means k classes, n-shot means number of samples for every class, The prediction will be more effecient If k is small and n is large.
- The model will output a similarity_score or a distance_value based on the method of learning.
You can reach out for me in twitter @amshrbo or via mail
amshrbo@gmail.com
- You can Watch Shusen Wang videos on YouTube.
- You can find the slides associated with videos here Just go to this repos and search for Meta Learning.
- One-Shot Learning of Object Categories Lei Fei-Fei, Fergus, Perona, IEE, 2006
- Few shot learnig article from Analytics vedhya
- Siamese neural network in wikipedia
- Siamese Neural Networks for One-Shot Image Recognition
- FaceNet: A Unified Embedding for Face Recognition and Clustering
- online-latex-equation-editor for generating image equations