Use this bibtex to cite this repository:
@INPROCEEDINGS{9663762,
author={Patrício, Cristiano and Neves, João C.},
booktitle={Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)},
title={ZSpeedL - Evaluating the Performance of Zero-Shot Learning Methods using Low-Power Devices},
year={2021},
pages={1-8},
doi={10.1109/AVSS52988.2021.9663762}}
- Datasets
- Methods
- Special Case: LAD dataset
- Extracting Custom Features
- Optimizing TensorFlow Models using TensorRT
- Evaluating the Computational Performance of ZSL methods
1. Datasets ⬆️
The datasets used to evaluate the ZSL methods can be downloaded here. (MD5)
Dataset | No. Classes | No. Instances | No. Attributes | Download Images |
---|---|---|---|---|
AWA1 | 50 | 30,475 | 85 | Not available |
AWA2 | 50 | 37,322 | 85 | AWA2 Images |
CUB | 200 | 11,788 | 312 | CUB Images |
SUN | 717 | 14,340 | 102 | SUN Images |
APY | 32 | 15,339 | 64 | a-Yahoo Images / a-Pascal Images |
LAD | 230 | 78,017 | 359 | LAD Images |
2. Methods ⬆️
We select six state-of-the-art ZSL methods: three projection-based methods (ESZSL, SAE, and DEM) and three generative methods (f-CLSWGAN, TF-VAEGAN, and CE-GZSL).
📄 Paper: http://proceedings.mlr.press/v37/romera-paredes15.pdf
Class: ESZSL(args)
Arguments:
<dataset> : {AWA1, AWA2, CUB, SUN, APY, LAD}
<dataset_path> : {'./datasets/'}
<filename> : name of the features file (without the file extension)
<alpha> : int value [-3,3]
<gamma> : int value [-3,3]
<att_split> : for the LAD dataset, specify which split is to be evaluated, in the following format "_{i}", i = {0,1,2,3,4}
How to Run:
In the main scope of main.py, insert the following code:
ESZSL(dataset="AWA1", filename="MobileNetV2")
Hyperparameters:
Dataset | Hyperparameter |
---|---|
AWA1 | Alpha=3, Gamma=0 |
AWA2 | Alpha=3, Gamma=0 |
CUB | Alpha=2, Gamma=0 |
SUN | Alpha=2, Gamma=2 |
APY | Alpha=3, Gamma=-1 |
LAD | Alpha=3, Gamma=1 |
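For example, assuming the arguments listed above are accepted as keyword arguments of the constructor, a CUB run with the tabled hyperparameters might look like:
# Hypothetical example: ESZSL on CUB with the hyperparameters from the table above
ESZSL(dataset="CUB", filename="MobileNetV2", alpha=2, gamma=0)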
📄 Paper: https://arxiv.org/pdf/1704.08345.pdf
Class: SAE(args)
Arguments:
<dataset> : {AWA1, AWA2, CUB, SUN, APY, LAD}
<dataset_path> : {'./datasets/'}
<filename> : name of the features file (without the file extension)
<lamb_ZSL> : float value, default=2
<lamb_GZSL> : float value, default=2
<setting> : Type of evaluation {V2S, S2V}
<att_split> : for the LAD dataset, specify which split is to be evaluated, in the following format "_{i}", i = {0,1,2,3,4}
How to Run:
In the main scope of main.py, insert the following code:
SAE(dataset="AWA1", filename="MobileNetV2")
Hyperparameters:
Dataset | Setting | Lambda (ZSL) | Lambda (GZSL) |
---|---|---|---|
AWA1 | V2S | 3.0 | 3.2 |
AWA2 | V2S | 0.6 | 0.8 |
CUB | V2S | 100 | 80 |
SUN | V2S | 0.32 | 0.32 |
aPY | V2S | 2.0 | 0.16 |
LAD | V2S | 51.2 | 51.2 |
Dataset | Setting | Lambda (ZSL) | Lambda (GZSL) |
---|---|---|---|
AWA1 | S2V | 0.8 | 0.8 |
AWA2 | S2V | 0.2 | 0.2 |
CUB | S2V | 0.2 | 0.2 |
SUN | S2V | 0.16 | 0.08 |
aPY | S2V | 4.0 | 2.56 |
LAD | S2V | 6.4 | 6.4 |
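For example, assuming the arguments listed above are accepted as keyword arguments, a CUB run in the V2S setting with the tabled lambdas might look like:
# Hypothetical example: SAE on CUB (V2S) with the lambdas from the tables above
SAE(dataset="CUB", filename="MobileNetV2", setting="V2S", lamb_ZSL=100, lamb_GZSL=80)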
📄 Paper: https://arxiv.org/pdf/1611.05088.pdf
Class: DEM(args)
Arguments:
<dataset> : {AWA1, AWA2, CUB, SUN, APY, LAD}
<dataset_path> : {'./datasets/'}
<filename> : name of the features file (without the file extension)
<lamb> : float value, default=1e-3
<lr> : float value, default=1e-4
<batch_size> : batch size, default=64
<hidden_dim> : Dimension of the hidden layer, default=1600
<att_split> : for the LAD dataset, specify which split is to be evaluated, in the following format "_{i}", i = {0,1,2,3,4}
How to Run:
In the main scope of main.py, insert the following code:
DEM(dataset="AWA1", filename="MobileNetV2")
Hyperparameters:
Dataset | Hidden Dim | Lambda | Learning Rate |
---|---|---|---|
AWA1 | 1600 | 1e-3 | 1e-4 |
AWA2 | 1600 | 1e-3 | 1e-4 |
CUB | 1600 | 1e-2 | 1e-4 |
SUN | 1600 | 1e-5 | 1e-4 |
aPY | 1600 | 1e-4 | 1e-4 |
LAD | 1600 | 1e-4 | 1e-4 |
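For example, assuming the arguments listed above are accepted as keyword arguments, a CUB run with the tabled hyperparameters might look like:
# Hypothetical example: DEM on CUB with the hyperparameters from the table above
DEM(dataset="CUB", filename="MobileNetV2", hidden_dim=1600, lamb=1e-2, lr=1e-4)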
📄 Paper: https://arxiv.org/pdf/1712.00981.pdf
- Original version (f-CLSWGAN/orig/)
Run instructions:
python original_f-CLSWGAN.py --download_mode
python original_f-CLSWGAN.py --train_classifier
python original_f-CLSWGAN.py --train_WGAN
- Modified version
Class: f_CLSWGAN(args)
Arguments:
<dataset> : {AWA1, AWA2, CUB, SUN, APY, LAD}
<dataroot> : {'./datasets/'}
<image_embedding> : name of the features file (without the file extension)
<class_embedding> : name of the class embedding ("att" by default)
<split_no> : for the LAD dataset, specify which split is to be evaluated, in the following format "_{i}", i = {0,1,2,3,4}
<attSize> : size of the attribute annotations
<resSize> : size of the features
<nepoch> : number of epochs
<lr> : learning rate
<beta1> : beta1 for Adam optimizer
<batch_size> : input batch size
<cls_weight> : weight of the classification loss
<syn_num> : number of features to generate per class
<ngh> : size of the hidden units in the generator
<ndh> : size of the hidden units in the discriminator
<lambda1> : gradient penalty regularizer, following WGAN-GP
<classifier_checkpoint> : which checkpoint (.ckpt) file of the TensorFlow model to load
<nz> : size of the latent z vector
How to Run:
In the main scope of main.py, insert the following code:
f_CLSWGAN(dataset="AWA1", filename="MobileNetV2")
Hyperparameters:
Dataset | lambda1 | cls_weight |
---|---|---|
AWA1 | 10 | 0.01 |
AWA2 | 10 | 0.01 |
CUB | 10 | 0.01 |
SUN | 10 | 0.01 |
APY | 10 | 0.01 |
LAD | 10 | 0.01 |
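For example, assuming lambda1 and cls_weight are accepted as keyword arguments, a CUB run with the tabled values might look like:
# Hypothetical example: f-CLSWGAN on CUB with the hyperparameters from the table above
f_CLSWGAN(dataset="CUB", filename="MobileNetV2", lambda1=10, cls_weight=0.01)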
📄 Paper: https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670477.pdf
Requirements:
- Python 3.6
- Pytorch 0.3.1
- torchvision 0.2.0
- h5py 2.10
- scikit-learn 0.22.1
- scipy 1.4.1
- numpy 1.18.1
- numpy-base 1.18.1
- pillow 5.1.0
- Original version
Note: The requirements to run the script are listed in the scope of the main function of the original_tf-vaegan.py script.
Run instructions:
python original_tf-vaegan.py --download_mode
python original_tf-vaegan.py --train
- Modified version
Class: TF_VAEGAN(args)
Arguments:
<dataset> : {AWA1, AWA2, CUB, SUN, APY, LAD}
<dataroot> : {'./datasets/'}
<gammaD> : weight on the W-GAN loss
<gammaG> : weight on the W-GAN loss
<image_embedding> : name of the features file (without the file extension)
<class_embedding> : name of the class embedding ("att" by default)
<syn_num> : number of features to generate per class
<ngh> : size of the hidden units in generator
<ndh> : size of the hidden units in discriminator
<lambda1> : gradient penalty regularizer, following WGAN-GP
<nclass_all> : number of all classes
<split> : for the LAD dataset, specify which split is to be evaluated, in the following format "_{i}", i = {0,1,2,3,4}
<batch_size> : input batch size
<nz> : size of the latent z vector
<latent_size> : size of the latent units in discriminator
<attSize> : size of semantic features
<resSize> : size of visual features
<lr> : learning rate to train GANs
<classifier_lr> : learning rate to train softmax classifier
<recons_weight> : weight of the decoder's reconstruction loss
<feed_lr> : learning rate for the feedback module
<dec_lr> : learning rate for the decoder
<feedback_loop> : iterations on feedback loop
<a1> : weight of the feedback layers
<a2> : weight of the feedback layers
How to Run:
In the main scope of main.py, insert the following code:
TF_VAEGAN(dataset="AWA1", filename="MobileNetV2")
Hyperparameters:
Dataset | lr | syn_num | gammaD | gammaG | classifier_lr | recons_weight | feed_lr | dec_lr | a1 | a2 |
---|---|---|---|---|---|---|---|---|---|---|
AWA1 | 1e-5 | 1800 | 10 | 10 | 1e-3 | 0.1 | 1e-4 | 1e-4 | 0.01 | 0.01 |
AWA2 | 1e-5 | 2400 | 10 | 10 | 1e-3 | 0.1 | 1e-4 | 1e-4 | 0.01 | 0.01 |
CUB | 1e-4 | 2400 | 10 | 10 | 1e-3 | 0.01 | 1e-5 | 1e-4 | 1 | 1 |
SUN | 1e-5 | 400 | 1 | 10 | 5e-4 | 0.01 | 1e-4 | 1e-4 | 0.1 | 0.01 |
APY | 1e-5 | 300 | 10 | 10 | 1e-3 | 0.1 | 1e-4 | 1e-4 | 0.01 | 0.01 |
LAD | 1e-5 | 1800 | 10 | 10 | 1e-3 | 0.1 | 1e-4 | 1e-4 | 0.01 | 0.01 |
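For example, assuming the tabled hyperparameters are accepted as keyword arguments, a SUN run might look like:
# Hypothetical example: TF-VAEGAN on SUN with the hyperparameters from the table above
TF_VAEGAN(dataset="SUN", filename="MobileNetV2", lr=1e-5, syn_num=400, gammaD=1, gammaG=10, classifier_lr=5e-4, recons_weight=0.01, feed_lr=1e-4, dec_lr=1e-4, a1=0.1, a2=0.01)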
📄 Paper: https://arxiv.org/pdf/2103.16173.pdf
Requirements:
- Python 3.6
- Pytorch 1.2.0
- scikit-learn
- Original version:
Note: The requirements to run the script are listed in the scope of the main function of the original_ce-gzsl.py script.
Run instructions:
python original_ce-gzsl.py --download_mode
python original_ce-gzsl.py --train
- Modified version
Class: CE_GZSL(args)
Arguments:
<dataset> : {AWA1, AWA2, CUB, SUN, APY, LAD}
<dataroot> : {'./datasets/'}
<image_embedding> : name of the features file (without the file extension)
<class_embedding> : name of the class embedding ("att" by default)
<split> : for the LAD dataset, specify which split is to be evaluated, in the following format "_{i}", i = {0,1,2,3,4}
<batch_size> : the number of instances in a mini-batch
<nepoch> : number of epochs
<attSize> : size of semantic features
<resSize> : size of visual features
<nz> : size of the Gaussian noise
<lr> : learning rate to train GANs
<embedSize> : size of embedding h
<syn_num> : number of synthetic features for each class
<outzSize> : size of the non-linear projection z
<nhF> : size of the hidden units in the comparator network F
<ins_weight> : weight of the classification loss when learning G
<cls_weight> : weight of the score function when learning G
<ins_temp> : temperature in instance-level supervision
<cls_temp> : temperature in class-level supervision
<nclass_all> : number of all classes
<nclass_seen> : number of seen classes
How to Run:
In the main scope of main.py, insert the following code:
CE_GZSL(dataset="AWA1", filename="MobileNetV2")
Hyperparameters:
Dataset | syn_num | ins_temp | cls_temp | batch_size |
---|---|---|---|---|
AWA1 | 1800 | 0.1 | 0.1 | 4096 |
AWA2 | 2400 | 10 | 1 | 4096 |
CUB | 300 | 0.1 | 0.1 | 2048 |
SUN | 400 | 0.1 | 0.1 | 1024 |
APY | 300 | 0.1 | 0.1 | 1024 |
LAD | 1800 | 0.1 | 0.1 | 1024 |
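For example, assuming the tabled hyperparameters are accepted as keyword arguments, a SUN run might look like:
# Hypothetical example: CE-GZSL on SUN with the hyperparameters from the table above
CE_GZSL(dataset="SUN", filename="MobileNetV2", syn_num=400, ins_temp=0.1, cls_temp=0.1, batch_size=1024)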
3. Special Case: LAD dataset ⬆️
The LAD dataset must be evaluated on each of the five available splits (att_splits_0.mat, att_splits_1.mat, att_splits_2.mat, att_splits_3.mat, att_splits_4.mat). This means that each ZSL method must be executed once for each of the provided splits.
For example, to evaluate the LAD dataset with the SAE method, we run the following code:
# Run SAE algorithm for LAD dataset
SAE(dataset="LAD", filename="MobileNetV2", att_split="_0") # att_splits_0
SAE(dataset="LAD", filename="MobileNetV2", att_split="_1") # att_splits_1
SAE(dataset="LAD", filename="MobileNetV2", att_split="_2") # att_splits_2
SAE(dataset="LAD", filename="MobileNetV2", att_split="_3") # att_splits_3
SAE(dataset="LAD", filename="MobileNetV2", att_split="_4") # att_splits_4
After executing the above code, we end up with ten .txt files containing the predictions for each evaluated split:
# ZSL
preds_SAE_ZSL_att_0.txt
preds_SAE_ZSL_att_1.txt
preds_SAE_ZSL_att_2.txt
preds_SAE_ZSL_att_3.txt
preds_SAE_ZSL_att_4.txt
# GZSL
preds_SAE_GZSL_att_0.txt
preds_SAE_GZSL_att_1.txt
preds_SAE_GZSL_att_2.txt
preds_SAE_GZSL_att_3.txt
preds_SAE_GZSL_att_4.txt
After evaluating each of the five splits with the chosen ZSL algorithm, the final score is obtained by averaging the results over the five super-classes.
First, compute the Top-1 accuracy for LAD:
# Evaluate LAD
compute_zsl_acc_lad(split=0, att_split="att_splits_0", preds_file="preds_SAE_att_0.txt")
compute_zsl_acc_lad(split=1, att_split="att_splits_1", preds_file="preds_SAE_att_1.txt")
compute_zsl_acc_lad(split=2, att_split="att_splits_2", preds_file="preds_SAE_att_2.txt")
compute_zsl_acc_lad(split=3, att_split="att_splits_3", preds_file="preds_SAE_att_3.txt")
compute_zsl_acc_lad(split=4, att_split="att_splits_4", preds_file="preds_SAE_att_4.txt")
# The above code returns a file named results_ZSL.txt, containing the accuracy for each of the 5 super-classes.
evaluate_LAD_ZSL("results_ZSL.txt")
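For reference, the final ZSL score is simply the mean of the five super-class accuracies. A minimal sketch, assuming results_ZSL.txt stores one accuracy value per line:
# Minimal sketch (assumes results_ZSL.txt holds one accuracy value per line)
with open("results_ZSL.txt") as f:
    accs = [float(line) for line in f if line.strip()]
print("LAD ZSL Top-1 (mean over super-classes):", sum(accs) / len(accs))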
Then, the harmonic mean is calculated:
compute_harmonic_SAE_LAD(split=0, att_split="att_splits_0", preds="preds_SAE_GZSL_att_0.txt")
compute_harmonic_SAE_LAD(split=1, att_split="att_splits_1", preds="preds_SAE_GZSL_att_1.txt")
compute_harmonic_SAE_LAD(split=2, att_split="att_splits_2", preds="preds_SAE_GZSL_att_2.txt")
compute_harmonic_SAE_LAD(split=3, att_split="att_splits_3", preds="preds_SAE_GZSL_att_3.txt")
compute_harmonic_SAE_LAD(split=4, att_split="att_splits_4", preds="preds_SAE_GZSL_att_4.txt")
# The above code returns two files named results_seen.txt and results_unseen.txt, containing the accuracy for seen classes and the accuracy for unseen classes, respectively.
evaluate_LAD_GZSL(filename_seen="results_seen.txt", filename_unseen="results_unseen.txt")
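For reference, the harmonic mean is the standard GZSL metric combining the seen accuracy acc_s and the unseen accuracy acc_u. A minimal sketch of the combination, assuming the per-super-class accuracies have already been averaged:
# Harmonic mean of seen/unseen accuracies: H = 2 * acc_s * acc_u / (acc_s + acc_u)
def harmonic_mean(acc_s, acc_u):
    return 2 * acc_s * acc_u / (acc_s + acc_u)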
However, for the remaining ZSL methods, the results in the generalized setting are obtained using compute_harmonic_acc_lad(split, att_split, preds_seen_file, preds_unseen_file, seen) instead of compute_harmonic_SAE_LAD(split, att_split, preds).
4. Extracting Custom Features ⬆️
To extract features for the datasets using a custom CNN architecture, run the following command:
python feature_extraction.py --model "MobileNet" --dataset_path "/path/to/dataset/*.jpg"
The output of this script is a .npy file containing the extracted features.
After extracting the features, it is necessary to create a MATLAB dictionary so that the features can be evaluated with the ZSL methods. To do that, run the following command:
python custom_features.py --dataset AWA2 --dataroot "/path/to/dataset/" --features "MobileNet-AWA2-features" --features_path "/path/to/npy/features/file"
The output of this script is a .mat file that can be passed as a parameter to the ZSL methods for evaluation with custom features.
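Assuming the resulting .mat file is stored under the corresponding dataset directory, the custom features can then be evaluated by passing their name in the filename argument, e.g.:
# Hypothetical example: evaluate ESZSL with the custom MobileNet features created above
ESZSL(dataset="AWA2", filename="MobileNet-AWA2-features")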
5. Optimizing TensorFlow Models using TensorRT ⬆️
When running deep learning models on low-power devices such as the Jetson Nano, it is desirable to optimize the models to take advantage of the GPU capabilities. Thus, we provide two Python scripts for optimizing TensorFlow models (1.x and 2.x) with TensorRT.
- TF 1.x models:
In the case of TensorFlow 1.x models, please refer to Convert_TF1.x_Models_to_TensorRT.py (TF_models_optimization/Convert_TF1.x_Models_to_TensorRT.py).
Note: The TF 1.x model to be optimized must be saved as follows:
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
...
# Save Model
saver.save(sess, "model_xpto_tf1")
- TF 2.x models:
In the case of TensorFlow 2.x models, please refer to Convert_TF2.x_Models_to_TensorRT.py (TF_models_optimization/Convert_TF2.x_Models_to_TensorRT.py).
Note: The TF 2.x model to be optimized must be saved using:
tf.saved_model.save(model, 'xpto_saved_model')
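For reference, a minimal sketch of what the TF 2.x conversion performs, using the standard TF-TRT API (model paths are placeholders):
# Minimal TF-TRT sketch for a TF 2.x SavedModel (paths are placeholders)
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir="xpto_saved_model")
converter.convert()                     # builds the TensorRT-optimized graph
converter.save("xpto_saved_model_trt")  # writes the optimized SavedModel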
6. Evaluating the Computational Performance of ZSL methods ⬆️
- To measure the time consumed in the visual feature extraction, run the following code snippet:
python feature_extraction_inference_time/main.py
- To measure the time consumed by different ZSL methods for classifying a test image, run the following code snippet:
Note: You need to download and decompress this file (MD5) into the zsl_methods_inference_time/ directory so that the DEM model can be evaluated.
python zsl_methods_inference_time/main.py