This folder contains APIs in MATLAB for evaluating stored test results (see ../development for storing test results).
`nlvd_evaluation` computes the performance metrics described in DBNet, which are summarized as follows:
- For localization tasks, it computes recalls (or accuracies) and mean/median IoUs (IoU: intersection over union between the detected region and the ground-truth region), using the level-0 query set.
- For detection tasks, it computes the detection mean AP (mAP) and global AP (gAP) for the level-0, 1, and 2 query sets (AP: average precision).
Remark: please refer to the PASCAL VOC challenge paper (The PASCAL Visual Object Classes Challenge: A Retrospective) if you are not familiar with IoU and AP.
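For reference, here is a minimal sketch of how the IoU between two boxes could be computed in MATLAB, assuming each box is given as `[x1 y1 x2 y2]` corner coordinates (this is only an illustration, not the toolbox's internal implementation):

```matlab
% Illustrative IoU computation between two boxes, each given as [x1 y1 x2 y2].
% The boxes below are hypothetical values for demonstration only.
a = [ 50  60 200 220];   % predicted box
b = [ 70  80 210 240];   % ground-truth box
iw    = max(0, min(a(3), b(3)) - max(a(1), b(1)));   % intersection width
ih    = max(0, min(a(4), b(4)) - max(a(2), b(2)));   % intersection height
inter = iw * ih;                                     % intersection area
areaA = (a(3) - a(1)) * (a(4) - a(2));
areaB = (b(3) - b(1)) * (b(4) - b(2));
iou   = inter / (areaA + areaB - inter)              % union = areaA + areaB - inter
```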
The evaluation command line in MATLAB is

```matlab
nlvd_evaluation(test_title, dataset_name, subset_name, parallel)
```
The function computes the performance metrics for all results stored in `[toolbox_folder]/results/[dataset_name]/[test_title]`. `'vg_v1'` is the default value for `dataset_name` and the only choice in the current version. If `level-0.txt` exists, the function computes the localization metrics and detection APs on the level-0 query set. For `level-?.txt` (`?>0`), only detection APs are computed. `subset_name` needs to be set to `test` (the default) or `val`, so that the function can load the annotations accordingly.
`parallel` should be either `true` or `false` (the default). When it is set to `true`, the MATLAB Parallel Computing Toolbox is used for faster processing (at the cost of more memory). If memory allows, parallel computation is recommended, as the non-parallel evaluation can take hours to finish.
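The toolbox is expected to manage parallelism through the standard MATLAB mechanisms; if you want to control memory usage yourself, you can open a pool with a fixed number of workers before running the evaluation (standard Parallel Computing Toolbox calls, not a toolbox requirement):

```matlab
% Open a parallel pool with a fixed number of workers before evaluation,
% e.g. to trade speed against memory usage (requires Parallel Computing Toolbox).
if isempty(gcp('nocreate'))
    parpool(4);   % 4 workers; adjust to your machine
end
```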
Remark: for an argument with a default value, passing the empty array `[]` or omitting the argument means using the default value. This rule also holds for the functions mentioned later in this README.md file.
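For instance, assuming a hypothetical experiment titled `'my_test'`, the following two calls are equivalent ways to evaluate it on the default dataset and subset with parallel processing enabled:

```matlab
% Evaluate results stored in [toolbox_folder]/results/vg_v1/my_test
% ('my_test' is a hypothetical experiment title used for illustration).
nlvd_evaluation('my_test', 'vg_v1', 'test', true);   % all arguments spelled out
nlvd_evaluation('my_test', [], [], true);            % [] falls back to the defaults
```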
`[subset_name]_ACC.txt`: this file, saved in the folder `results/[dataset_name]/[test_title]`, contains the localization performance metrics (only for the level-0 query set): the localization recall of the top-1 predictions under the IoU thresholds `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]`, followed by the mean/median IoU. Here is an example file (DBNet with VGGNet-16):
```
0.559 0.468 0.390 0.314 0.241 0.169 0.101
topOverlap median: 0.162
topOverlap mean: 0.263
```
Note that, when multiple ground-truth regions are present on a single image, the best-matched one is used. That is why the mean and median IoUs are labeled "topOverlap mean" and "topOverlap median".
`[subset_name]_level?_AP.txt` (`?=0,1,2`): this file, saved in `results/[dataset_name]/[test_title]`, contains the detection performance metrics: the gAP and mAP for the IoU thresholds `[0.3, 0.5, 0.7]`. Here are some example files:
- DBNet with VGGNet-16 on the level-0 query set
```
threshold: 0.3 gAP: 0.235691 mAP: 0.487341
threshold: 0.5 gAP: 0.109252 mAP: 0.304111
threshold: 0.7 gAP: 0.021150 mAP: 0.117586
```
- DBNet with VGGNet-16 on the level-1 query set
```
threshold: 0.3 gAP: 0.205841 mAP: 0.453660
threshold: 0.5 gAP: 0.096192 mAP: 0.285768
threshold: 0.7 gAP: 0.018961 mAP: 0.111969
```
- DBNet with VGGNet-16 on the level-2 query set
```
threshold: 0.3 gAP: 0.080186 mAP: 0.269943
threshold: 0.5 gAP: 0.038967 mAP: 0.177297
threshold: 0.7 gAP: 0.008379 mAP: 0.074776
```
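If you want to post-process these numbers (e.g., to tabulate or plot them across query levels), a small sketch like the following could extract the gAP and mAP values from such a file; the file path is just an example, and the pattern matching relies only on the `gAP:`/`mAP:` labels shown above:

```matlab
% Read gAP/mAP values from an AP result file produced by the evaluation code.
% The regular expressions depend only on the 'gAP:'/'mAP:' labels, not on line breaks.
txt = fileread('results/vg_v1/dbnet_vgg16/test_level0_AP.txt');  % adjust path to your results
gAP = cellfun(@(c) str2double(c{1}), regexp(txt, 'gAP:\s*([\d.]+)', 'tokens'));
mAP = cellfun(@(c) str2double(c{1}), regexp(txt, 'mAP:\s*([\d.]+)', 'tokens'));
% gAP and mAP are vectors ordered by IoU threshold (0.3, 0.5, 0.7 by default).
```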
Note that you do not have to do this to use the evaluation code; this section just gives an example of actual stored results and their evaluation procedure.
You can download our example test results by running the following command:
```
[toolbox_folder]/results/vg_v1/dbnet_vgg16/download.sh
```
It downloads the test results for DBNet based on VGGNet-16. If you have problems running the script, you can obtain it via this link and extract it in `[toolbox_folder]/results/vg_v1/dbnet_vgg16/`.
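If you prefer to stay inside MATLAB, you can also invoke the script via `system` on Linux/macOS (the path below is a placeholder for your local toolbox location):

```matlab
% Run the download script without leaving MATLAB (Linux/macOS with bash on the path).
toolbox_folder = '/path/to/nlvd_toolbox';   % placeholder: adjust to your setup
cd(fullfile(toolbox_folder, 'results', 'vg_v1', 'dbnet_vgg16'));
system('bash download.sh');
```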
After obtaining the example test results, you can compute the performance metrics by running the following command in MATLAB:
```matlab
nlvd_evaluation('dbnet_vgg16', 'vg_v1', [], false)
```
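If memory allows and the Parallel Computing Toolbox is installed, the same evaluation can be run with parallel processing enabled:

```matlab
% Same evaluation with parallel processing enabled (uses more memory).
nlvd_evaluation('dbnet_vgg16', 'vg_v1', [], true);
```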
Another set of example test results is available via `[toolbox_folder]/results/vg_v1/dbnet_resnet101/download.sh` or this link.
In addition to the standard evaluation interface, we also provide more flexible APIs for computing the performance metrics.
```matlab
[topOverlap, ACC] = eval_localization(test_output, subset_name, test_title, dataset_name, save_results, iouT, rankT)
```
- Input arguments:
  - `test_output`: the path to the test output txt file (e.g., `[toolbox_folder]/results/[dataset_name]/[test_title]/level_0.txt`). Note that localization is only applicable to the level-0 query set.
  - `subset_name`: should be `'train'`, `'val'`, or `'test'` (the default). It specifies which subset to load for the ground truth annotations.
  - `test_title`: the title of the experiment. The default value is `'sample_test'`.
  - `dataset_name`: `'vg_v1'` is the only choice in the current version, and the default value is `'vg_v1'`.
  - `save_results`: `true` or `false`, specifying whether to save the localization accuracy in mat files (`[subset_name]_ACC.mat` and `[subset_name]_topOverlap.mat`) in the folder `results/[dataset_name]/[test_title]`. The default value is `false`.
  - `iouT`: IoU thresholds for the localization recalls. It should be a vector of floats between 0 and 1, and the default value is `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]`.
  - `rankT`: `rankT=k` means computing the localization metrics based on the best of the top-`k` predictions. By default, it outputs results for the ranks `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`.
- Returns:
  - `topOverlap`: the overlap (IoU) between the best-matched prediction and the ground truth. Note that there are multiple predicted regions when the rank is greater than 1, and there may be multiple ground truth regions on a single image.
  - `ACC`: a matrix of localization accuracy (or recall), with each column corresponding to a rank and each row corresponding to an IoU threshold.
- Display:
  - The first column of `ACC` and the mean/median IoUs are printed out. The results are also saved in the folder `results/[dataset_name]/[test_title]` with the file name `[subset_name]_ACC.txt`.
For example, given a test result file for the level-0 query set stored in `[toolbox_folder]/results/vg_v1/sample_test/level_0.txt`, you can use the following commands to run the default evaluation on the test set:

```matlab
test_output = '[toolbox_folder]/results/vg_v1/sample_test/level_0.txt';
eval_localization(test_output)
```
Then `test_ACC.txt` will be saved in the folder `[toolbox_folder]/results/vg_v1/sample_test`.
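You can also override the defaults. For instance, assuming the stored test output was produced on the validation subset, the following sketch computes the recall at IoU 0.5 for the top-1 and top-5 predictions without saving any files (the argument values are purely illustrative):

```matlab
% Localization recall at IoU 0.5 for the top-1 and top-5 predictions on 'val',
% without writing any result files (save_results = false).
test_output = '[toolbox_folder]/results/vg_v1/sample_test/level_0.txt';
[topOverlap, ACC] = eval_localization(test_output, 'val', 'sample_test', 'vg_v1', ...
                                      false, 0.5, [1 5]);
% ACC is 1x2 here: one row for the single IoU threshold, one column per rank.
```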
```matlab
[gAP, global_PREC_REC, mAP, text_PREC_REC] = eval_detection(test_output, level_id, parallel, subset_name, dataset_name, save_results, DETECTION_IoU_THRESHOLD)
```
- Input arguments:
  - `test_output`: the path to the test output txt file (e.g., `[toolbox_path]/results/[dataset_name]/[test_title]/level_[level_id].txt`).
  - `level_id`: can be `0` (the default), `1`, or `2`. For more details about the test difficulty levels, see here.
  - `parallel`: whether to use the MATLAB Parallel Computing Toolbox. It should be `true` or `false` (the default). Parallelism can lead to a very significant speedup, so it is highly recommended to enable it if possible.
  - `subset_name`: should be `'train'`, `'val'`, or `'test'` (the default). It specifies which subset to load for the ground truth annotations. Note that, in the current version, the level-1 and level-2 query sets are available only for the test set.
  - `test_title`: the title of the experiment. The default value is `'sample_test'`.
  - `dataset_name`: `'vg_v1'` is the only choice in the current version, and the default value is `'vg_v1'`.
  - `save_results`: `true` or `false`, specifying whether to save the global precision-recall (PR) curve into the file `[subset_name]_level[level_id]_th[detection_iou]_global_PR.mat` and the PR curves for the 100 most frequent text phrases (the 100 text phrases with the lowest text IDs) into `[subset_name]_level[level_id]_th[detection_iou]_text_PR.mat` in the folder `results/[dataset_name]/[test_title]`. The default value is `false`.
  - `DETECTION_IoU_THRESHOLD`: a vector of floats between 0 and 1 specifying the IoU thresholds for determining positive detections. The default value is `[0.3, 0.5, 0.7]`.
- Returns:
  - `gAP`: a vector of global APs. Each element corresponds to a detection threshold (note that the input `DETECTION_IoU_THRESHOLD` is a vector of detection thresholds). For each detection threshold, the average precision (AP) is calculated over all test cases across all images.
  - `global_PREC_REC`: a cell vector of global PR curves, where each cell corresponds to a detection threshold. More concretely, each cell is a struct with fields `PREC` and `REC`.
  - `mAP`: a vector of mean APs. Each element is the mean average precision under a detection threshold. For each detection threshold, the APs are first calculated for each query phrase, and then the mean of the APs over all query phrases is computed.
  - `text_PREC_REC`: a cell vector, where each cell corresponds to a detection threshold. Each cell contains a struct array with 100 elements, each of which includes the text ID and the PR curve for one of the 100 most frequent query phrases.
- Display:
  - The gAP and mAP for each detection threshold are printed out. The results are also saved in the folder `results/[dataset_name]/[test_title]` with the file name `[subset_name]_level[level_id]_AP.txt`.
For example, given a test result file for the level-0 query set stored in `[toolbox_folder]/results/vg_v1/sample_test/level_0.txt`, you can use the following commands to run the default evaluation on the test set:

```matlab
test_output = '[toolbox_folder]/results/vg_v1/sample_test/level_0.txt';
eval_detection(test_output)
```
To use parallel processing, you can run

```matlab
eval_detection(test_output, [], true)
```
where `[]` means using the default value for the second argument `level_id`.
Then `test_level0_AP.txt` will be saved in the folder `[toolbox_folder]/results/vg_v1/sample_test`.
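The return values can also be used directly, for example to plot a global precision-recall curve. The sketch below evaluates the level-0 results at a single illustrative IoU threshold (0.5) and plots the corresponding curve:

```matlab
% Evaluate the level-0 results at a single IoU threshold (0.5) with parallel
% processing, then plot the corresponding global precision-recall curve.
test_output = '[toolbox_folder]/results/vg_v1/sample_test/level_0.txt';
[gAP, global_PREC_REC, mAP, ~] = eval_detection(test_output, 0, true, 'test', ...
                                                'vg_v1', false, 0.5);
figure;
plot(global_PREC_REC{1}.REC, global_PREC_REC{1}.PREC);
xlabel('recall'); ylabel('precision');
title(sprintf('global PR @ IoU 0.5 (gAP = %.3f)', gAP(1)));
```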