This is the official repo for WACV 2023 paper "MixVPR: Feature Mixing for Visual Place Recognition"
This paper introduces MixVPR, a novel all-MLP feature-aggregation method that addresses the challenges of large-scale Visual Place Recognition while remaining practical for real-world scenarios with strict latency requirements. The technique takes feature maps from pre-trained backbones as a set of global features and incorporates global relationships between them through a cascade of feature-mixing operations, eliminating the need for local or pyramidal aggregation. MixVPR achieves new state-of-the-art performance on multiple large-scale benchmarks while being significantly more efficient than existing methods in both latency and parameter count.
[WACV OpenAccess] [ArXiv]
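To make the "cascade of feature mixing" concrete, here is a minimal PyTorch sketch of the aggregation idea: flattened backbone feature maps are mixed by a stack of residual MLP blocks, then compressed by channel-wise and row-wise projections into one flat descriptor. Class and layer names here are illustrative assumptions; see `models/aggregators` in the repo for the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMixer(nn.Module):
    """One feature-mixing block: an MLP applied across the flattened
    spatial dimension of each feature map, with a skip connection."""
    def __init__(self, dim, mlp_ratio=1):
        super().__init__()
        self.mix = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.ReLU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):
        return x + self.mix(x)  # residual connection

class MixVPRSketch(nn.Module):
    """Cascade of feature mixers followed by channel- and row-wise
    projections that compress the mixed maps into a flat descriptor.
    Defaults mirror the 4096-D config shown in the README below."""
    def __init__(self, in_channels=1024, in_h=20, in_w=20,
                 out_channels=1024, mix_depth=4, mlp_ratio=1, out_rows=4):
        super().__init__()
        hw = in_h * in_w
        self.mixers = nn.Sequential(
            *[FeatureMixer(hw, mlp_ratio) for _ in range(mix_depth)])
        self.channel_proj = nn.Linear(in_channels, out_channels)
        self.row_proj = nn.Linear(hw, out_rows)

    def forward(self, x):                  # x: (B, C, H, W) backbone features
        x = x.flatten(2)                   # (B, C, H*W) -- each map is a row
        x = self.mixers(x)                 # cascade of feature mixing
        x = self.channel_proj(x.permute(0, 2, 1)).permute(0, 2, 1)
        x = self.row_proj(x)               # (B, out_channels, out_rows)
        return F.normalize(x.flatten(1), p=2, dim=-1)  # L2-normalized descriptor
```

With the defaults above, a (B, 1024, 20, 20) backbone output becomes a (B, 4096) global descriptor (out_channels × out_rows = 1024 × 4).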
All models have been trained on the GSV-Cities dataset (https://github.com/amaralibey/gsv-cities).
| Backbone | Output dimension | Pitts250k-test (R@1 / R@5 / R@10) | Pitts30k-test (R@1 / R@5 / R@10) | MSLS-val (R@1 / R@5 / R@10) | Download |
|---|---|---|---|---|---|
| ResNet50 | 4096 | 94.3 / 98.2 / 98.9 | 91.6 / 95.5 / 96.4 | 88.2 / 93.1 / 94.3 | LINK |
| ResNet50 | 512 | 93.2 / 97.9 / 98.6 | 90.7 / 95.5 / 96.3 | 84.1 / 91.8 / 93.7 | LINK |
| ResNet50 | 128 | 88.7 / 95.8 / 97.4 | 87.8 / 94.3 / 95.7 | 78.5 / 88.2 / 90.4 | LINK |
Code to load the pretrained weights is as follows:
```python
import torch

from main import VPRModel

# Note that images must be resized to 320x320 before being fed to the model
model = VPRModel(backbone_arch='resnet50',
                 layers_to_crop=[4],
                 agg_arch='MixVPR',
                 agg_config={'in_channels': 1024,
                             'in_h': 20,
                             'in_w': 20,
                             'out_channels': 1024,
                             'mix_depth': 4,
                             'mlp_ratio': 1,
                             'out_rows': 4},
                 )

state_dict = torch.load('./LOGS/resnet50_MixVPR_4096_channels(1024)_rows(4).ckpt')
model.load_state_dict(state_dict)
model.eval()
```
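Once the weights are loaded, a global descriptor can be extracted per image. The helper below is a sketch that resizes to 320x320 as the comment above requires; the ImageNet normalization constants are an assumption on our part, so check the repo's dataloader for the exact preprocessing.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_descriptor(model, image: torch.Tensor) -> torch.Tensor:
    """image: (3, H, W) float tensor with values in [0, 1]."""
    # Resize to the 320x320 input size the pretrained models expect
    x = F.interpolate(image.unsqueeze(0), size=(320, 320),
                      mode='bilinear', align_corners=False)
    # ImageNet normalization (assumption -- verify against the repo's dataloader)
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    x = (x - mean) / std
    return model(x)  # (1, descriptor_dim) global descriptor
```

Place recognition is then a nearest-neighbor search between query and database descriptors, e.g. via a dot product on the L2-normalized outputs.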
```bibtex
@inproceedings{ali2023mixvpr,
  title={{MixVPR}: Feature Mixing for Visual Place Recognition},
  author={Ali-bey, Amar and Chaib-draa, Brahim and Gigu{\`e}re, Philippe},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2998--3007},
  year={2023}
}
```