- Paper: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- Origin Repo: microsoft/Swin-Transformer
- Code: swin.py
- Evaluate Transforms:

      # backend: pil
      # input_size: 224x224
      transforms = T.Compose([
          T.Resize(248, interpolation='bicubic'),
          T.CenterCrop(224),
          T.ToTensor(),
          T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      ])

      # backend: pil
      # input_size: 384x384
      transforms = T.Compose([
          T.Resize((384, 384), interpolation='bicubic'),
          T.ToTensor(),
          T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      ])
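The arithmetic behind the 224x224 pipeline can be sketched in plain Python. This is only an illustration, not the transform library's code: the helper names (`resize_short_side`, `center_crop_box`, `normalize_pixel`) are hypothetical, and the rounding conventions are assumptions. It assumes that an integer resize size scales the shorter image side to 248 while keeping the aspect ratio, and that normalization is applied per channel to values already scaled to [0, 1].

```python
# Pure-Python sketch of the 224x224 evaluation geometry and normalization.
# Helper names and rounding conventions are illustrative assumptions.

MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def resize_short_side(width, height, target=248):
    """Return (width, height) after scaling the shorter side to `target`."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, size=224):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already in [0, 1]."""
    return [(c - m) / s for c, m, s in zip(rgb, MEAN, STD)]


if __name__ == "__main__":
    w, h = resize_short_side(640, 480)        # a 640x480 input image
    print((w, h))                             # size after the resize step
    print(center_crop_box(w, h))              # region kept by the center crop
    print(normalize_pixel([0.485, 0.456, 0.406]))  # a pixel equal to the mean
```

The 384x384 pipeline differs in that it resizes directly to a fixed `(384, 384)` shape, so no crop step is needed and the aspect ratio is not preserved.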
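- Model Details:

  | Model         | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
  |:-------------:|:----------:|:----------:|:---------:|:---------:|:---------:|:----------------:|
  | Swin-tiny     | swin_ti    | 28         | 4.5       | 81.19     | 95.51     | Download         |
  | Swin-small    | swin_s     | 50         | 8.7       | 83.18     | 96.24     | Download         |
  | Swin-base     | swin_b     | 88         | 15.4      | 83.42     | 96.45     | Download         |
  | Swin-base-384 | swin_b_384 | 88         | 47.1      | 84.47     | 96.95     | Download         |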
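For readers unfamiliar with the Top-1 and Top-5 columns, the usual definition of top-k accuracy can be sketched as follows. The source does not describe its evaluation script, so this is the standard definition, not the code that produced the numbers above. The function name `topk_accuracy` and the toy scores are hypothetical.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one list per sample.
    labels: list of integer class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    # Toy data: 4 samples, 3 classes (made up for illustration).
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
        [0.6, 0.3, 0.1],
    ]
    labels = [1, 1, 0, 0]
    print(topk_accuracy(scores, labels, k=1))
    print(topk_accuracy(scores, labels, k=2))
```

Top-5 is the same computation with `k=5`. It can never be lower than Top-1 on the same predictions, which matches the pattern in the table.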
- Citation:

      @article{liu2021Swin,
        title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
        author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
        journal={arXiv preprint arXiv:2103.14030},
        year={2021}
      }