- Paper: Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet
- Origin Repo: zihangJiang/TokenLabeling
- Code: lvvit.py
- Evaluate Transforms:
```python
# backend: pil
# input_size: 224x224
transforms = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# backend: pil
# input_size: 384x384
transforms = T.Compose([
    T.Resize(384, interpolation='bicubic'),
    T.CenterCrop(384),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# backend: pil
# input_size: 448x448
transforms = T.Compose([
    T.Resize(448, interpolation='bicubic'),
    T.CenterCrop(448),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```
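For reference, below is a minimal end-to-end evaluation sketch using the 224x224 pipeline above. It assumes Paddle as the framework (with `T` standing for `paddle.vision.transforms`, consistent with the string-valued `interpolation` argument and the `backend: pil` comments) and a `lvvit_s` builder exported by lvvit.py; the builder name and its `pretrained` flag are assumptions, not taken from this document.

```python
import paddle
import paddle.vision.transforms as T  # assumption: T is Paddle's transforms module
from PIL import Image

from lvvit import lvvit_s  # hypothetical builder; adjust to the actual export in lvvit.py

# 224x224 evaluation pipeline from the section above.
transforms = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

model = lvvit_s(pretrained=True)  # assumption: model-zoo-style pretrained flag
model.eval()

img = Image.open('example.jpg').convert('RGB')
x = paddle.unsqueeze(transforms(img), axis=0)  # shape [1, 3, 224, 224]

with paddle.no_grad():
    logits = model(x)                      # [1, 1000] ImageNet logits
    top1 = paddle.argmax(logits, axis=-1)  # predicted class index
print(top1.numpy())
```

The 384 and 448 variants follow the same pattern; only the `Resize`/`CenterCrop` sizes and the model name (e.g. `lvvit_s_384`) change.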
- Model Details:
| Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
|-------|------------|------------|-----------|-----------|-----------|------------------|
| LV-ViT-S | lvvit_s | 26.2 | 6.6 | 83.17 | 95.87 | Download |
| LV-ViT-M | lvvit_m | 55.8 | 16.0 | 83.88 | 96.05 | Download |
| LV-ViT-S-384 | lvvit_s_384 | 26.3 | 22.2 | 84.56 | 96.39 | Download |
| LV-ViT-M-384 | lvvit_m_384 | 56.0 | 42.2 | 85.34 | 96.72 | Download |
| LV-ViT-M-448 | lvvit_m_448 | 56.1 | 61.0 | 85.47 | 96.82 | Download |
| LV-ViT-L-448 | lvvit_l_448 | 150.5 | 157.2 | 86.09 | 96.85 | Download |
- Citation:
```
@article{jiang2021token,
  title={Token Labeling: Training a 85.5% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet},
  author={Jiang, Zihang and Hou, Qibin and Yuan, Li and Zhou, Daquan and Jin, Xiaojie and Wang, Anran and Feng, Jiashi},
  journal={arXiv preprint arXiv:2104.10858},
  year={2021}
}
```