- Paper: Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet
- Origin Repo: zihangJiang/TokenLabeling
- Code: lvvit.py
- Evaluate Transforms:
```python
# backend: pil
# input_size: 224x224
transforms = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# backend: pil
# input_size: 384x384
transforms = T.Compose([
    T.Resize(384, interpolation='bicubic'),
    T.CenterCrop(384),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# backend: pil
# input_size: 448x448
transforms = T.Compose([
    T.Resize(448, interpolation='bicubic'),
    T.CenterCrop(448),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```
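For reference, below is a minimal end-to-end evaluation sketch using the 224x224 pipeline above. It assumes Paddle as the framework (with `T` standing for `paddle.vision.transforms`, consistent with the string-valued `interpolation` argument and the `backend: pil` comments) and a `lvvit_s` builder exported by lvvit.py; the builder name and its `pretrained` flag are assumptions, not taken from this document.

```python
import paddle
import paddle.vision.transforms as T  # assumption: T is Paddle's transforms module
from PIL import Image

from lvvit import lvvit_s  # hypothetical builder; adjust to the actual export in lvvit.py

# 224x224 evaluation pipeline from the section above.
transforms = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

model = lvvit_s(pretrained=True)  # assumption: model-zoo-style pretrained flag
model.eval()

img = Image.open('example.jpg').convert('RGB')
x = paddle.unsqueeze(transforms(img), axis=0)  # shape [1, 3, 224, 224]

with paddle.no_grad():
    logits = model(x)                      # [1, 1000] ImageNet logits
    top1 = paddle.argmax(logits, axis=-1)  # predicted class index
print(top1.numpy())
```

The 384 and 448 variants follow the same pattern; only the `Resize`/`CenterCrop` sizes and the model name (e.g. `lvvit_s_384`) change.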
- Model Details:
| Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
|-------|------------|------------|-----------|-----------|-----------|------------------|
| LV-ViT-S | lvvit_s | 26.2 | 6.6 | 83.17 | 95.87 | Download |
| LV-ViT-M | lvvit_m | 55.8 | 16.0 | 83.88 | 96.05 | Download |
| LV-ViT-S-384 | lvvit_s_384 | 26.3 | 22.2 | 84.56 | 96.39 | Download |
| LV-ViT-M-384 | lvvit_m_384 | 56.0 | 42.2 | 85.34 | 96.72 | Download |
| LV-ViT-M-448 | lvvit_m_448 | 56.1 | 61.0 | 85.47 | 96.82 | Download |
| LV-ViT-L-448 | lvvit_l_448 | 150.5 | 157.2 | 86.09 | 96.85 | Download |
- Citation:
```
@article{jiang2021token,
  title={Token Labeling: Training a 85.5% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet},
  author={Jiang, Zihang and Hou, Qibin and Yuan, Li and Zhou, Daquan and Jin, Xiaojie and Wang, Anran and Feng, Jiashi},
  journal={arXiv preprint arXiv:2104.10858},
  year={2021}
}
```