TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
This is an official PyTorch implementation of "TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition".
TransXNet is a CNN-Transformer hybrid vision backbone that models both global and local dynamics with a Dual Dynamic Token Mixer (D-Mixer), achieving superior performance over both CNN- and Transformer-based models.
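The sketch below illustrates the dual-branch token-mixing idea at a high level. It is a simplified conceptual example, not the official D-Mixer: it uses plain multi-head self-attention and a static depthwise convolution where the paper employs input-dependent (dynamic) global and local operators.

import torch
import torch.nn as nn

class DualTokenMixerSketch(nn.Module):
    # Conceptual sketch only: half the channels are mixed globally
    # (self-attention), the other half locally (depthwise conv), and the
    # two halves are fused with a 1x1 convolution.
    def __init__(self, dim, num_heads=4, kernel_size=7):
        super().__init__()
        half = dim // 2
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        self.dwconv = nn.Conv2d(half, half, kernel_size,
                                padding=kernel_size // 2, groups=half)
        self.proj = nn.Conv2d(dim, dim, 1)  # fuse global and local halves

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        x_g, x_l = x.chunk(2, dim=1)             # split channels in half
        t = x_g.flatten(2).transpose(1, 2)       # (B, HW, C/2) token sequence
        t, _ = self.attn(t, t, t, need_weights=False)  # global mixing
        x_g = t.transpose(1, 2).reshape(B, C // 2, H, W)
        x_l = self.dwconv(x_l)                   # local neighborhood mixing
        return self.proj(torch.cat([x_g, x_l], dim=1))

mixer = DualTokenMixerSketch(dim=64)
out = mixer(torch.randn(2, 64, 14, 14))  # output shape: (2, 64, 14, 14)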
We strongly recommend using our provided dependencies to ensure reproducibility:
# Environments:
cuda==11.6
python==3.8.15
# Packages:
mmcv==1.7.1
timm==0.6.12
torch==1.13.1
torchvision==0.14.1
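As a quick optional sanity check, the snippet below verifies that the installed packages match the versions pinned above before launching a long run:

import mmcv
import timm
import torch
import torchvision

# Expected versions from the dependency list above.
expected = {'mmcv': '1.7.1', 'timm': '0.6.12',
            'torch': '1.13.1', 'torchvision': '0.14.1'}
for mod in (mmcv, timm, torch, torchvision):
    got = mod.__version__
    want = expected[mod.__name__]
    # torch reports e.g. '1.13.1+cu116', so prefix matching is used.
    status = 'OK' if got.startswith(want) else f'MISMATCH (expected {want})'
    print(f'{mod.__name__:12s} {got:16s} {status}')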
Prepare ImageNet with the following folder structure; you can extract ImageNet using this script.
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
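This layout is the standard torchvision ImageFolder format, so you can quickly verify that the data is readable with a snippet like the following (the dataset path is a placeholder for your local setup):

from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder('/path/to/imagenet/train', transform=tf)
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=tf)
# ImageNet-1K should report 1,281,167 training and 50,000 validation images.
print(len(train_set), len(val_set))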
| Models | Input Size | FLOPs (G) | Params (M) | Top-1 Acc. (%) | Download |
| --- | --- | --- | --- | --- | --- |
| TransXNet-T | 224x224 | 1.8 | 12.8 | 81.6 | model |
| TransXNet-S | 224x224 | 4.5 | 26.9 | 83.8 | model |
| TransXNet-B | 224x224 | 8.3 | 48.0 | 84.6 | model |
To train TransXNet models on ImageNet-1K with 8 GPUs (single node), run:
bash scripts/train_tiny.sh # train TransXNet-T
bash scripts/train_small.sh # train TransXNet-S
bash scripts/train_base.sh # train TransXNet-B
To evaluate TransXNet on ImageNet-1K, run:
MODEL=transxnet_t # transxnet_{t, s, b}
python3 validate.py \
/path/to/imagenet \
--model $MODEL -b 128 \
--pretrained # or --checkpoint /path/to/checkpoint
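For programmatic use, the validate.py interface above suggests the models are registered with timm; assuming the repository's model definitions have been imported (the module name below is an assumption about this repo's layout), a model can be created like this:

import torch
import timm
import models  # assumed: importing the repo's model package registers 'transxnet_*' with timm

model = timm.create_model('transxnet_t', pretrained=False)  # or the 's' / 'b' variants
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000]) for ImageNet-1K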
If you find this project useful for your research, please consider citing:
@article{lou2023transxnet,
title={TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition},
author={Lou, Meng and Zhou, Hong-Yu and Yang, Sibei and Yu, Yizhou},
journal={arXiv preprint arXiv:2310.19380},
year={2023}
}
Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.
If you have any questions, please feel free to open an issue or contact me at lmzmm.0921@gmail.com.