CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

Official PyTorch implementation of CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization (IJCAI 2023).

If you use the code in this repo for your work, please cite the following bib entries:

Abstract

Ultra-fine-grained visual classification (ultra-FGVC) aims to classify sub-grained categories of fine-grained objects. This inevitably requires discriminative representation learning within a limited training set. Exploring intrinsic features from the object itself, e.g., predicting the rotation of a given image, has demonstrated great progress towards learning discriminative representations. Yet none of these works considers explicit supervision for learning mutual information at the instance level. To this end, this paper introduces CLE-ViT, a novel contrastive learning encoded transformer, to address this fundamental problem in ultra-FGVC. The core design is a self-supervised module that performs self-shuffling and masking and then distinguishes these altered images from other images. This drives the model to learn an optimized feature space that has a large inter-class distance while remaining tolerant to intra-class variations. By incorporating this self-supervised module, the network acquires more knowledge from the intrinsic structure of the input data, which improves its generalization ability without requiring extra manual annotations. CLE-ViT achieves strong performance on 7 publicly available datasets, demonstrating its effectiveness in the ultra-FGVC task.
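
The self-shuffling and masking step mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is not the code from this repository: the function name `shuffle_and_mask`, its parameters (`patch_size`, `mask_ratio`, `mask_value`, `seed`), and the choice to permute every patch of the image are assumptions made here for illustration. The idea is only that the altered image stays a view of the same instance, so a contrastive objective could treat it as a positive sample and other images as negatives.

```python
import random


def shuffle_and_mask(image, patch_size=2, mask_ratio=0.25, mask_value=0, seed=None):
    """Return a copy of `image` with its patches permuted and some masked.

    `image` is a 2D list (H x W) of pixel values; H and W must be
    multiples of `patch_size`. All names here are hypothetical.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    # Top-left corners of every non-overlapping patch.
    corners = [(r, c)
               for r in range(0, height, patch_size)
               for c in range(0, width, patch_size)]

    # sources[i] is the patch that gets moved into slot corners[i].
    sources = corners[:]
    rng.shuffle(sources)

    # Choose which destination slots are blanked out.
    num_masked = int(len(corners) * mask_ratio)
    masked = set(rng.sample(range(len(corners)), num_masked))

    # Start from a fully masked canvas and copy in the unmasked patches.
    out = [[mask_value] * width for _ in range(height)]
    for i, ((dr, dc), (sr, sc)) in enumerate(zip(corners, sources)):
        if i in masked:
            continue  # this slot keeps mask_value
        for y in range(patch_size):
            for x in range(patch_size):
                out[dr + y][dc + x] = image[sr + y][sc + x]
    return out


# Toy 4x4 "image" with pixel values 1..16.
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
positive_view = shuffle_and_mask(image, patch_size=2, mask_ratio=0.25, seed=0)
```

In a real pipeline the same operation would typically run on image tensors inside the data loader, with the original image as the anchor and `positive_view` as its positive counterpart.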

Create Environment

Please use the command below to create the environment for CLE-ViT.

  $ conda env create -f env.yaml

Download pre-trained Swin Transformer models

Get the models from this link: Swin-B, Swin-S...

  $ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22k.pth

Dataset

You can download the datasets from the links below:

Run the experiments.

Use the scripts in the scripts directory to train the model, e.g., to train on the SoybeanGene dataset:

$ sh scripts/run_gene.sh

Download Trained Models

Trained models: BaiDuNetDisk

Password: r5zr

Acknowledgment

Our project references the code in the following repos. Thanks to the authors for their work and for sharing it.
