Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection
MICCAI 2023
Vision Transformer (ViT) models have achieved breakthroughs across a wide range of computer vision tasks. However, compared with Convolutional Neural Network (CNN) models, ViT models have been observed to struggle to capture the high-frequency components of images, which can limit their ability to detect local textures and edge information. Because abnormalities in human tissue, such as tumors and lesions, can vary greatly in structure, texture, and shape, high-frequency information such as texture is crucial for effective semantic segmentation. To address this limitation of ViT models, we propose a new technique, Laplacian-Former, that enhances the self-attention map by adaptively re-calibrating the frequency information in a Laplacian pyramid. More specifically, our method uses a dual attention mechanism, combining efficient attention and frequency attention, to selectively intensify the contribution of shape and texture features; the efficient attention component reduces the complexity of self-attention to linear while producing the same output. Furthermore, we introduce a novel efficient enhancement multi-scale bridge that transfers spatial information from the encoder to the decoder while preserving the fundamental features.
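To make "frequency information in a Laplacian pyramid" concrete, here is a minimal 1D sketch of a Laplacian pyramid with per-band re-weighting. It is only an illustration of the decomposition, not the paper's frequency attention: the model works on 2D feature maps and re-calibrates the bands adaptively, whereas this sketch assumes pairwise averaging as the low-pass filter, nearest-neighbour upsampling, and fixed, hypothetical band `weights`. All function names are illustrative and do not come from the repository.

```python
def downsample(signal):
    # Low-pass filter and decimate: average each non-overlapping pair of samples.
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    # Nearest-neighbour upsampling: repeat every sample twice.
    return [x for x in signal for _ in range(2)]


def build_laplacian_pyramid(signal, levels):
    """Split a 1D signal into `levels` detail bands (fine to coarse) plus a low-pass residual."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    bands, current = [], list(signal)
    for _ in range(levels):
        low = downsample(current)
        # The detail band keeps what the coarser level cannot represent: the high frequencies.
        bands.append([c - u for c, u in zip(current, upsample(low))])
        current = low
    return bands, current


def reconstruct(bands, residual, weights=None):
    """Rebuild the signal, scaling each detail band by its weight (1.0 leaves it unchanged)."""
    if weights is None:
        weights = [1.0] * len(bands)
    if len(weights) != len(bands):
        raise ValueError("need exactly one weight per detail band")
    current = list(residual)
    # Walk from the coarsest band back to the finest one.
    for band, w in zip(reversed(bands), reversed(weights)):
        current = [u + w * b for u, b in zip(upsample(current), band)]
    return current


signal = [1, 3, 2, 2, 8, 0, 5, 5]
bands, residual = build_laplacian_pyramid(signal, levels=2)
print(reconstruct(bands, residual))                      # exact reconstruction of the input
print(reconstruct(bands, residual, weights=[2.0, 1.0]))  # finest band (edges, texture) amplified
```

With all weights at 1.0 the pyramid is lossless, so any change in the output comes purely from re-weighting the bands; boosting the finest band sharpens local transitions, which is the kind of high-frequency emphasis the abstract motivates.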
@inproceedings{azad2023laplacian,
title={Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection},
author={Azad, Reza and Kazerouni, Amirhossein and Azad, Babak and Khodapanah Aghdam, Ehsan and Velichko, Yury and Bagci, Ulas and Merhof, Dorit},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={736--746},
year={2023},
organization={Springer}
}
- May 25, 2023: Early Accepted in MICCAI 2023! 🥳🔥
Download the Synapse Dataset from here.
Run the following command to install the requirements:
pip install -r requirements.txt
Run the following command to train the model on the Synapse dataset. To train the compact version of the model with only three encoders, replace train.py with train_compact.py.
python train.py --root_path ./data/Synapse/train_npz --test_path ./data/Synapse/test_vol_h5 --batch_size 24 --eval_interval 20 --max_epochs 400 --dst_fast --resume --model_path [MODEL PATH]
--root_path [Train data path]
--test_path [Test data path]
--eval_interval [Evaluate the model every N epochs]
--dst_fast [Optional] [Load all data into RAM for faster training]
--resume [Optional] [Resume training from a checkpoint]
--model_path [Optional] [Path to the latest checkpoint file to load the model from]
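To make the difference between on/off switches and options that take a value explicit, the flags above can be mirrored in a hypothetical argparse sketch. It is not the parser in train.py: the types, the store_true switches, the every-N-epochs reading of --eval_interval, and the ./checkpoints/last.pth path standing in for [MODEL PATH] are all assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags of train.py.
    # Types and defaults are assumptions, not copied from the repository.
    parser = argparse.ArgumentParser(description="Laplacian-Former training (sketch)")
    parser.add_argument("--root_path", type=str, help="Train data path")
    parser.add_argument("--test_path", type=str, help="Test data path")
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--max_epochs", type=int)
    parser.add_argument("--eval_interval", type=int, help="Evaluate every N epochs (assumed meaning)")
    # Switches: True when present on the command line, False otherwise.
    parser.add_argument("--dst_fast", action="store_true", help="Load all data into RAM")
    parser.add_argument("--resume", action="store_true", help="Resume from a checkpoint")
    # Value option: takes the checkpoint path that follows it.
    parser.add_argument("--model_path", type=str, default=None, help="Checkpoint to load")
    return parser


# The documented example command, with a hypothetical path in place of [MODEL PATH].
args = build_parser().parse_args([
    "--root_path", "./data/Synapse/train_npz",
    "--test_path", "./data/Synapse/test_vol_h5",
    "--batch_size", "24",
    "--eval_interval", "20",
    "--max_epochs", "400",
    "--dst_fast",
    "--resume",
    "--model_path", "./checkpoints/last.pth",
])

# If evaluation runs every eval_interval epochs, 400 epochs give 20 evaluation rounds.
print(args.max_epochs // args.eval_interval)
print(args.dst_fast, args.resume, args.model_path)
```

Under these assumptions, omitting --dst_fast or --resume simply leaves them False, while --model_path must always be followed by a path.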
For instructions on training on the skin lesion dataset (ISIC 2018), please refer to this link.
Download the trained weights from the link below:
| Dataset | Model | Download link |
|---|---|---|
| Synapse | Laplacian-Former | Download |
Run the following command to test the model on the Synapse dataset.
python test.py --test_path ./data/Synapse/test_vol_h5 --is_savenii --pretrained_path './best_model.pth'
--test_path [Test data path]
--is_savenii [Optional] [Save the predictions as NIfTI files during inference]
--pretrained_path [Path to the pretrained model]
To evaluate the proposed method, we consider two challenging medical image segmentation tasks: the Synapse dataset and the ISIC 2018 dataset. The proposed Laplacian-Former achieves superior segmentation performance.
The results in the table have been updated to match the released model weights.
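Segmentation results on Synapse are commonly reported with the Dice similarity coefficient (DSC), averaged over organ classes. The minimal sketch below computes per-class Dice on flat label lists. It is an illustration only, not the evaluation code in test.py; the function name dice_per_class and the rule that a class absent from both prediction and ground truth scores 1.0 are assumptions.

```python
def dice_per_class(pred, target, num_classes):
    """Dice score for each foreground class (labels 1..num_classes-1).

    pred and target are flat sequences of integer labels of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    scores = {}
    for c in range(1, num_classes):
        p = [x == c for x in pred]
        t = [y == c for y in target]
        intersection = sum(a and b for a, b in zip(p, t))
        total = sum(p) + sum(t)
        # Assumed convention: a class missing from both maps counts as a perfect match.
        scores[c] = 1.0 if total == 0 else 2 * intersection / total
    return scores


pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
scores = dice_per_class(pred, target, num_classes=3)
mean_dice = sum(scores.values()) / len(scores)
print(scores, mean_dice)
```

Dice is 2|P ∩ G| / (|P| + |G|), so it rewards overlap relative to the combined size of the predicted and ground-truth regions; how empty classes are handled varies between evaluation scripts, which is why that choice is flagged as an assumption here.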