
Exploring Intrinsic Dimension for Vision-Language Model Pruning

This is the official implementation of the ICML'24 paper Exploring Intrinsic Dimension for Vision-Language Model Pruning.

🏃‍♂️ TL;DR

The Intrinsic Dimension (ID) of vision representations spans a wider range and reaches higher values than that of language representations, which we attribute to the heightened sensitivity of vision models to pruning. In contrast, language models remain robust to pruning despite containing more redundant weights.

🔨 Installation

This code is tested with PyTorch 1.11.0, CUDA 11.5, and Python 3.9.0. Install the dependencies with:

conda install --yes --file requirements.txt

📐 Evaluation of IDs

  • CPU Mode
    python ComputeID.py -n 2000 --Path ID/Blip_coco --cpu
  • GPU Mode
    python ComputeID.py -n 2000 --Path ID/Blip_coco --gpu 0
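
For intuition, the snippet below sketches a TwoNN intrinsic-dimension estimate in the style of IntrinsicDimDeep, which this repository builds on. It is a simplified illustration only; ComputeID.py is the authoritative implementation, and the feature matrix here is random placeholder data.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def twonn_id(x):
        """Estimate the intrinsic dimension of representations x of shape (num_samples, num_features)."""
        # Distances to each point's two nearest neighbours (column 0 is the point itself).
        dists, _ = NearestNeighbors(n_neighbors=3).fit(x).kneighbors(x)
        mu = dists[:, 2] / dists[:, 1]          # ratio of 2nd to 1st neighbour distance
        mu = mu[np.isfinite(mu) & (mu > 1.0)]   # drop duplicates / degenerate points
        # Maximum-likelihood estimate: mu follows a Pareto law whose exponent is the ID.
        return len(mu) / np.sum(np.log(mu))

    feats = np.random.randn(2000, 768)          # placeholder for a layer's representations
    print(f"Estimated ID: {twonn_id(feats):.1f}")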

✂️ Pruning

Image Captioning on the COCO Caption Dataset with BLIP

  • Dataset & Annotation

    1. Download the COCO2014 dataset and unzip it under the datasets folder. Update the image_root in config.
    2. Download all-in-one annotations from this link, unzip it under the coco/annotation folder, and update the annotation in config.
  • Pruning

    1. Download the uncompressed model from this link and place it in the pretrained folder. Update the pretrained in config.
    2. To prune BLIP by 80% and incorporate Intrinsic Dimension into the importance score during pruning (a conceptual sketch of this re-weighting follows this list), run:
      python -m torch.distributed.run --nproc_per_node=2 --master_port=29505 train_caption.py --final_threshold 0.2 --model_dir coco/PLATON80 --pruner_name PLATON --useID
  • Evaluation

    1. Place the pruned model in the output folder and point --pruned in the command below to it.
    2. To evaluate the pruned model, run:
      python -m torch.distributed.run --nproc_per_node=2 --master_port=29505 train_caption.py --pruner_name PLATON --pruned output/pruned_model_path --evaluate
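
The --useID flag folds the per-layer Intrinsic Dimension into the PLATON importance scores. The sketch below illustrates one possible form of such a re-weighting; the helper names and the exact scaling rule are hypothetical and only meant to convey the idea, while the authoritative logic lives in this repository's pruner.

    import torch

    def id_weighted_importance(importance, layer_id):
        """importance: layer name -> per-weight importance tensor (PLATON-style sensitivity x uncertainty).
        layer_id: layer name -> scalar intrinsic dimension of that layer's representations (hypothetical inputs)."""
        max_id = max(layer_id.values())
        # Illustrative rule: layers whose representations have higher ID get larger scores,
        # so a single global threshold prunes them less aggressively.
        return {name: score * (layer_id[name] / max_id) for name, score in importance.items()}

    def prune_by_threshold(scores, final_threshold=0.2):
        """Keep the top `final_threshold` fraction of weights globally (0.2 -> 80% pruned)."""
        flat = torch.cat([s.flatten() for s in scores.values()])
        k = max(1, int(final_threshold * flat.numel()))
        cutoff = torch.topk(flat, k).values.min()
        return {name: (s >= cutoff).float() for name, s in scores.items()}  # binary keep-masks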

Visual Reasoning on the NLVR2 Dataset with BLIP

  • Dataset & Annotation

    1. Download the NLVR2 dataset and unzip it under the datasets folder. Update the image_root in config.
    2. Download all-in-one annotations from this link, unzip it under the nlvr/annotation folder, and update the annotation in config.
  • Pruning

    1. Download the uncompressed model from this link and place it in the pretrained folder. Update the pretrained in config.
    2. To prune BLIP by 80% and incorporate Intrinsic Dimension into the importance score during pruning, run:
      python -m torch.distributed.run --nproc_per_node=2 --master_port=29505 train_nlvr.py --final_threshold 0.2 --model_dir nlvr/PLATON80 --pruner_name PLATON --useID
  • Evaluation

    1. Place the pruned model in the output folder and point --pruned in the command below to it.
    2. To evaluate the pruned model, run:
      python -m torch.distributed.run --nproc_per_node=2 --master_port=29505 train_nlvr.py --pruner_name PLATON --pruned output/pruned_model_path --evaluate
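
Before evaluating, it can help to confirm that a pruned checkpoint actually reaches the target sparsity. The check below is a hypothetical sketch: it assumes the checkpoint is (or contains under a "model" key) a standard state_dict, so adjust the path and key to your setup.

    import torch

    ckpt = torch.load("output/pruned_model_path", map_location="cpu")
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

    total = zeros = 0
    for name, tensor in state.items():
        if not torch.is_tensor(tensor) or tensor.dim() < 2:
            continue                              # count only prunable weight matrices
        total += tensor.numel()
        zeros += (tensor == 0).sum().item()

    print(f"Overall sparsity of weight matrices: {zeros / total:.1%}")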

💐 Acknowledgments

This code is built upon IntrinsicDimDeep, BLIP and PLATON, and we sincerely appreciate their contributions.

🌸 Citation

If you find this work useful, please consider citing our paper:

@inproceedings{wang2024exploring,
  title={Exploring Intrinsic Dimension for Vision-Language Model Pruning},
  author={Hanzhang Wang and Jiawen Zhang and Qingyuan Ma},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=xxL7CEWuxz}
}
