- CLIP trained on Flickr8k + Flickr30k for 200 epochs
# e.g.,
python3 linear_classification.py \
    --ckpt_path="../clip_flickr.pth" \
    --data_dir="../imagenet-mini/" \
    --n_epochs=64 \
    --batch_size=128 \
    --n_cpus=4 # Optional
- Top-5 accuracy on validation set: 5.8%
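For reference, top-k accuracy counts a sample as correct when its true label is among the k highest-scoring classes. The snippet below is a minimal, framework-free sketch of that metric, not the evaluation code used by the scripts in this repository; `top_k_accuracy`, `scores`, and `labels` are hypothetical names chosen for illustration.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical input format).
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 3 samples and 3 classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 2, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
```

With k equal to the number of classes the metric is always 1.0, so small values of k relative to the class count are the informative ones.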
# e.g., (--n_cpus, --max_len, and --k are optional)
python3 zero_shot_classification.py \
    --ckpt_path="../clip_flickr.pth" \
    --data_dir="../imagenet-mini/" \
    --batch_size=16 \
    --n_cpus=4 \
    --max_len=128 \
    --k=10
- Top-10 accuracy on train + validation set: 3.0%
- The temperature-related part is not implemented.
- "The learnable temperature parameter was clipped to prevent scaling the logits by more than 100 which we found necessary to prevent training instability."
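Since this repository does not implement it, the following is only a sketch of what the quoted sentence describes, assuming the temperature is stored as a log-scale scalar and the logits are multiplied by its exponential. The names `clipped_logit_scale`, `scaled_logits`, and `MAX_LOGIT_SCALE` are hypothetical, and the code is framework-free on purpose; in a real training loop the same cap would have to be applied to the learnable parameter itself.

```python
import math

MAX_LOGIT_SCALE = 100.0  # upper bound on the logit multiplier, from the quoted sentence


def clipped_logit_scale(log_scale, max_scale=MAX_LOGIT_SCALE):
    """Return exp(log_scale), capped so logits are never scaled by more than max_scale."""
    return min(math.exp(log_scale), max_scale)


def scaled_logits(similarities, log_scale):
    """Multiply cosine similarities by the clipped scale (the inverse temperature)."""
    scale = clipped_logit_scale(log_scale)
    return [scale * s for s in similarities]


# Assumed starting point: a temperature of 0.07, i.e. a scale of 1 / 0.07.
init_log_scale = math.log(1 / 0.07)
normal = clipped_logit_scale(init_log_scale)   # below the cap, left unchanged
capped = clipped_logit_scale(math.log(500.0))  # above the cap, clipped to 100.0
```

Clipping in this way bounds how sharply the softmax over image-text similarities can peak, which is the instability the quoted sentence refers to.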