- Video Swin Transformer [Paper]
- AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders [Paper]
- A-ViT: Adaptive Tokens for Efficient Vision Transformer [Paper]
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [Paper]
- DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification [Paper]
- A Survey on In-context Learning [Paper]
- Learning To Retrieve Prompts for In-Context Learning [Paper]