Reimplementation of Attention Rollout. Future work might experiment with SOTA ViTs using Attention Rollout's successors.

AttentionRollout ReImplementation

Other Attention in ViT:

Note that in ViT d_model = embed_dim already (the per-token embedding dimension), and head_dim = d_model / num_heads.
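
As a quick illustration of how these dimensions relate, here is a minimal sketch using ViT-Base-like numbers (197 tokens, 768 dims, 12 heads are illustrative assumptions, not this repo's config):

```python
import torch

# Illustrative ViT-Base-like sizes (assumptions, not this repo's config)
batch, tokens, embed_dim, num_heads = 2, 197, 768, 12
head_dim = embed_dim // num_heads              # 768 / 12 = 64

x = torch.randn(batch, tokens, embed_dim)      # token embeddings, d_model = embed_dim
# Multi-head attention splits the embedding dimension, not the tokens:
# (batch, tokens, embed_dim) -> (batch, num_heads, tokens, head_dim)
x_heads = x.view(batch, tokens, num_heads, head_dim).transpose(1, 2)
```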

  • Hydra Attention argues for num_heads = embed_dim to get linear complexity. Adding 2 Hydra Attention encoder blocks at the back of the network improved accuracy while reducing FLOPs and runtime. Reimplemented by robflynnyh. Unfortunately, visualizing Hydra Attention requires different math, so we will rely on their paper (figure 3 + appendix) to discuss the different pretrained models. A minimal sketch of the mechanism is given after this list.
  • Dilated Self-Attention, used in LongNet: also linear complexity. Reimplemented by https://github.com/alexisrozhkov/dilated-self-attention
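
As a rough sketch of the Hydra Attention idea (not the robflynnyh reimplementation): with num_heads = embed_dim each head is 1-dimensional, softmax is replaced by a cosine-similarity kernel, and the key-value mix is a single sum over tokens shared by every query, which makes the cost linear in the number of tokens. Names below are illustrative.

```python
import torch
import torch.nn.functional as F

def hydra_attention(q, k, v):
    """Hydra Attention sketch (Bolya et al., 2022). q, k, v: (batch, tokens, embed_dim).
    With num_heads == embed_dim and a cosine kernel, attention reduces to a
    global (key * value) sum that every query token reuses."""
    q = F.normalize(q, dim=-1)                 # phi(Q): L2-normalise the feature dim
    k = F.normalize(k, dim=-1)                 # phi(K)
    kv = (k * v).sum(dim=1, keepdim=True)      # (batch, 1, embed_dim): linear in tokens
    return q * kv                              # broadcast the global mix back to all tokens
```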

Attention Rollout and its successors

  • Attention Rollout (a minimal sketch is included after this list)
  • Gradient-based Attention Rollout
  • ????
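
Below is a minimal sketch of plain Attention Rollout (Abnar & Zuidema, 2020): average each layer's attention over heads, add the identity to account for the residual connection, re-normalise, and multiply the layers together. Function and variable names are illustrative, not this repo's API.

```python
import torch

def attention_rollout(attentions):
    """Plain Attention Rollout sketch.
    attentions: list of per-layer attention maps, each (batch, num_heads, tokens, tokens),
    ordered from the first encoder block to the last.
    Returns (batch, tokens, tokens): how much each output token attends to each input token."""
    rollout = None
    for attn in attentions:
        a = attn.mean(dim=1)                               # fuse heads by averaging
        eye = torch.eye(a.size(-1), device=a.device)
        a = (a + eye) / 2                                   # add identity for the residual connection
        a = a / a.sum(dim=-1, keepdim=True)                 # re-normalise rows
        rollout = a if rollout is None else a @ rollout     # multiply layers together
    return rollout
```

For a ViT, the [CLS] row of the result (rollout[:, 0, 1:]) gives the usual per-patch relevance map that gets reshaped and overlaid on the image.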
