code mismatch with the theory #9
Comments
This is after conv_proj_q, conv_proj_k and conv_proj_v. But I'm not sure why the authors still use the pointwise projections after the conv projections.
@askerlee, I think it is part of a depthwise separable convolution: a depthwise convolution followed by a pointwise projection.
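A minimal sketch of that idea (not the repo's actual code; the class name `DepthwiseSeparableProj` and shapes are hypothetical, just to show how a depthwise conv pairs with a pointwise/linear projection):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableProj(nn.Module):
    """Hypothetical sketch: depthwise conv (per-channel spatial mixing)
    followed by a pointwise projection (channel mixing)."""
    def __init__(self, dim, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise part: groups=dim means each channel is convolved separately.
        self.depthwise = nn.Conv2d(dim, dim, kernel_size,
                                   stride=stride, padding=kernel_size // 2,
                                   groups=dim, bias=False)
        # Pointwise part: mixes channels; equivalent to a 1x1 conv.
        self.pointwise = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.depthwise(x)                  # spatial mixing, per channel
        x = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token layout
        x = self.pointwise(x)                  # channel mixing per token
        return x
```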
I want to know how to call the get_cls_model function in cls_cvt.py.
Hi @diaodeyi, good luck.
By the way, the
@diaodeyi It's the single linear layer (with the same in/out dimension) right after the attention calculation. The FFN in this code is the class MLP (line 53).
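To illustrate the distinction, here is a rough sketch of a transformer block (assumed names, not the repo's classes): `proj` is the single same-dimension Linear right after attention, while the FFN/MLP is a separate two-layer block with an expansion factor.

```python
import torch
import torch.nn as nn

class TinyAttentionBlock(nn.Module):
    """Hypothetical sketch, only to show which Linear is the attention
    output projection and which Linears form the FFN (MLP)."""
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # The layer in question: one Linear, same in/out dim, right after attention.
        self.proj = nn.Linear(dim, dim)
        # The FFN (the MLP class in cls_cvt.py): two Linears with an expansion.
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )
        self.scale = dim ** -0.5

    def forward(self, x):                              # x: (B, N, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        x = x + self.proj(attn @ v)                    # output projection
        x = x + self.mlp(x)                            # feed-forward network
        return x
```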
Thanks, there are so many linear projections that aren't mentioned in the paper.
@diaodeyi Yes. I think they have left them out with the presumption that the reader has a good prior understanding of the basic transformer architecture.
No, I think the proj_q/k/v layers are exactly the things the paper does not mention.
Hi, the depthwise separable conv contains two parts: a depth-wise conv and a point-wise conv. The author implemented the point-wise conv via a linear layer, maybe because it's more convenient for the ablation study. The only difference between them is the bias term.
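That equivalence is easy to check numerically. A small sketch (tensor shapes are illustrative, not taken from the repo) comparing a 1x1 pointwise Conv2d with an nn.Linear that shares the same weights:

```python
import torch
import torch.nn as nn

# A pointwise (1x1) conv and a Linear over the channel dimension compute the
# same thing (bias aside); shapes and names here are illustrative only.
B, C, H, W = 2, 64, 14, 14
x = torch.randn(B, C, H, W)

conv = nn.Conv2d(C, C, kernel_size=1, bias=False)
linear = nn.Linear(C, C, bias=False)
linear.weight.data.copy_(conv.weight.data.view(C, C))    # share the weights

out_conv = conv(x).flatten(2).transpose(1, 2)             # (B, H*W, C)
out_linear = linear(x.flatten(2).transpose(1, 2))         # (B, H*W, C)

print(torch.allclose(out_conv, out_linear, atol=1e-5))    # True
```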
Hi All,
Thanks for providing the code.
I came across a mismatch between the code and the theory you proposed for the transformer block. The paper says "Instead, we propose to replace the original position-wise linear projection for Multi-Head Self-Attention (MHSA)", but lines 198-200 in https://github.com/leoxiaobin/CvT/blob/main/lib/models/cls_cvt.py still project q, k, v through linear layers. Have you missed an else statement there? Why are you projecting the q, k, v values twice?
Please correct me if I have misunderstood it.
Thanks,
Basavaraj