code mismatch with the theory #9
Comments
This is after conv_proj_q, conv_proj_k and conv_proj_v. But I'm not sure why the authors still use the pointwise projections after the conv projections.
@askerlee, I think it is part of a depthwise separable convolution: a depthwise convolution followed by a pointwise projection.
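A minimal sketch of that idea (not the repo's actual code; the class name `DepthwiseSeparableProj` and shapes are hypothetical, just to show how a depthwise conv pairs with a pointwise/linear projection):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableProj(nn.Module):
    """Hypothetical sketch: depthwise conv (per-channel spatial mixing)
    followed by a pointwise projection (channel mixing)."""
    def __init__(self, dim, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise part: groups=dim means each channel is convolved separately.
        self.depthwise = nn.Conv2d(dim, dim, kernel_size,
                                   stride=stride, padding=kernel_size // 2,
                                   groups=dim, bias=False)
        # Pointwise part: mixes channels; equivalent to a 1x1 conv.
        self.pointwise = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.depthwise(x)                  # spatial mixing, per channel
        x = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token layout
        x = self.pointwise(x)                  # channel mixing per token
        return x
```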
I want to know how to call the get_cls_model function in cls_cvt.py.
Hi @diaodeyi, good luck.
By the way, the
@diaodeyi It's the single linear layer (with the same in/out dimension) right after the attention calculation. The FFN in this code is the class MLP (line 53).
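To illustrate the distinction, here is a rough sketch of a transformer block (assumed names, not the repo's classes): `proj` is the single same-dimension Linear right after attention, while the FFN/MLP is a separate two-layer block with an expansion factor.

```python
import torch
import torch.nn as nn

class TinyAttentionBlock(nn.Module):
    """Hypothetical sketch, only to show which Linear is the attention
    output projection and which Linears form the FFN (MLP)."""
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # The layer in question: one Linear, same in/out dim, right after attention.
        self.proj = nn.Linear(dim, dim)
        # The FFN (the MLP class in cls_cvt.py): two Linears with an expansion.
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )
        self.scale = dim ** -0.5

    def forward(self, x):                              # x: (B, N, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        x = x + self.proj(attn @ v)                    # output projection
        x = x + self.mlp(x)                            # feed-forward network
        return x
```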
Thanks, there are so many linear projections that aren't mentioned in the paper.
@diaodeyi Yes. I think they have left them out with the presumption that the reader has a good prior understanding of the basic transformer architecture.
No, I think the proj_q/k/v layers are exactly the things the paper does not mention.
Hi, the depthwise separable conv contains two parts: a depth-wise conv and a point-wise conv. The author implemented the point-wise conv via a linear layer, maybe because it's more convenient for the ablation study. The only difference between them is the bias term.
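That equivalence is easy to check numerically. A small sketch (tensor shapes are illustrative, not taken from the repo) comparing a 1x1 pointwise Conv2d with an nn.Linear that shares the same weights:

```python
import torch
import torch.nn as nn

# A pointwise (1x1) conv and a Linear over the channel dimension compute the
# same thing (bias aside); shapes and names here are illustrative only.
B, C, H, W = 2, 64, 14, 14
x = torch.randn(B, C, H, W)

conv = nn.Conv2d(C, C, kernel_size=1, bias=False)
linear = nn.Linear(C, C, bias=False)
linear.weight.data.copy_(conv.weight.data.view(C, C))    # share the weights

out_conv = conv(x).flatten(2).transpose(1, 2)             # (B, H*W, C)
out_linear = linear(x.flatten(2).transpose(1, 2))         # (B, H*W, C)

print(torch.allclose(out_conv, out_linear, atol=1e-5))    # True
```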
Hi All,
Thanks for providing the code.
I came across a mismatch between the code and the theory you proposed for the transformer block. The paper says "Instead, we propose to replace the original position-wise linear projection for Multi-Head Self-Attention (MHSA)", but lines 198-200 in https://github.com/leoxiaobin/CvT/blob/main/lib/models/cls_cvt.py still project q, k, v through linear layers. Have you missed an else statement there? Why are you projecting the q, k, v values twice?
Please correct me if I have misunderstood it.
Thanks,
Basavaraj