Added so the embeddings are actually dropped #9
base: main
Conversation
yeah, there are some wasteful projections, even for the tokens that are masked out, but that's how most transformers are trained these days
Correct, but the mask isn't applied everywhere :/
I've tried using the cond_drop_prob but I had very little success until I made the changes in this commit. Imagine you have a full transformer that takes tokens and has a mask to hide some of them, and the decoder should reconstruct the full sequence. If some of the data that should be masked out leaks into the model, then the training is poisoned.
the mask should be applied, or that would be a pretty big bug in x-transformers. let me check tonight, out with 🐕
oh shoot, maybe the condition dropping isn't being applied to cached text embeddings 🤦‍♂️ will fix if not!
It's applied later on in the code, but the k and v values at the start are not.
later masking is fine, gradients cannot propagate to the point you are concerned about
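A toy sketch (not from the repo) of the point being made here: if the attention logits for masked context positions are set to -inf before the softmax, no gradient flows back to those key/value positions, so the masked context cannot influence training even though it was (wastefully) projected.

```python
# Hypothetical toy example (not meshgpt-pytorch code): masked-out context
# positions receive zero gradient through attention.
import torch

torch.manual_seed(0)

q = torch.randn(1, 4, 8)                               # (batch, query_len, dim)
context = torch.randn(1, 6, 8, requires_grad = True)   # (batch, context_len, dim)
context_mask = torch.tensor([[True, True, True, False, False, False]])

# the raw context is still projected to keys / values (the "wasteful" projections)
to_k = torch.nn.Linear(8, 8, bias = False)
to_v = torch.nn.Linear(8, 8, bias = False)
k, v = to_k(context), to_v(context)

sim = torch.einsum('b i d, b j d -> b i j', q, k) / 8 ** 0.5
sim = sim.masked_fill(~context_mask[:, None, :], float('-inf'))  # mask applied "later"
attn = sim.softmax(dim = -1)
out = torch.einsum('b i j, b j d -> b i d', attn, v)

out.sum().backward()
print(context.grad[0, 3:].abs().sum())  # -> tensor(0.), masked positions get no gradient
```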
https://github.com/lucidrains/meshgpt-pytorch/blob/main/meshgpt_pytorch/meshgpt_pytorch.py#L1362 the issue is because cached text embeds are passed through here and never cond dropped, will fix! thank you Marcus 🙏
I'm not quite sure if I'm misunderstanding, but doesn't it matter that k and v have gotten the full context?
I don't think that's my problem, since when cond_drop_prob is None the conditioner will use self.cond_drop_prob, which is 0.0 for me.
it doesn't matter if they are masked out later |
You are probably correct, sorry, I thought that the linear layer attends to the other positions along the length dimension. A little off topic: is there any mode or whatnot that can take an input of shape (b, length, dim) and output a different length, e.g. (b, 400, dim) to (b, 900, dim)?
@MarcusLoppe just double checked the code and couldn't find any errors, i think it should be working
you can pad a dimension; in your example, it would be something like the sketch below
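The original snippet from this reply isn't shown above; a minimal sketch of what padding the length dimension could look like in PyTorch, assuming the extra positions can simply be zeros:

```python
import torch
import torch.nn.functional as F

t = torch.randn(2, 400, 512)             # (b, length, dim)

# F.pad pads from the last dimension backwards:
# (left, right) for dim, then (left, right) for length
padded = F.pad(t, (0, 0, 0, 900 - 400))  # pad length on the right with zeros
print(padded.shape)                       # torch.Size([2, 900, 512])
```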
Hey, I was just wondering if it would be possible to modify the autoencoder to encode point clouds and then decode them into a 3D mesh. I was thinking of using a point cloud autoencoder to encode a point cloud, and then modifying the mesh autoencoder so it takes the point cloud embeddings as input and decodes them into a 3D mesh.
At the moment the TextEmbeddingReturner just returns a mask where, with the % chance given by cond_drop_prob, some mask values are set to False.
This works great if the model you are working with respects the mask; however, x-transformers attention does not, since it takes the raw context and feeds it straight to the k and v linear layers.
https://github.com/lucidrains/x-transformers/blob/0c6266ee44ea99a4449cd9201ba55924a6a7eae7/x_transformers/x_transformers.py#L944
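A simplified, hypothetical sketch of the two behaviours being contrasted here (not the actual TextEmbeddingReturner code): returning only a keep-mask relies on the downstream attention respecting it, while zeroing the dropped embeddings guarantees the k and v projections never see the conditioning.

```python
# Hypothetical, simplified sketch of the two approaches under discussion
# (not the real TextEmbeddingReturner implementation).
import torch

def prob_mask_like(shape, prob):
    # hypothetical helper: True = keep conditioning for this sample, False = drop it
    return torch.rand(shape) >= prob

b, n, d = 4, 16, 512
text_embeds = torch.randn(b, n, d)
cond_drop_prob = 0.25

keep_mask = prob_mask_like((b,), cond_drop_prob)     # (b,)

# approach 1: only return a mask - the raw text_embeds still reach the
# k / v linear layers unless the attention layer respects this mask
context_mask = keep_mask[:, None].expand(-1, n)      # (b, n)

# approach 2: actually drop the embeddings, so even an attention layer
# that ignores the mask never sees the dropped conditioning
dropped_text_embeds = torch.where(
    keep_mask[:, None, None],
    text_embeds,
    torch.zeros_like(text_embeds)
)
```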