Added so the embeddings are actually dropped #9

MarcusLoppe · 2024-06-07T19:53:12Z

At the moment the TextEmbeddingReturner just returns a mask where it given the % chance using cond_drop_prob, is setting some mask values to false.

This works great if the model you are working with respects the mask, however x-transformers attention does not do this since it takes the context and uses the raw context and feeds it to the k and v linear layers.

https://github.com/lucidrains/x-transformers/blob/0c6266ee44ea99a4449cd9201ba55924a6a7eae7/x_transformers/x_transformers.py#L944

kv_input = default(context, x)

q_input = x
k_input = kv_input
v_input = kv_input
r_input = x 

q = self.to_q(q_input)
k = self.to_k(k_input)
v = self.to_v(v_input) if exists(self.to_v) else k
r = self.to_r(r_input) if exists(self.to_r) else None

lucidrains · 2024-06-07T19:59:22Z

oh, the text mask is actually sent here and then applied here

lucidrains · 2024-06-07T19:59:56Z

yea, there is some wasteful projections, even for the tokens that are masked out, but that's how most transformers are trained these days

MarcusLoppe · 2024-06-07T20:22:48Z

oh, the text mask is actually sent here and then applied here

Correct, but the mask isn't applied everywhere :/

yea, there is some wasteful projections, even for the tokens that are masked out, but that's how most transformers are trained these days

I've tried using the cond_drop_prob but I had very little success until I did the changes in this commit.
I think it's due to these wasteful projections since that will leak information that it shouldn't have seem and poisoning the output.

Image if you have a full transformer that takes tokens and have the mask to hide some of the tokens and the decoder should reconstruct the full sequence. If some of the unmasked data would leak into the model then the training poisoned.

lucidrains · 2024-06-07T20:44:03Z

the mask should be applied or that would be a pretty big bug in x transformers

let me check tonight, out with 🐕

lucidrains · 2024-06-07T20:50:39Z

oh shoot, maybe the condition dropping isn't being applied to cached text embeddings 🤦‍♂️ will fix if not!

MarcusLoppe · 2024-06-07T20:51:36Z

the mask should be applied or that would be a pretty big bug in x transformers

let me check tonight, out with 🐕

It's applied later on in he code but the k and v value at the start is not.
If you call it with the context and context_mask, the kv_input will be default context, then the next lines the kv_input value is pass to k and v linear layers.
This means that they get the full context, I don't see the context in previous steps that the mask get applied.

https://github.com/lucidrains/x-transformers/blob/0c6266ee44ea99a4449cd9201ba55924a6a7eae7/x_transformers/x_transformers.py#L944

    def forward(
        self,
        x,
        context = None,
        mask = None,
        context_mask = None,
        attn_mask = None,
        rel_pos = None,
        rotary_pos_emb = None,
        prev_attn = None,
        mem = None,
        mem_mask = None,
        return_intermediates = False,
        cache: Intermediates | None = None,
    ):
        b, n, h, kv_h, head_scale, num_mem_kv, device, has_context = x.shape[0], x.shape[1], self.heads, self.kv_heads, self.head_scale, self.num_mem_kv, x.device, exists(context)

        kv_input = default(context, x)

        q_input = x
        k_input = kv_input
        v_input = kv_input
        r_input = x

        if exists(mem):
            k_input, mem_packed_shape = pack([mem, k_input], 'b * d')
            v_input, _ = pack([mem, v_input], 'b * d')

        q = self.to_q(q_input)
        k = self.to_k(k_input)
        v = self.to_v(v_input) if exists(self.to_v) else k

    r = self.to_r(r_input) if exists(self.to_r) else None

lucidrains · 2024-06-07T20:56:24Z

later masking is fine, gradients cannot propagate to the point you are concerned about

lucidrains · 2024-06-07T21:00:01Z

https://github.com/lucidrains/meshgpt-pytorch/blob/main/meshgpt_pytorch/meshgpt_pytorch.py#L1362 the issue is because cached text embeds are passed through here and never cond dropped, will fix! thank you Marcus 🙏

MarcusLoppe · 2024-06-07T21:03:26Z

later masking is fine, gradients cannot propagate to the point you are concerned about

I'm not quite sure if I'm misunderstanding but doesn't it matter that k and v which have gotten the full context?

k = self.to_k(k_input)
v = self.to_v(v_input)

https://github.com/lucidrains/meshgpt-pytorch/blob/main/meshgpt_pytorch/meshgpt_pytorch.py#L1362 the issue is because cached text embeds are passed through here and never cond dropped, will fix! thank you Marcus 🙏

I don't think that's my problem since when cond_drop_prob is None the conditioners it will use the self.cond_drop_prob which is 0.0 for me.
However it might be good to change that just in case.

lucidrains · 2024-06-07T21:09:09Z

it doesn't matter if they are masked out later

MarcusLoppe · 2024-06-08T00:23:24Z

it doesn't matter if they are masked out later

You are probably correct, sorry I thought that the linear layer pays attention to the other dims in the length.

Little off topic, is there any mode or what not that can input (b, length, dim) and output another length, e.g. (b, 400,dim) to (b, 900, dim) ?
I know it works with autoregression but I'm not sure if there is any solution that can output in another length in a single pass.

lucidrains · 2024-06-08T00:45:26Z

@MarcusLoppe just doubled checked the code and couldn't find any errors

i think it should be working

lucidrains · 2024-06-08T00:47:09Z

you can pad a dimension, in your example, it would be F.pad(tensor, (0, 0, 0, 900 - 400)) if you wished it for the right hand side

MarcusLoppe · 2024-06-09T17:03:41Z

you can pad a dimension, in your example, it would be F.pad(tensor, (0, 0, 0, 900 - 400)) if you wished it for the right hand side

Hey, i was just thinking if it was possible to modify the autoencoder to encode point clouds and then decode into a 3D mesh.
But since point clouds and triangle count isn't the same I wanted to know if it's possible to modify the output length.

I was thinking of using a point cloud autoencoder to encode a point cloud and then modify the mesh-autoencoder so it inputs the embeddings of cloud point and decodes into 3D mesh.
The only solution i can think of is using a layer that can predict the length based on encoding and the reshape the tensor, but that seems like a bad solution. What do you think?

added so the embeddings are actually dropped

1da295a

Merge branch 'lucidrains:main' into emb_drop

866b5e8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added so the embeddings are actually dropped #9

Added so the embeddings are actually dropped #9

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 8, 2024

lucidrains commented Jun 8, 2024

lucidrains commented Jun 8, 2024

MarcusLoppe commented Jun 9, 2024 •

edited

Loading

Added so the embeddings are actually dropped #9

Are you sure you want to change the base?

Added so the embeddings are actually dropped #9

Conversation

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 7, 2024

lucidrains commented Jun 7, 2024

MarcusLoppe commented Jun 8, 2024

lucidrains commented Jun 8, 2024

lucidrains commented Jun 8, 2024

MarcusLoppe commented Jun 9, 2024 • edited Loading

MarcusLoppe commented Jun 9, 2024 •

edited

Loading