
The number of tokens is inconsistent in get_image_embeds #29

Open
CharlesGong12 opened this issue Nov 6, 2024 · 0 comments

Hi,

Thanks for your great work!

In `get_image_embeds` of `Adapter`, the token length will be 256 when `image_pil` or `image_tensor` is the input, but 64 when `image_embeds` is given. A similar issue was discussed in #14.

```python
image_embeds = self.visual_encoder(image_tensor)
```

From the line above, `image_embeds.shape[1]` will be 256 when `image_tensor` or `image_pil` is given. This is the case when we directly use `eval_seed_x_detokenizer.py`.
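For concreteness, here is a minimal, self-contained sketch of why this path yields 256 tokens, assuming a ViT-style encoder with 14×14 patches on a 224×224 input (the mock class, patch size, and hidden dim are my assumptions chosen to match the 256 above, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Stand-in for the ViT visual encoder: 14x14 patches on a 224x224 image
# give a 16x16 grid = 256 patch tokens. Sizes here are assumptions.
class MockVisualEncoder(nn.Module):
    def __init__(self, dim=1024, patch_size=14, image_size=224):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, dim, 16, 16)
        return x.flatten(2).transpose(1, 2)  # (B, 256, dim)

visual_encoder = MockVisualEncoder()
image_tensor = torch.randn(1, 3, 224, 224)
image_embeds = visual_encoder(image_tensor)
print(image_embeds.shape)  # torch.Size([1, 256, 1024])
```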

```python
image_embeds = torch.cat([image_embeds, image_embeds_neg], dim=0)
```

However, when `image_embeds` is given, `image_embeds.shape[1]` will be 64 at the line above, because the LLM's IMG tokens are set to 64. This is the case when we use the LLM's output to decode an image.
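A sketch of this second path, where the 64 IMG-token embeddings come straight from the LLM and the negative embeddings are concatenated (the zero negatives stand in for the unconditional branch; the names and the exact construction in the repo are my assumptions):

```python
import torch

# Decoding from LLM output: embeddings already correspond to the 64
# learnable IMG tokens, so shape[1] is 64 before the adapter runs.
num_img_tokens, dim = 64, 1024
image_embeds = torch.randn(1, num_img_tokens, dim)  # LLM output for IMG tokens
image_embeds_neg = torch.zeros_like(image_embeds)   # unconditional branch
image_embeds = torch.cat([image_embeds, image_embeds_neg], dim=0)
print(image_embeds.shape)  # torch.Size([2, 64, 1024])
```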

Indeed, `shape[1]` ends up as 64 in both cases, since `self.encode_image_embeds` calls `self.resampler`, whose `num_queries` is 64.
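To illustrate why both paths converge, here is a minimal Q-Former/Perceiver-style resampler sketch with 64 learnable queries: the output length is always 64 whether 256 or 64 tokens come in. This is only an illustration under my assumptions, not the repo's actual `Resampler`:

```python
import torch
import torch.nn as nn

# 64 learnable queries cross-attend to the input tokens, so the output
# token count is fixed at 64 regardless of the input length.
class MockResampler(nn.Module):
    def __init__(self, dim=1024, num_queries=64, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        q = self.queries.expand(x.size(0), -1, -1)
        out, _ = self.attn(q, x, x)  # queries attend to the input tokens
        return out                   # (B, num_queries, dim)

resampler = MockResampler()
print(resampler(torch.randn(2, 256, 1024)).shape)  # torch.Size([2, 64, 1024])
print(resampler(torch.randn(2, 64, 1024)).shape)   # torch.Size([2, 64, 1024])
```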

Will the difference between the two cases have any influence? And during training, is 64 or 256 used here?

```python
def forward(self, noisy_latents, timesteps, image_embeds, text_embeds, noise, time_ids):
```
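If it helps to verify, one way to check which length the adapter actually receives during training is a temporary forward hook (this assumes `image_embeds` is the third positional or a keyword argument, matching the signature quoted above, and PyTorch >= 2.0 for `with_kwargs`):

```python
# Temporary debugging hook: logs image_embeds.shape[1] on each forward.
def log_image_tokens(module, args, kwargs, output):
    embeds = kwargs.get("image_embeds")
    if embeds is None and len(args) > 2:
        embeds = args[2]
    if embeds is not None:
        print("adapter received", embeds.shape[1], "image tokens")

# handle = adapter.register_forward_hook(log_image_tokens, with_kwargs=True)
# ... run one training step, then: handle.remove()
```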
