Okay, so I just forced a different latent size. I wanted one second of output, so I divided the original latent length (256) by 10 and rounded up, which gives 26.
def prepare_latents(self, batch_size, inference_scheduler, num_channels_latents, dtype, device):
    # EDIT: the latent length is hardcoded to 256 here! I want to change this.
    # shape = (batch_size, num_channels_latents, 256, 16)  # original
    shape = (batch_size, num_channels_latents, 26, 16)  # ceil(256 / 10), scaled to one second???
Indeed, the inference script now outputs audio files that are 1 second long. Is this... okay?
I suppose duration could be introduced as a training argument, and then saved as part of the training config and used in this way to adjust the lengths of the audio generated during the inference process...
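To make that idea concrete, here is a rough sketch of how a duration setting could be turned into a latent shape instead of hardcoding it. It assumes, based only on my divide-by-10 experiment above, that the original length of 256 corresponds to about 10 seconds of audio. The names `latent_length_for_duration` and `latent_shape` are hypothetical helpers and not part of the repository.

```python
import math

# Assumption (not confirmed by the repository): a latent length of 256
# corresponds to roughly 10 seconds of generated audio.
BASE_LATENT_LENGTH = 256
BASE_DURATION_S = 10.0


def latent_length_for_duration(duration_s,
                               base_length=BASE_LATENT_LENGTH,
                               base_duration_s=BASE_DURATION_S):
    """Scale the latent length linearly with the requested duration, rounding up."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return math.ceil(base_length * duration_s / base_duration_s)


def latent_shape(batch_size, num_channels_latents, duration_s, latent_width=16):
    """Build the shape tuple that prepare_latents currently hardcodes."""
    length = latent_length_for_duration(duration_s)
    return (batch_size, num_channels_latents, length, latent_width)


# One second reproduces the manual edit above: ceil(256 / 10) == 26.
print(latent_shape(1, 8, duration_s=1.0))
```

I have not checked whether the decoder or vocoder expects the length to be a multiple of some factor, or whether a model trained on one duration generalizes to another, so treat this as a starting point only.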
Hi, this is really a lovely repository. But how can I change the duration of the generated audio?
Thanks!!