CUDA out of memory. Tried to allocate 64.72 GiB. #86

Open
HolyFishhh opened this issue Dec 7, 2023 · 2 comments
@HolyFishhh

CODE:

import os

# Environment variables must be set before torch initializes CUDA
# (CUDA_VISIBLE_DEVICES is read at CUDA init) and before Bark loads.
os.environ["SUNO_USE_SMALL_MODELS"] = "True"
os.environ["SUNO_OFFLOAD_CPU"] = "True"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
from TTS.api import TTS

# Load the model to GPU.
# Bark is really slow on CPU, so we recommend using GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/bark").to(device)

# Cloning a new speaker.
# This expects to find an mp3 or wav file like `bark_voices/new_speaker/speaker.wav`.
# It computes the cloning values and stores them in `bark_voices/new_speaker/speaker.npz`.
tts.tts_to_file(text="我家的后面有一个很大的园,相传叫作百草园。现在是早已并屋子一起卖给朱文公的子孙了,连那最末次的相见也已经隔了七八年,其中似乎确凿只有一些野草;但那时却是我的乐园。",
                file_path="output.wav",
                voice_dir="videos/bark_voices",
                speaker="new_speaker")

RESULT:

 > tts_models/multilingual/multi-dataset/bark is already downloaded.
TTS.tts.configs bark_config
TTS.vocoder.configs bark_config
TTS.encoder.configs bark_config
TTS.vc.configs bark_config
 > Using model: bark
/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
 > Text splitted to sentences.
['我家的后面有一个很大的园,相传叫作百草园。', '现在是早已并屋子一起卖给朱文公的子孙了,连那最末次的相见也已经隔了七八年,其中似乎确凿只有一些野草;但那时却是我的乐园。']
Some weights of the model checkpoint at facebook/hubert-base-ls960 were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at facebook/hubert-base-ls960 and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "bark_test.py", line 34, in <module>
    tts.tts_to_file(text="我家的后面有一个很大的园,相传叫作百草园。现在是早已并屋子一起卖给朱文公的子孙了,连那最末次的相见也已经隔了七八年,其中似乎确凿只有一些野草;但那时却是我的乐园。",
  File "/home/chatglm/ppt_and_human/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
  File "/home/chatglm/ppt_and_human/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
  File "/home/chatglm/ppt_and_human/TTS/utils/synthesizer.py", line 374, in tts
    outputs = self.tts_model.synthesize(
  File "/home/chatglm/ppt_and_human/TTS/tts/models/bark.py", line 219, in synthesize
    history_prompt = load_voice(self, speaker_id, voice_dirs)
  File "/home/chatglm/ppt_and_human/TTS/tts/layers/bark/inference_funcs.py", line 81, in load_voice
    generate_voice(audio=audio_path, model=model, output_path=output_path)
  File "/home/chatglm/ppt_and_human/TTS/tts/layers/bark/inference_funcs.py", line 145, in generate_voice
    semantic_vectors = hubert_model.forward(audio[0], input_sample_hz=model.config.sample_rate)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/chatglm/ppt_and_human/TTS/tts/layers/bark/hubert/kmeans_hubert.py", line 71, in forward
    outputs = self.model.forward(
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/transformers/models/hubert/modeling_hubert.py", line 1091, in forward
    encoder_outputs = self.encoder(
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/transformers/models/hubert/modeling_hubert.py", line 738, in forward
    layer_outputs = layer(
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/transformers/models/hubert/modeling_hubert.py", line 589, in forward
    hidden_states, attn_weights, _ = self.attention(
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chatglm/miniconda3/envs/VideoReTalking/lib/python3.8/site-packages/transformers/models/hubert/modeling_hubert.py", line 488, in forward
    attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.72 GiB. GPU 0 has a total capacty of 23.48 GiB of which 9.36 GiB is free. Process 221492 has 2.96 GiB memory in use. Process 271045 has 720.00 MiB memory in use. Including non-PyTorch memory, this process has 10.42 GiB memory in use. Of the allocated memory 5.45 GiB is allocated by PyTorch, and 4.69 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I ran the Coqui TTS program directly and it downloaded the model, but the amount of GPU memory it tries to allocate seems unreasonable. Is there a problem with my settings? Can you tell me what's wrong, please?
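
The allocation fails inside HuBERT's self-attention (`torch.bmm(query_states, key_states.transpose(1, 2))` in the traceback), whose memory grows quadratically with the number of audio frames, so a long reference clip at `bark_voices/new_speaker/speaker.wav` would explain a 64.72 GiB request. Below is a minimal sketch of one workaround, trimming the reference clip before cloning; the path and the 15-second cutoff are assumptions, not values from the report:

import soundfile as sf

# Hypothetical preprocessing step: keep only the first ~15 s of the
# cloning reference so the HuBERT attention matrices stay small.
path = "videos/bark_voices/new_speaker/speaker.wav"
audio, sr = sf.read(path)
sf.write(path, audio[: sr * 15], sr)

Note that the `max_split_size_mb` hint in the error message targets fragmentation of already-reserved memory; it would not help with a single 64.72 GiB allocation.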

@gustrd

gustrd commented Dec 7, 2023

Try https://huggingface.co/suno/bark-small with TTS.
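
If you go that route, here is a minimal sketch of loading the small checkpoint through transformers directly (assuming transformers >= 4.31, which added BarkModel). Note this uses Bark's built-in voice presets rather than the custom-wav cloning path that OOMs above, so it sidesteps the HuBERT step entirely:

from scipy.io import wavfile
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
# Runs on CPU by default; for GPU, move both the model and the inputs.
model = BarkModel.from_pretrained("suno/bark-small")

# Built-in preset voice; this does not clone from a custom speaker.wav.
inputs = processor("Hello, my dog is cute", voice_preset="v2/en_speaker_6")
audio_array = model.generate(**inputs).cpu().numpy().squeeze()
wavfile.write("bark_out.wav", rate=model.generation_config.sample_rate, data=audio_array)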
