
[FEATURE] Support for 'cuda' device in NexaVoiceInference class #143

Open

nmandic78 opened this issue Oct 3, 2024 · 3 comments

Labels: 💡 feature request (New feature or request)
@nmandic78 commented Oct 3, 2024

I noticed that the NexaVoiceInference class hardcodes the device to "cpu", making it impossible to use a GPU for inference. I suggest adding a device argument to allow switching between "cpu" and "cuda". Here’s the proposed change:

    def __init__(self, model_path, local_path=None, device='cpu', **kwargs):
        self.model_path = model_path
        self.downloaded_path = local_path
        self.device = device   # this line added
        self.params = DEFAULT_VOICE_GEN_PARAMS

and here:

    self.model = WhisperModel(
        self.downloaded_path,
        device=self.device,  # Change this line
        compute_type=self.params["compute_type"],
    )
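
For context, a hypothetical instantiation with the proposed argument might look like this (the import path and model name below are assumptions for illustration, not taken from the SDK docs):

    from nexa.gguf import NexaVoiceInference  # import path assumed

    # Hypothetical usage of the proposed device argument;
    # the model identifier is a placeholder.
    inference = NexaVoiceInference(
        model_path="faster-whisper-base",
        device="cuda",  # run inference on the GPU instead of the hard-coded CPU
    )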

Would you be open to a pull request for this change?

Similar Features or References

No response

@zhiyuan8 (Contributor) commented Oct 3, 2024

Sure, we will add an option to support Hugging Face Transformers-style CUDA usage, such as cuda:0, in our next release. For now, all GPUs are used by default if you build with the CUDA compilation options.
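
Purely as a sketch of what that option might look like once released (the API is not final; the names here are assumptions):

    # Hypothetical future API, mirroring Hugging Face Transformers device strings.
    inference = NexaVoiceInference(
        model_path="faster-whisper-base",  # placeholder model name
        device="cuda:0",                   # pin inference to the first GPU
    )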

@nmandic78 (Author) commented:
I did use the CUDA compilation options (CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"). NexaTextInference, for example, does use the GPU by default, but this is what the NexaVoiceInference class does:

            self.model = WhisperModel(
                self.downloaded_path,
                device="cpu",
                compute_type=self.params["compute_type"],
            )

@zhycheng614 (Collaborator) commented:

We hard-coded "cpu" here because enabling CUDA for faster-whisper requires either cuBLAS or cuDNN on your machine. Currently we cannot bundle these into our SDK, so if we changed "cpu" to "cuda" or "auto", it would fail due to the missing dependencies.

However, you can run on CUDA by doing the following:

  1. Refer to the official faster-whisper GitHub repository for instructions on installing cuBLAS or cuDNN, the dependencies required for GPU execution.
  2. Change our Python source code locally (on your machine, not through a pull request), either in your environment's installed packages or via pip install -e .: locate the line above and change "cpu" to "auto" or "cuda", as in the sketch below. It should then work for you.
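
A minimal sketch of that local patch, assuming faster-whisper's standard WhisperModel constructor:

    from faster_whisper import WhisperModel

    # Inside NexaVoiceInference, after cuBLAS/cuDNN are installed locally:
    self.model = WhisperModel(
        self.downloaded_path,
        device="auto",  # was "cpu"; "auto" selects CUDA when available, else CPU
        compute_type=self.params["compute_type"],
    )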

Thank you for your question; we are committed to solving this problem thoroughly in the near future.
