# WhisperLive-TensorRT

We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup. Note: we use `tensorrt_llm==0.15.0.dev2024111200`.
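Since the TensorRT backend needs GPU access inside the container, it is worth confirming that Docker's NVIDIA runtime works before pulling the image. A minimal check (any CUDA base image will do; the tag below is just an example):

```bash
# Verify that Docker can pass the GPU through via the NVIDIA runtime.
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```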

## Installation

```bash
docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it ghcr.io/collabora/whisperlive-tensorrt:latest
```
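This drops you into a shell inside the container. From there, a quick sanity check that the pinned `tensorrt_llm` version is the one installed (assuming the image's default Python environment):

```bash
# Should print 0.15.0.dev2024111200, matching the note above.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```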

## Whisper TensorRT Engine

- We build `small.en` and `small` multilingual TensorRT engines as examples below. The script logs the path of the directory containing the Whisper TensorRT engine; we need that `model_path` to run the server.
```bash
# convert small.en
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en        # float16
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int8   # int8 weight-only quantization
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int4   # int4 weight-only quantization

# convert small multilingual model
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small
```
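The script prints the engine directory when it finishes; you can also list what was produced. The directory names below feed the `--trt_model_path` values in the next section:

```bash
# Each build creates a directory such as whisper_small_en_float16,
# whisper_small_en_int8, whisper_small_en_int4, or whisper_small_float16.
ls /app/TensorRT-LLM-examples/whisper/
```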

## Run WhisperLive Server with TensorRT Backend

```bash
# Run English only model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en_float16"

# Run Multilingual model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
                      --trt_multilingual
```
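With the server running, you can stream audio to it from any WhisperLive client. A minimal sketch using the Python client from the WhisperLive repository (the audio path is a placeholder; `lang` should match the engine you built):

```python
from whisper_live.client import TranscriptionClient

# Connect to the TensorRT-backed server started above.
client = TranscriptionClient(
    "localhost",
    9090,
    lang="en",        # language code for transcription
    translate=False,  # set True to translate speech to English
    use_vad=False,
)

# Transcribe a local audio file; calling client() with no argument
# streams from the microphone instead.
client("path/to/audio.wav")
```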