Hello:
Glad to see that LLaVA is supported now. We're trying to deploy it with Triton; how can we do that?
You could refer to the Triton backend documentation at https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md and replace the engine and tokenizer with the LLaVA ones, if you already have a LLaVA engine built.
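Once a model repository is up following that walkthrough, the server can be queried with the standard Triton client. The sketch below is a minimal, hedged example: it assumes the ensemble model and the tensor names (`text_input`, `max_tokens`, `text_output`) used in the text-only llama.md setup; the image/visual-feature input is exactly the part that has no official multimodal example yet, so it is only noted in a comment.

```python
# Minimal sketch of querying a tensorrtllm_backend ensemble via the Triton HTTP client.
# Assumptions: the server runs on localhost:8000 and exposes the "ensemble" model with
# the text_input / max_tokens / text_output tensors from the llama.md walkthrough.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Text prompt as a BYTES tensor of shape [1, 1].
text = np.array([["Describe the image."]], dtype=object)
text_input = httpclient.InferInput("text_input", text.shape, "BYTES")
text_input.set_data_from_numpy(text)

# Maximum number of tokens to generate.
max_tokens = np.array([[64]], dtype=np.int32)
max_tokens_input = httpclient.InferInput("max_tokens", max_tokens.shape, "INT32")
max_tokens_input.set_data_from_numpy(max_tokens)

# NOTE: a LLaVA deployment would additionally need an input carrying the image
# (or precomputed visual features); there is no official example of that yet,
# so only the text path is shown here.
result = client.infer(model_name="ensemble", inputs=[text_input, max_tokens_input])
print(result.as_numpy("text_output"))
```

This only covers the text side; wiring the vision encoder output into the request is the open question in this thread.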
Does tensorrtllm_backend have support for multimodal models? Is there an example of passing a prompt and an image through a request?
There is no such example now.
Same question. We need some docs on how to deploy a multimodal model (such as LLaVA) via the Triton server tensorrtllm_backend.
@DefTruth Did you figure it out? I'm looking for the same thing.