
How to configure max_num_tokens and max_batch_size as runtime params? #2490

Open

JoJoLev opened this issue Nov 24, 2024 · 2 comments
Labels: question (Further information is requested), Triton Backend

Comments


JoJoLev commented Nov 24, 2024

I noticed that these values can be configured at runtime. Does this mean I must define them in the config.pbtxt if running on Triton?
Or should I pass the values as environment variables of some sort, since I am running this on SageMaker?

Thanks!

@hello-11 added the question (Further information is requested) and Triton Backend labels on Nov 25, 2024
@nv-guomingz (Collaborator) commented

Hi @Tabrizian, would you please take a look at this question?

@Tabrizian (Collaborator) commented

@JoJoLev For runtime values that are supported in the model configuration, you need to specify them in Triton's config.pbtxt file. The supported parameters are documented here:

https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/model_config.md#common-inputs
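
For concreteness, here is a minimal sketch of how these values typically appear in the tensorrt_llm model's config.pbtxt. The field and parameter names follow the template in all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt, but the exact set of supported parameters varies between tensorrtllm_backend versions, so check the documentation linked above for your release:

```
# Illustrative config.pbtxt sketch for the tensorrt_llm model; names follow
# the inflight_batcher_llm template and may differ across backend versions.
name: "tensorrt_llm"
backend: "tensorrtllm"

# max_batch_size is a top-level Triton field, not a backend parameter.
max_batch_size: 64

# Backend runtime knobs are passed as string-valued parameters. Whether
# max_num_tokens is overridable here depends on your backend version
# (assumption; verify against the model_config.md for your release).
parameters: {
  key: "max_num_tokens"
  value: { string_value: "8192" }
}
parameters: {
  key: "gpt_model_path"
  value: { string_value: "/path/to/engines" }
}
```

On SageMaker, rather than passing environment variables directly to Triton, a common pattern is to render them into config.pbtxt at container startup with the repo's tools/fill_template.py script. The placeholder names below come from the template and may differ in your version:

```
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    "triton_max_batch_size:${MAX_BATCH_SIZE},max_num_tokens:${MAX_NUM_TOKENS}"
```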
