batch option doesn't optimize latency, it just trades off latency for throughput, doesn't it? #3307
Unanswered
KimSoungRyoul asked this question in Show and tell
When running the BentoML server (`--production`) under the same load and with the same resources, the server that did not use the batch option was always faster.

I want to know whether any scenario exists in which latency can actually improve when the batch option is enabled (for example, giving more resources to the runner process (`bentoml start-runner-server`) than to the api-server process (`bentoml start-http-server`)). If not, the batch option doesn't optimize latency, it just trades latency for throughput, doesn't it?
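For reference, by "batch option" I mean adaptive batching enabled through the model signature when saving the model, roughly like the following sketch (assuming BentoML 1.x and a scikit-learn model; the model and tag names are illustrative):

```python
import bentoml
from sklearn import datasets, svm

# Train a small placeholder model (illustrative only).
iris = datasets.load_iris()
clf = svm.SVC()
clf.fit(iris.data, iris.target)

# "Batch option" = marking the method as batchable when saving the model,
# so the runner's adaptive batcher may merge concurrent requests into one call.
bentoml.sklearn.save_model(
    "iris_clf",
    clf,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)
```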
- batch: False
  - Latency (p95): 330~480 ms
  - Throughput (avg): 141
- batch: True, max_batch_size: 100
  - Latency (p95): 3200~3700 ms
  - Throughput: improved to 140~174
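In case it helps to reproduce, the runner/service setup being compared looks roughly like this sketch (the `max_batch_size` / `max_latency_ms` keyword arguments are from the 1.0-era `to_runner` API and may instead be set in `bentoml_configuration.yaml` in other versions):

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Create a runner from the saved model; adaptive batching parameters
# are sketched here and may differ by BentoML version.
runner = bentoml.sklearn.get("iris_clf:latest").to_runner(
    max_batch_size=100,    # same value as in the benchmark above
    max_latency_ms=10000,  # upper bound on how long the batcher may hold a request
)

svc = bentoml.Service("iris_classifier", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_array: np.ndarray) -> np.ndarray:
    # Each HTTP request is queued; the runner merges queued requests
    # into a single batched predict() call on the model.
    return await runner.predict.async_run(input_array)
```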