-
Integrating gRPC communication into BentoML makes tensor transmission more efficient once models are deployed as microservices. However, I recently ran into a problem: after containerized deployment, GPU resources cannot be scheduled dynamically. My current environment has four GPUs, each hosting one model, and each model has been wrapped with business logic and turned into a microservice. The model loaded on the third GPU is too large, causing ...
Replies: 1 comment 2 replies
-
The dynamic GPU resource scheduling described in this discussion is not supported by BentoML. Currently, BentoML lets users control GPU device scheduling directly.
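To illustrate what controlling devices directly can look like in practice, here is a minimal sketch of static GPU pinning. It does not use any BentoML API. It relies only on the standard `CUDA_VISIBLE_DEVICES` environment variable that the CUDA runtime reads at process start. The service names, the `GPU_ASSIGNMENT` table, and the `env_for_service` helper are all hypothetical and only stand in for the four-GPU setup described in the question.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical static mapping: one service per GPU, as in the question.
GPU_ASSIGNMENT: Dict[str, List[int]] = {
    "model_a": [0],
    "model_b": [1],
    "model_c": [2],
    "model_d": [3],
}


def env_for_service(
    name: str,
    assignment: Mapping[str, List[int]] = GPU_ASSIGNMENT,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the service pinned to its GPUs.

    CUDA_VISIBLE_DEVICES restricts which physical GPUs a process can see.
    Inside the process, the visible devices are renumbered starting at 0.
    """
    if name not in assignment:
        raise KeyError(f"no GPU assignment for service {name!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in assignment[name])
    return env
```

The returned dictionary could be passed as the `env` argument of `subprocess.Popen` when launching each service process, or translated into the environment section of a container definition. This is a fixed assignment chosen ahead of time, so it does not provide the dynamic rescheduling asked about above.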
Will the Bento framework see research or implementation work in this area in the future?