Support for ensemble inference #996
-
Hi all, do you know if there's a way to perform ML inference with 2 models asynchronously and then combine their outputs? For example, taking the average of the 2 results.
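To make it concrete, this is roughly what I mean, expressed with plain asyncio (just a sketch; `model_a` and `model_b` are placeholders for whatever framework objects are actually loaded, each assumed to expose a blocking `predict()`):

```python
import asyncio
import numpy as np

# Hypothetical models -- stand-ins for whatever you load (sklearn, PyTorch, ...).
model_a = ...
model_b = ...

async def ensemble_predict(features: np.ndarray) -> np.ndarray:
    loop = asyncio.get_running_loop()
    # Run both blocking predict() calls concurrently in the default executor.
    pred_a, pred_b = await asyncio.gather(
        loop.run_in_executor(None, model_a.predict, features),
        loop.run_in_executor(None, model_b.predict, features),
    )
    # Combine by simple averaging; any other aggregation would work the same way.
    return (np.asarray(pred_a) + np.asarray(pred_b)) / 2
```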
-
I know it's recommended to decouple the models into 2 different services, but to optimise performance by reducing communication overhead, it could be useful to keep both models behind the same endpoint.
-
Hi @Rasen-wq
Great question! I considered this use case when initially designing BentoML. Besides running computations in parallel, it is also useful for users who need to fetch additional data from a feature store or third-party APIs (e.g. a fraud detection service that needs to fetch a credit score from another service provider while other pre-processing computation runs in parallel).
We did prototype a version with Celery and it worked really well, although it was before we introduced micro-batching and I assume that branch no longer works. I'm not sure if Celery is still the best choice here, but I think we do want to expose an API for users to define multiple steps that can run in parallel within an…
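In pseudo-code, the kind of pattern I have in mind looks something like this (a rough sketch with plain asyncio, not an actual BentoML API; the function names and the credit-score example are illustrative):

```python
import asyncio

async def fetch_credit_score(user_id: str) -> float:
    # Placeholder for a call to the external provider (e.g. via aiohttp).
    await asyncio.sleep(0.1)
    return 0.42

async def preprocess(raw_request: dict) -> dict:
    # Placeholder for local pre-processing that can overlap with the fetch.
    await asyncio.sleep(0.1)
    return {"amount": raw_request["amount"]}

async def score_request(raw_request: dict) -> dict:
    # Both steps run concurrently; total latency is roughly the max of the two,
    # not their sum.
    credit_score, features = await asyncio.gather(
        fetch_credit_score(raw_request["user_id"]),
        preprocess(raw_request),
    )
    features["credit_score"] = credit_score
    return features  # these would then be passed to the fraud model
```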