Support for ensemble inference #996

elia-secchi · 2020-08-17T18:57:22Z

elia-secchi
Aug 17, 2020

Hi all, do you know if there's a way to perform ML inference with 2 models asynchronously and then combine the output of them? For example taking the average of the 2 results.
I can only think about doing something similar using 2 python threads in the predict function, I don't know if something similar was already implemented using bentoml.

Answered by parano

Aug 17, 2020

Hi @Rasen-wq

Great question! I have considered this use case when initially designing BentoML. Besides running computations in parallel, it is also useful for users that need to fetch additional data from feature store or third-party APIs (e.g. a fraud detection service that needs to fetch credit score from another service provider while other pre-processing computation can run in parallel).

We did prototype a version with Celery and it worked really well, although it was before we introduced micro-batching and I assume that branch no longer works. I'm not sure if Celery is still the best choice here but I think we do want to expose an API for users to define multiple steps that can run i…

View full answer

elia-secchi · 2020-08-17T19:06:37Z

elia-secchi
Aug 17, 2020
Author

I know it's recommended to decouple the models in 2 different services but in order to optimise the performances reducing communication overhead, it could be useful to keep both models behind the same endpoint

4 replies

yubozhao Aug 17, 2020

Hi @Rasen-wq Good question

I think you can find this discussion #928 useful for your use cases.

From your description, I think to bundle both models in the same BentoService would be a good option. Here is what I have in mind

@bentoml.artifacts([PyTorchModelArtifact('model1'), PyTorchModelArtifact('model2')])
class MyPredictionSerivce(BentoService):

      @bentoml.api(input=DataframeInput()).  # You can also expose this as a stand alone API endpoint 
      def predict_one(self, df):
             return self.artifacts.model1.predict(df)

      def predict_two(self, df):
             return self.artifacts.model2.predict(df)

      @bentoml.api(input=DataframeInput(), batch=True)
      def predict(self, df):
            result_from_model_one = self.predict_one(df)
            result_from_model_two = self.predict_two(df)
            # additional processing if required ...
            return {
                  'model_one_result': result_from_model_one, 
                  'model_two_result': result_from_model_two,
            }

elia-secchi Aug 17, 2020
Author

Hi @yubozhao, thanks for your response! I had something similar in mind :) but my understanding is that this way here the execution of predict_two will happen after predict_one. I would like to run the 2 functions in parallel, that's why I was thinking about using 2 parallel threads

yubozhao Aug 17, 2020

Yes, you are right, the predict_two will happen after predict_one.
I am curious, what's the main concern, that drives you to run the two functions in parallel? I can think performance is one, any other?

elia-secchi Aug 17, 2020
Author

Just performances 😀 I imagine the benefit for parallelism increases as the model number in the ensemble increase

parano · 2020-08-17T19:47:17Z

parano
Aug 17, 2020
Maintainer

Hi @Rasen-wq

Great question! I have considered this use case when initially designing BentoML. Besides running computations in parallel, it is also useful for users that need to fetch additional data from feature store or third-party APIs (e.g. a fraud detection service that needs to fetch credit score from another service provider while other pre-processing computation can run in parallel).

We did prototype a version with Celery and it worked really well, although it was before we introduced micro-batching and I assume that branch no longer works. I'm not sure if Celery is still the best choice here but I think we do want to expose an API for users to define multiple steps that can run in parallel within an @api function. This is something we want to eventually support, once we have more dev cycle or someone in the community is interested in contributing.

2 replies

elia-secchi Aug 17, 2020
Author

That would be awesome, thanks so much!

danield137 Sep 29, 2020

@parano , one thing I would note here, bento has a very clean and easy API. brining async / parallel flows will most likely break it. I would consider using python's async capabilities to provide a clean interface (Just a suggestion)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BentoML

Support for ensemble inference #996

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 6 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

BentoML

Support for ensemble inference #996

elia-secchi Aug 17, 2020

Replies: 2 comments · 6 replies

elia-secchi Aug 17, 2020 Author

yubozhao Aug 17, 2020

elia-secchi Aug 17, 2020 Author

yubozhao Aug 17, 2020

elia-secchi Aug 17, 2020 Author

parano Aug 17, 2020 Maintainer

elia-secchi Aug 17, 2020 Author

danield137 Sep 29, 2020

elia-secchi
Aug 17, 2020

Replies: 2 comments 6 replies

elia-secchi
Aug 17, 2020
Author

elia-secchi Aug 17, 2020
Author

elia-secchi Aug 17, 2020
Author

parano
Aug 17, 2020
Maintainer

elia-secchi Aug 17, 2020
Author