Add support for MLFlow #90

mgoeminne · 2020-01-08T13:03:22Z

To improve the usability of FADI when deployed for Machine Learning / Data Science projects, support for MLFlow should be added.

MLFlow is a relatively recent, open source project from Databricks for storing and managing metrics that relate to ML models. Due to its loose coupling, this tool can be used with a large set of ML libraries.

From the user's point of view, MLFlow is essentially a REST API for submitting quality metrics, plus a Web application for managing them.
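To illustrate the REST side, here is a minimal sketch of what submitting a single metric looks like at the HTTP level, using only the Python standard library. It targets the `runs/log-metric` endpoint of the MLflow REST API; the server address and run ID are hypothetical placeholders, and the request is only built, not sent. In practice most users would go through the `mlflow` Python client rather than raw HTTP.

```python
import json
import time
import urllib.request


def build_log_metric_request(tracking_url, run_id, key, value, step=0):
    """Build (but do not send) a POST request that logs one metric to a run."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": float(value),
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=tracking_url.rstrip("/") + "/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical server address and run ID, for illustration only.
    req = build_log_metric_request(
        "http://mlflow.example:5000", "abc123", "rmse", 0.42
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the metric: urllib.request.urlopen(req)
```

The run must already exist on the server for the call to succeed; creating it is a separate request.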

Is your feature request related to a problem? Please describe.
No, it's a suggestion for an extension improving the functional coverage of FADI instances.

Describe the solution you'd like
Helm charts should be added to FADI so that an instance of MLFlow can be deployed and used.

Describe alternatives you've considered
KubeFlow looks like a "natural" alternative, but it only focuses on the Tensorflow framework, which makes it more specific.

Additional context
N/A

mgoeminne · 2020-01-08T13:14:24Z

@Maher-badri

Sellto · 2020-02-06T16:28:51Z

Back on the integration of MLFlow in FADI.

Tests were carried out with an existing MLFlow Helm chart, but it lacked certain configuration options that are essential for integrating it into FADI. We have improved this chart to meet our requirements; it is now available in the CETIC Helm repository.

The use case we deployed combines MLFlow with the following modules already present in FADI: a PostgreSQL database (storing metrics), Minio (storing artifacts), JupyterHub (launching experiments) and OpenLDAP (user management). Several observations can be made:

The integration is functional: it was possible to carry out a simple experiment in JupyterHub and to retrieve both the metrics and the artifacts.
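To make the wiring concrete, here is a sketch of how a PostgreSQL metrics store and a Minio artifact bucket could be passed to `mlflow server`. The `--backend-store-uri`, `--default-artifact-root`, `--host` and `--port` options are standard MLflow CLI flags; the host names, database name, bucket name and credentials are hypothetical and do not reflect the values used in the CETIC chart.

```python
from urllib.parse import quote


def mlflow_server_command(db_user, db_password, db_host, db_name,
                          bucket, db_port=5432, port=5000):
    """Return the argument list for launching an MLflow tracking server."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # do not break the database URI.
    backend_uri = "postgresql://{}:{}@{}:{}/{}".format(
        quote(db_user, safe=""),
        quote(db_password, safe=""),
        db_host,
        db_port,
        db_name,
    )
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,                 # metrics and parameters
        "--default-artifact-root", "s3://{}/".format(bucket),  # artifacts (Minio)
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    cmd = mlflow_server_command(
        "fadi", "secret", "postgres.example", "mlflow", "mlflow-artifacts"
    )
    print(" ".join(cmd))
```

The list can be handed to `subprocess.run` or used as a container command; it is built as a list rather than a single string to avoid shell quoting issues.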

But some rather negative points deserve to be raised:

MLFlow does not have user management. Can we imagine that this is a security breach?
It is essential to define the S3 credentials in jupyterHub in the form of three environment variables (AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, MLFLOW_S3_ENDPOINT_URL). A question therefore arises: which service communicates with minio? the consumer? the server?
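As an illustration of the second point, the sketch below sets the three variables for the current Python process (for example a notebook kernel) and checks that none is missing before artifacts are logged. The helper names are hypothetical, and the credential values and Minio URL are placeholders rather than FADI defaults.

```python
import os

REQUIRED_S3_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "MLFLOW_S3_ENDPOINT_URL",
)


def missing_s3_settings(environ=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_S3_VARS if not env.get(name)]


def configure_s3(access_key, secret_key, endpoint_url, environ=None):
    """Set the three variables in the given mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    env["MLFLOW_S3_ENDPOINT_URL"] = endpoint_url
    return env


if __name__ == "__main__":
    demo_env = {}  # a plain dict stands in for os.environ in this demo
    print("missing before:", missing_s3_settings(demo_env))
    configure_s3("minio-user", "minio-password",
                 "http://minio.example:9000", demo_env)
    print("missing after:", missing_s3_settings(demo_env))
```

Hard-coding credentials in a notebook is only acceptable for a quick test; in a real deployment they would be injected into the environment from outside the notebook.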

On these findings, should MLFlow be included in FADI? or can it be used simply via the helm MLFlow chart created?

banzo · 2020-02-11T09:01:45Z

> MLFlow does not have user management. Can we imagine that this is a security breach?

We could rely on the git/S3 credentials for this I guess.

> should MLFlow be included in FADI? or can it be used simply via the helm MLFlow chart created?

After discussion with @mgoeminne, I would say that it makes sense: the need is confirmed. The next steps would be to integrate the chart (default: false) into the FADI chart and to provide a user guide.

I am thinking we might want to adopt some kind of "incubator" approach where we have several tiers of support for FADI services.

> It is essential to define the S3 credentials in jupyterHub in the form of three environment variables (AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, MLFLOW_S3_ENDPOINT_URL). A question therefore arises: which service communicates with minio? the consumer? the server?

I'd say both should be possible. Which one would make more sense or be the simplest to implement?
NB: https://kubernetes.io/docs/concepts/configuration/secret/

Sellto · 2020-03-10T15:21:48Z

MLFlow is now available in FADI.

We are working on a practical use case that uses MLFlow; the result will be documentation that FADI users can follow to use this new ML tool properly.

banzo · 2020-05-27T10:16:37Z

Reopening this until we have some basic doc and ideally a full example.

mgoeminne added the "enhancement" (New feature or request) label Jan 8, 2020

banzo added this to the 0.1.2 milestone Jan 20, 2020

Sellto closed this as completed Mar 10, 2020

banzo reopened this May 27, 2020
