This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Add support for MLFlow #90

Open
mgoeminne opened this issue Jan 8, 2020 · 5 comments
Labels
enhancement New feature or request
Milestone

Comments

@mgoeminne

To improve the usability of FADI when it is deployed for Machine Learning / Data Science projects, support for MLFlow should be added.

MLFlow is a relatively recent, open source project from Databricks for storing and managing metrics that relate to ML models. Due to its loose coupling, this tool can be used with a large set of ML libraries.

From the user's point of view, MLFlow is essentially a REST API for submitting quality metrics, plus a Web application for managing them.
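To illustrate the REST side of this description, here is a minimal sketch that builds (but does not send) a request logging a single metric value. The endpoint path is the `runs/log-metric` call of the MLflow REST API as I understand it; the tracking URL, run ID, and metric name are placeholders rather than values from a real FADI deployment, and `build_log_metric_request` is a hypothetical helper.

```python
import json
import time
import urllib.request


def build_log_metric_request(tracking_url, run_id, key, value, step=0):
    """Build (without sending) a POST request that logs one metric value."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=tracking_url.rstrip("/") + "/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values (assumptions), not a real deployment.
request = build_log_metric_request(
    "http://mlflow.example:5000", "hypothetical-run-id", "accuracy", 0.93
)
# urllib.request.urlopen(request) would submit it to a reachable tracking server.
```

In practice most users would go through the `mlflow` Python client rather than raw HTTP; the sketch only shows what travels over the wire.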

Is your feature request related to a problem? Please describe.
No, it's a suggestion for an extension improving the functional coverage of FADI instances.

Describe the solution you'd like
Helm charts should be added to FADI so that an instance of MLFlow can be deployed and used.

Describe alternatives you've considered
KubeFlow looks like a "natural" alternative, but it only focuses on the Tensorflow framework, which makes it more specific.

Additional context
N/A

@mgoeminne mgoeminne added the enhancement New feature or request label Jan 8, 2020
@mgoeminne
Author

@Maher-badri

@banzo banzo added this to the 0.1.2 milestone Jan 20, 2020
@Sellto
Contributor

Sellto commented Feb 6, 2020

Back on the integration of MLFlow in FADI.

Tests were carried out with an existing MLFlow Helm chart, but it lacked certain configuration options that are essential for integration into FADI. We have improved the chart to meet our requirements (it is now available in the CETIC Helm repository).

The use case we deployed combines MLFlow with the following modules already present in FADI: a PostgreSQL database (storing metrics), Minio (storing artifacts), jupyterHub (launching experiments) and OpenLdap (user management). Several observations can be made:

  • The integration is functional: It was possible to carry out a simple experiment in jupyterhub, and to recover the metrics and the artifacts.

However, some drawbacks deserve to be raised:

  • MLFlow does not have user management. Can we imagine that this is a security breach?
  • It is essential to define the S3 credentials in jupyterHub in the form of three environment variables (AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, MLFLOW_S3_ENDPOINT_URL). A question therefore arises: which service communicates with minio? the consumer? the server?
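As a rough sketch of the second point, the three variables could be set from a notebook cell before any MLFlow client call is made. The endpoint and credential values below are placeholders, and `s3_env_for_mlflow` is a hypothetical helper, not part of MLFlow or FADI.

```python
import os


def s3_env_for_mlflow(endpoint_url, access_key_id, secret_access_key):
    """Return the three environment variables named above as a dict."""
    return {
        "MLFLOW_S3_ENDPOINT_URL": endpoint_url,
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
    }


# Placeholder values (assumptions); real credentials should not be hard-coded.
env = s3_env_for_mlflow(
    "http://minio.example:9000", "example-access-key", "example-secret-key"
)

# Export them into the current process, here the notebook kernel.
os.environ.update(env)
```

Variables set this way are only visible to the process that sets them, so this configures the notebook side and says nothing about how the MLFlow server itself reaches Minio.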

Given these findings, should MLFlow be included in FADI? or can it be used simply via the helm MLFlow chart created?

@banzo
Contributor

banzo commented Feb 11, 2020

MLFlow does not have user management. Can we imagine that this is a security breach?

We could rely on the git/S3 credentials for this I guess.

should MLFlow be included in FADI? or can it be used simply via the helm MLFlow chart created?

After discussion with @mgoeminne, I would say that it makes sense: the need is confirmed. Next steps would be to integrate the chart (disabled by default) into the fadi chart and provide a user guide.

I am thinking we might want to adopt some kind of "incubator" approach where we have several tiers of support for FADI services.

It is essential to define the S3 credentials in jupyterHub in the form of three environment variables (AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, MLFLOW_S3_ENDPOINT_URL). A question therefore arises: which service communicates with minio? the consumer? the server?

I'd say both should be possible, which one would make more sense/be the simplest to implement?
NB: https://kubernetes.io/docs/concepts/configuration/secret/
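To make the Secret idea concrete, here is a minimal sketch that generates a Kubernetes `Secret` manifest holding the three values, so they would not need to be written in plain text in a chart. The secret name, endpoint, and credentials are placeholders, and `build_s3_secret` is a hypothetical helper; how FADI would actually wire such a secret into jupyterHub or MLFlow is left open here.

```python
import base64
import json


def build_s3_secret(name, endpoint_url, access_key_id, secret_access_key):
    """Return an Opaque Secret manifest; values under `data` are base64-encoded."""

    def b64(text):
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            "MLFLOW_S3_ENDPOINT_URL": b64(endpoint_url),
            "AWS_ACCESS_KEY_ID": b64(access_key_id),
            "AWS_SECRET_ACCESS_KEY": b64(secret_access_key),
        },
    }


# Placeholder values (assumptions), not taken from a FADI deployment.
manifest = build_s3_secret(
    "mlflow-s3-credentials",
    "http://minio.example:9000",
    "example-access-key",
    "example-secret-key",
)

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

A pod could then pick the values up as environment variables through a secret reference, which would work for either the consumer or the server side.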

@Sellto
Contributor

Sellto commented Mar 10, 2020

MLFlow is now available in FADI.

We are working on a practical use case that uses MLFlow. The result will be documentation that FADI users can follow to make proper use of this new ML tool.

@Sellto Sellto closed this as completed Mar 10, 2020
@banzo banzo reopened this May 27, 2020
@banzo
Contributor

banzo commented May 27, 2020

Reopening this until we have some basic doc and ideally a full example.

Development

No branches or pull requests

3 participants