Add support for custom environment with Jupyter #93

mgoeminne · 2020-01-14T19:03:45Z

Is your feature request related to a problem? Please describe.

No, it's a suggestion for improving the functional coverage of FADI.

Describe the solution you'd like

A data scientist can use JupyterHub to iteratively explore data sets and provide technical solutions to various problems.

To do so, she frequently has to change the Jupyter environment of her notebooks, for example to include a specific package or to test alternative processing frameworks. Typically, each project or use case has one or more dedicated environments that change on a daily or weekly basis.

FADI should support this continuous adaptation to the data scientist's needs by providing a way to efficiently manage extra dependencies.

For instance, a web application could be provided for specifying, adapting or copying the environment right before instantiating JupyterHub. An interesting feature would be the ability to inherit from existing environments and to share them among stakeholders.

Describe alternatives you've considered

The currently recommended way to do it is to adapt the Helm values file of the underlying Kubernetes cluster and to restart the appropriate services. This is not really acceptable for an end user.

An alternative consists in specifying the additional dependencies with "conda install"-like commands at the beginning of the notebooks, but that makes these specifications notebook-specific. It also implies that the additional dependencies must be installed again each time the notebook is loaded. Environment variables and secrets must be set in the notebooks, which raises security issues. And so on.
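For illustration, the per-notebook approach tends to look like the sketch below, placed in the first cell of every notebook. It is only a minimal example under assumptions: the package names are placeholders, the helper name `ensure_installed` is hypothetical, and pip is used instead of conda to keep the example plain Python.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(dependencies, run=subprocess.check_call):
    """Install the requirements whose import name cannot be found.

    `dependencies` maps a top-level import name to a pip requirement.
    `run` executes the command; it is a parameter so the behaviour can be
    checked without actually installing anything.
    """
    missing = [
        requirement
        for module, requirement in dependencies.items()
        if importlib.util.find_spec(module) is None
    ]
    if missing:
        # Install into the environment of the running kernel.
        run([sys.executable, "-m", "pip", "install", *missing])
    return missing


# Hypothetical first cell of a notebook:
# ensure_installed({"yaml": "pyyaml", "requests": "requests"})
```

Every notebook has to carry and maintain its own copy of such a cell, and the check runs again on each kernel start, which is exactly the drawback described above.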

Additional context

Please have a look at how Domino provides this feature. Basically, the end user can edit a Dockerfile to personalize the environment.

A nice optimization would be to cache popular, recent or frequently used environments, so that notebooks using these environments start faster.

alexnuttinck · 2020-01-31T16:05:40Z

Hello @mgoeminne,

Thanks for your feature request!

Do you think that BinderHub could meet your needs?

See the diagram of the BinderHub architecture: https://binderhub.readthedocs.io/en/latest/overview.html#a-diagram-of-the-binderhub-architecture

BinderHub seems to allow a user to automatically create a Jupyter Notebook based on a Git repository. BinderHub generates a Docker image based on the specifications and requirements found in the Git repo.
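As a rough sketch of what this means for the user: a GitHub repository is launched on a BinderHub deployment by opening a URL of the form `<hub>/v2/gh/<org>/<repo>/<ref>`. The helper below only builds that URL. The function name, the host `binder.example.org` and the repository names are hypothetical placeholders, not part of FADI.

```python
from urllib.parse import quote


def binder_launch_url(hub_url, org, repo, ref="HEAD"):
    """Build the launch URL of a GitHub repository on a BinderHub deployment."""
    # Each path segment is percent-encoded, so a ref such as "feature/x"
    # does not introduce an extra path separator.
    parts = [quote(part, safe="") for part in (org, repo, ref)]
    return "{}/v2/gh/{}".format(hub_url.rstrip("/"), "/".join(parts))


# Hypothetical usage:
# binder_launch_url("https://binder.example.org", "some-org", "some-repo", "develop")
```

Opening such a link triggers the image build (if the image does not exist yet) and then starts a notebook server from it.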

This video explains very well how BinderHub works: https://www.youtube.com/watch?v=KcC0W5LP9GM

Tell us if you think it makes sense to add BinderHub to FADI.

mgoeminne · 2020-02-06T13:25:19Z

@alexnuttinck Thank you for your responsiveness. I have never used BinderHub, but it looks promising.

However, I fear that having to manage the specifications/requirements in a Git repository limits the user experience, since she has to manage this repo. On the other hand, managing requirements through a Git repository is pretty interesting from the evolution/deployment management point of view.

BinderHub seems to be the perfect fit, since it allows creating environments from configuration files (Dockerfile, Python requirements, etc.) directly from the JupyterHub environment.

If BinderHub were systematically available, I would probably stop complaining to you devops guys and asking you to add some weird dependencies to my environments 😄

mgoeminne · 2020-02-20T07:32:22Z

As I understand it, this feature is practically mandatory for using Seldon without having admin access to the Kubernetes cluster.

banzo · 2020-02-28T13:23:25Z

@AyadiAmen

AyadiAmen · 2020-02-28T14:51:15Z

I think it's possible to use the Jupyter Docker image jupyter/repo2docker with the current JupyterHub in FADI, because repo2docker is the tool BinderHub uses to build images on demand.

jupyter-repo2docker is a tool to build, run, and push Docker images from source code repositories.

repo2docker fetches a repository (from GitHub, GitLab, Zenodo, Figshare, Dataverse installations, a Git repository or a local directory) and builds a container image in which the code can be executed. The image build process is based on the configuration files found in the repository.
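A minimal sketch of how such a build could be triggered from Python, assuming the `jupyter-repo2docker` command is installed and a Docker daemon is reachable. The function names and the repository URL are hypothetical. Only the `--image-name`, `--ref` and `--no-run` options of the repo2docker command line are used.

```python
import subprocess


def repo2docker_command(repo, image_name=None, ref=None, run_container=False):
    """Build the jupyter-repo2docker command line for a repository."""
    command = ["jupyter-repo2docker"]
    if image_name:
        command += ["--image-name", image_name]
    if ref:
        command += ["--ref", ref]
    if not run_container:
        # Only build the image; do not start a container from it.
        command.append("--no-run")
    command.append(repo)
    return command


def build_image(repo, **options):
    """Run the build; raises CalledProcessError if repo2docker fails."""
    subprocess.run(repo2docker_command(repo, **options), check=True)


# Hypothetical usage:
# build_image("https://github.com/some-org/some-repo", image_name="my-env:latest")
```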

The repo2docker documentation comes with a how-to section, including "How to automatically create a environment.yml that works with repo2docker".

alexnuttinck · 2020-06-16T09:23:38Z

Documentation on BinderHub is available on the develop branch: https://github.com/cetic/fadi/tree/develop/examples/binderhub. It will be merged soon. BinderHub will nevertheless remain a "beta" feature.

mgoeminne added the enhancement (New feature or request) label Jan 14, 2020

mgoeminne assigned banzo Jan 14, 2020

banzo added this to the 0.1.2 milestone Jan 20, 2020

banzo assigned mgoeminne and unassigned banzo Feb 6, 2020

mgoeminne assigned alexnuttinck and unassigned mgoeminne Feb 6, 2020

banzo linked a pull request May 17, 2020 that will close this issue

Documentation/binderhub #112

Merged
