From 62900aca0fea33c52a74845967cfbf6e004db477 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Fri, 6 Oct 2023 14:42:41 +0200 Subject: [PATCH 1/8] [Conda] Fix syntax errors in the introduction --- docs/environment/conda.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index 21eec6d7..9e867dc9 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -11,11 +11,11 @@ ## A brief introduction to Conda -A few concepts are necessary to start working with Conda. In brief these are package managers which are the programs used to handle the environment, channels which are the repositories that contain the packages from which environments are composed, and distributions which are methods for shipping package managers. +A few concepts are necessary to start working with Conda. In brief, these are package managers which are the programs used to create and manage environments, channels which are the repositories that contain the packages from which environments are composed, and distributions which are methods for shipping package managers. ### Package managers -Package managers are the programms that install and manage the Conda environments. There are multiple package managers, such as [`conda`](https://docs.conda.io/projects/conda/en/stable/), [`mamba`](https://mamba.readthedocs.io/en/latest/user_guide/mamba.html), and [`micromamba`](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html). +Package managers are the programs that install and manage the Conda environments. There are multiple package managers, such as [`conda`](https://docs.conda.io/projects/conda/en/stable/), [`mamba`](https://mamba.readthedocs.io/en/latest/user_guide/mamba.html), and [`micromamba`](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html). !!! 
important "" The UL HPC centre supports the use of [`micromamba`](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) for the creation and management of personal Conda environments. From c13ec01db35c6f19f41ecb03b1d88e5fd529c4d7 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Fri, 6 Oct 2023 14:43:12 +0200 Subject: [PATCH 2/8] [Conda] Improve the explanation of Conda interface --- docs/environment/conda.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index 9e867dc9..d987fcfc 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -160,15 +160,15 @@ Conda integrates well with any such tool. Some of the most frequent cases are de ### Managing packages with external tools -Quite often a package that is required in an environment is not available in a Conda channel, but it is available though some other distributions, such as the [Python Package Index (PyPI)](https://pypi.org/). In these cases makes sense to create a Conda environment and install the required packages with `pip` from the Python Package Index. +Quite often a package that is required in an environment is not available through a Conda channel, but it is available through some other distribution channel, such as the [Python Package Index (PyPI)](https://pypi.org/). In these cases the only solution is to create a Conda environment and install the required packages with `pip` from the Python Package Index. -Conda installs links to package components in an environment directory. When using the package tool, packages components are installed in the same directory where Conda installs links. Thus, there tools integrate seamlessly with Conda, however there are a couple of caveats: +Using an external packaging tool is possible because of the method that Conda uses to install packages. Conda installs package versions in a central directory (e.g. `~/micromamba/pkgs`). 
Any environment that requires a package links to the central directory with _hard links_. Links are added to the home directory (e.g. `~/micromamba/envs/R-project` for the `R-project` environment) of any environment that requires them. When using an external package tool, package components are installed in the same directory where Conda would install the corresponding link. Thus, external package management tools integrate seamlessly with Conda, with a couple of caveats: - each package must be managed by one tool, otherwise package components will get overwritten, and - packages installed by the package tool are specific to an environment and cannot be shared as with Conda, since components are installed directly and not with links. -!!! important "" - Installing the same package in multiple environments with an external package tool consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), so prefer Conda when possible. +!!! important "Prefer Conda over external package managers" + Installing the same package in multiple environments with an external package tool consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), so prefer Conda when possible. This is particularly important for the `inode` limit, since some packages install a large number of files, and the hard links used by Conda do not consume inodes. #### Pip From c3127901d865997da18b6ecf508c005e5a1ea03f Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Fri, 6 Oct 2023 15:03:17 +0200 Subject: [PATCH 3/8] [Conda] Add link to Conda in the environment overview - Explain the limitations of using Conda in the HPC environment. - Provide a process that the users can follow to try avoiding using space/inodes. 
--- docs/environment/conda.md | 7 ++++++- docs/environment/index.md | 10 ++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index d987fcfc..f28bb66f 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -1,10 +1,15 @@ # Self management of Conda work environments in UL HPC facilities !!! important "" - **TL;DR:** install and use the [Micromamba package manager](#the-micromamba-package-manager). + **TL;DR:** install and use the [Micromamba package manager](conda.md#the-micromamba-package-manager). +In some cases rare or rarely used packages are not available through the standard options such as [modules](modules.md). In such cases it may make sense to install the package locally with an environment manager such as Conda. + +!!! warning "Contact the ULHPC before installing any software with Conda" + Installing packages locally with Conda consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), in your or your project's account. Contact the ULHPC High Level Support Team in the [service portal](https://service.uni.lu/sp?id=index) [Home > Research > HPC > Software environment > Request expertise] to discuss possible options before installing any software. + [Conda](https://docs.conda.io/en/latest/) is an open source environment and package management system. With Conda you can create independent environments, where you can install applications such as python and R, together with any packages which will be used by these applications. The environments are independent with the Conda package manager resolving dependencies and ensuring that packages used in multiple environments are stored only once. In a typical setting, each user has their own installation of a Conda and a set of personal environments. 
diff --git a/docs/environment/index.md b/docs/environment/index.md index e0f23ca1..b93e15a2 100644 --- a/docs/environment/index.md +++ b/docs/environment/index.md @@ -186,3 +186,13 @@ See [ULHPC Tutorial / Getting Started](https://ulhpc-tutorials.readthedocs.io/en [:fontawesome-solid-sign-in-alt: ULHPC Environment modules](modules.md){: .md-button .md-button--link } [:fontawesome-solid-sign-in-alt: Using Easybuild on ULHPC Clusters](../environment/easybuild.md){: .md-button .md-button--link } + +## Self management of work environments in ULHPC with Conda + +{% + include-markdown "conda.md" + start="" + end="" +%} + +[:fontawesome-solid-sign-in-alt: Management of work environments with Conda](conda.md){: .md-button .md-button--link } From 6524062849df6c74af1bcf07c528a4dca511a011 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Thu, 12 Oct 2023 13:37:55 +0200 Subject: [PATCH 4/8] [Conda] Add link to section about storage usage --- docs/environment/conda.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index f28bb66f..7e2e6e04 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -173,7 +173,7 @@ Using an external packaging tool is possible because of the method that Conda us - packages installed by the package tool are specific to an environment and cannot be shared as with Conda, since components are installed directly and not with links. !!! important "Prefer Conda over external package managers" - Installing the same package in multiple environments with an external package tool consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), so prefer Conda when possible. This is particularly important for the `inode` limit, since some packages install a large number of files, and the hard links used by Conda do not consume inodes. 
+ Installing the same package in multiple environments with an external package tool consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), so prefer Conda when possible. This is particularly important for the `inode` limit, since some packages install a large number of files, and the hard links used by Conda do not consume inodes or [disk space](https://saturncloud.io/blog/understanding-conda-clean-where-does-it-remove-packages-from/). #### Pip @@ -214,7 +214,7 @@ and install the required packages with `pip` ```bash pip install --upgrade mkdocs mkdocs-minify-plugin ``` -inside the environment. The packages will be installed inside a directory that `micromamba` created for the Conda enviroment, for instance +inside the environment. The packages will be installed inside a directory that `micromamba` created for the Conda environment, for instance ``` ${HOME}/micromamba/envs/mkdocs ``` From 4233ce284d06c8e3aa80e93f5025c156b5edb5c7 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Thu, 12 Oct 2023 14:54:25 +0200 Subject: [PATCH 5/8] [Conda] Warn that Conda binaries are not optimized --- docs/environment/conda.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index 7e2e6e04..4e645fd1 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -8,7 +8,9 @@ In some cases rare or rarely used packages are not available through the standard options such as [modules](modules.md). In such cases it may make sense to install the package locally with an environment manager such as Conda. !!! warning "Contact the ULHPC before installing any software with Conda" - Installing packages locally with Conda consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), in your or your project's account. 
Contact the ULHPC High Level Support Team in the [service portal](https://service.uni.lu/sp?id=index) [Home > Research > HPC > Software environment > Request expertise] to discuss possible options before installing any software. + Conda installs binaries that may not be optimal for the configuration of the ULHPC clusters. Prefer binaries provided through [modules](modules.md) or [containers](../../containers/), as these have been compiled with better optimized options for our clusters. Furthermore, installing packages locally with Conda consumes quotas in terms of [storage space and number of files](../../filesystems/quotas/#current-usage), in your or your project's account. + + Contact the ULHPC High Level Support Team in the [service portal](https://service.uni.lu/sp?id=index) [Home > Research > HPC > Software environment > Request expertise] to discuss possible options before installing any software. [Conda](https://docs.conda.io/en/latest/) is an open source environment and package management system. With Conda you can create independent environments, where you can install applications such as python and R, together with any packages which will be used by these applications. The environments are independent with the Conda package manager resolving dependencies and ensuring that packages used in multiple environments are stored only once. In a typical setting, each user has their own installation of a Conda and a set of personal environments. 
From 8e922acb23faf2e87d90b0c1ccd7acf753b967b5 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Thu, 12 Oct 2023 14:57:55 +0200 Subject: [PATCH 6/8] [Conda] Better explain the operations of sub-environment management --- docs/environment/conda.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index 4e645fd1..3f1cdbe3 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -227,9 +227,9 @@ along side packages installed by `micromamba`. As a results, 'system-wide' insta ### Combining Conda with external environment management tools -Quite often it is required to create isolated environments using external tools. For instance, tools such as [`virtualenv`](https://virtualenv.pypa.io/en/latest/) can install and manage a Python distribution in a given directory and export and import environment descriptions from text files. This functionalities allows for instance the shipping of a description the Python environment as part of a project. Higher level tools such as [`pipenv`](https://pipenv.pypa.io/en/latest) automate the process of managing Python project environments whereas [`poetry`](https://python-poetry.org/) is a wholistic project management tool with integrated management of Python environments. +Quite often it is required to create isolated environments using external tools. For instance, tools such as [`virtualenv`](https://virtualenv.pypa.io/en/latest/) can install and manage a Python distribution in a given directory and export and import environment descriptions from text files. This functionalities allows for instance the shipping of a description of the Python environment as part of a project. Higher level tools such as [`pipenv`](https://pipenv.pypa.io/en/latest) automate the process by managing the Python environment as part of a project directory. 
The description of the environment is stored in version controlled files, and the Python packages are stored in a non-tracked directory within the project directory. Some wholistic project management tools, such as [`poetry`](https://python-poetry.org/), further integrate the management of the Python environment withing the project management workflow. -Installing and using in Conda environments tools that create isolated environments is relatively straight forward. Create an environment where only the required that tool is installed, and manage any project subenvironments using the installed tool. +Installing and using in Conda environments tools that create isolated environments is relatively straight forward. Create an environment where only the required that tool is installed, and manage any subenvironments using the installed tool. !!! important "Create a different environment for each tool" While this is not a requirement it is a good practice. For instance, `pipenv` and `poetry` used to and may still have conflicting dependencies; Conda detects the dependency and aborts the conflicting installation. From ef6e4017346b0417d82f61de204970f52a7eaed2 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Thu, 12 Oct 2023 14:59:07 +0200 Subject: [PATCH 7/8] [Conda] Add section on package data cleanup --- docs/environment/conda.md | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index 3f1cdbe3..fc96bc05 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -159,6 +159,35 @@ _Useful scripting resources_ - [Formatting submission scripts for R (and other systems)](../slurm/launchers.md#serial-task-script-launcher) +### Cleaning up package data + +The Conda environment managers download and store a sizable amount of data to provide packages to the various environments. 
Even though the package data are shared between the various environments, they still consume space in your or your project's account. There are [limits in the storage space and number of files](../../filesystems/quotas/#current-usage) that are available to projects and users in the cluster. Since Conda packages are self managed, **you need to clear unused data yourself**. + +There are two main sources of unused data, the compressed archives of the packages that Conda stores in its cache when downloading a package, and the data of removed packages. All unused data in Micromamba can be removed with the command +```bash +micromamba clean --all +``` +that opens up an interactive dialogue with details about the operations performed. You can follow the default option, unless you have manually edited any files in your package data directory (default location `${HOME}/micromamba`). + +??? info "Updating environments to remove old package versions" + As we create new environments, we often install the latest version of each package. However, if the environments are not updated regularly, we may end up with different versions of the same package across multiple environments. If we have the same version of a package installed in all environments, we can save space by removing unused older versions. + + To update a package across all environments, use the command + ```bash + for e in $(micromamba env list | awk 'FNR>2 {print $1}'); do micromamba update --name $e ; done + ``` + and to update all packages across all environments + ```bash + for e in $(micromamba env list | awk 'FNR>2 {print $1}'); do micromamba update --name $e --all; done + ``` + where `FNR>2` removes the headers in the output of `micromamba env list`, and is thus sensitive to changes in the user interface of Micromamba. + + After updating packages, the `clean` command can be called to remove the data of unused older package versions. 
+ +_Sources_ + +- [Understanding Conda `clean`](https://saturncloud.io/blog/understanding-conda-clean-where-does-it-remove-packages-from/) + ## Combining Conda with other package and environment management tools It may be desirable to use Conda to manage environments but a different tool to manage packages, such as [`pip`](https://pip.pypa.io/en/stable/getting-started/). Or subenvironments may need to be used inside a Conda environment, as for instance with tools for creating and managing isolated Python installation, such as [`virtualenv`](https://virtualenv.pypa.io/en/latest/), or with tools for integrating managed Python installations and packages in project directories, such as [Pipenv](https://pipenv.pypa.io/en/latest) and [Poetry](https://python-poetry.org/). From 916b3e30350d30e4f80c48633054acca645e0cb4 Mon Sep 17 00:00:00 2001 From: Georgios Kafanas Date: Thu, 12 Oct 2023 14:59:38 +0200 Subject: [PATCH 8/8] [Conda] Add section on combining Conda with Julia --- docs/environment/conda.md | 71 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/docs/environment/conda.md b/docs/environment/conda.md index fc96bc05..8065bc7e 100644 --- a/docs/environment/conda.md +++ b/docs/environment/conda.md @@ -254,6 +254,77 @@ along side packages installed by `micromamba`. As a results, 'system-wide' insta !!! warning "Do not install packages in Conda environments with pip as a user" User installed packages (e.g.`pip install --user --upgrade mkdocs-minify-plugin`) are installed in the same directory for all environments, typically in `~/.local/`, and can interfere with other versions of the same package installed from other Conda environments. +#### Pkg + +The Julia programming language comes with Pkg, its own package manager. The package manager of Julia provides many useful capabilities and it is recommended that Pkg is used with Julia projects. Please read the [documentation](https://pkgdocs.julialang.org/v1/) for more details. 
+ +The Pkg package manager comes packaged with Julia. Start by creating an environment, +```bash +micromamba env create --name julia +``` +activate the environment, +```bash +micromamba activate julia +``` +and install Julia, +```bash +micromamba install --channel conda-forge julia +``` +to start using Pkg. + +In order to install a package, activate the Julia environment, and start an interactive REPL session +```bash +$ julia +julia> +``` +with the `julia` command. + +- Enter the Pkg package manager by pressing `]`. +- Exit the package manager by clearing all the input from the line with backspace, and then pressing backspace one more time. + +In the package manager you can see the status of the current environment, +```julia +(@julia) pkg> status +Status `~/micromamba/envs/julia/share/julia/environments/julia/Project.toml` (empty project) +``` +add and remove packages, +```julia +(@julia) pkg> add Example +(@julia) pkg> remove Example +``` +update the environment, +```julia +(@julia) pkg> update +``` +and perform many other operations such as exporting and importing environments from plain text files describing the environment setup, and pinning packages to specific versions. The Pkg package manager maintains a global environment, but also supports the creation of local environments that can be stored within a project directory. The use of local environments is highly recommended, please read the [documentation](https://pkgdocs.julialang.org/v1/environments/) for more information. + +After installing the Julia language in a Conda environment, the language distribution itself should be managed with `micromamba` and all packages with the Pkg package manager. To update Julia activate the Conda environment where Julia is stored and call +```bash +micromamba update julia +``` +whereas to update packages installed with Pkg use the `update` command of Pkg. 
The packages for local and global environments are stored in the Julia installation directory, typically in +``` +${HOME}/micromamba/envs/julia/share +``` +which is the default location. + +??? info "Advanced management of package data" + Julia packages will consume [storage and number of files quota](../../filesystems/quotas/#current-usage). Pkg uses automatic garbage collection to clean up packages that are no longer in use. In general you don't need to manage the package data, simply remove the package and its data will be deleted automatically after some time. However, when you exceed your quota you need to delete files immediately. + + The _immediate removal_ of the data of uninstalled packages can be forced with the command: + ```julia + using Pkg + using Dates + Pkg.gc(;collect_delay=Dates.Day(0)) + ``` + Make sure that the packages have been removed from all the environments that use them. + + _Sources_: [Immediate package data clean up](https://discourse.julialang.org/t/packages-clean-up-general-julia-data-consumption/56198) + +_Useful resources_ + +- [Pkg documentation](https://pkgdocs.julialang.org/v1/) + ### Combining Conda with external environment management tools Quite often it is required to create isolated environments using external tools. For instance, tools such as [`virtualenv`](https://virtualenv.pypa.io/en/latest/) can install and manage a Python distribution in a given directory and export and import environment descriptions from text files. This functionalities allows for instance the shipping of a description of the Python environment as part of a project. Higher level tools such as [`pipenv`](https://pipenv.pypa.io/en/latest) automate the process by managing the Python environment as part of a project directory. The description of the environment is stored in version controlled files, and the Python packages are stored in a non-tracked directory within the project directory. 
Some wholistic project management tools, such as [`poetry`](https://python-poetry.org/), further integrate the management of the Python environment withing the project management workflow.
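
The hard-link sharing described in the "Managing packages with external tools" hunk can be checked with a small shell sketch. This is only an illustration under stated assumptions: the temporary directory layout and the `example-1.0` package name are hypothetical stand-ins for `~/micromamba/pkgs` and `~/micromamba/envs`, and no Conda tool is invoked.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for ~/micromamba/pkgs and ~/micromamba/envs/<name>.
root=$(mktemp -d)
mkdir -p "$root/pkgs/example-1.0" "$root/envs/a" "$root/envs/b"

# One real copy of the file lives in the central package directory.
echo "package data" > "$root/pkgs/example-1.0/lib.txt"

# Each environment receives a hard link to that file, not a copy.
ln "$root/pkgs/example-1.0/lib.txt" "$root/envs/a/lib.txt"
ln "$root/pkgs/example-1.0/lib.txt" "$root/envs/b/lib.txt"

# `ls -i` prints the inode number in the first column; count the distinct values.
distinct_inodes=$(ls -i "$root/pkgs/example-1.0/lib.txt" \
                        "$root/envs/a/lib.txt" \
                        "$root/envs/b/lib.txt" \
                  | awk '{print $1}' | sort -u | wc -l | tr -d ' ')

echo "names: 3, distinct inodes: $distinct_inodes"

# Remove the scratch directory.
rm -rf "$root"
```

All three names resolve to a single inode, so the file counts once against the file quota. A tool that copies files into each environment, as `pip` does, would instead use one inode per file per environment.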