Tutoriel Guix - Compas 2023
Tutorial

1. Foreword, goals and motivations

Reproducibility of a research study in computer science has always been a complex matter. One of the biggest challenges is to recreate the same software environment. The latter is often built manually or using modules [1], especially on high-performance computing (HPC) platforms. The main issue with this approach is that modules and build instructions are likely to vary from one machine to another depending on the system configuration and the available software. Some package managers such as Spack [2] allow users to define their own package variants and leave them the choice of the configuration, the dependencies or the compiler. However, as the package manager still depends on compilers and other components provided by the underlying system, the reproducibility of software environments remains threatened.

From this point of view, container solutions such as Singularity [3] or Docker [4] are more robust, but they do not make updating or managing multiple environment variants very easy. For example, if we want to use a different version of one or more packages, we either have to modify the container interactively, which would make it even less reproducible, or rebuild it, which can take a lot of time if performed regularly. Moreover, because the container is often based on an existing Linux distribution, we are limited to the versions of various core packages (compilers, MPI¹, BLAS², …) provided by that particular distribution in that particular release, unless we want to manually rebuild and reconfigure a good chunk of the software environment.

Our goal is to overcome these limitations and take full control over our software environments so as to be able to adapt and reproduce them easily. To this end, we propose to explore the usage of Guix [5], [6]. In this tutorial, after a short tour of Guix, Nix and other possible solutions, we will start from an existing experimental research study, the software environment of which can be recreated either manually or by means of a pre-built Docker container. In the rest of the document, we refer to this version of the study as the Reference study.

The participants will learn the basics of Guix and be able to observe its advantages over the aforementioned approaches. Then, we will make use of Guix to improve the reproducibility of this research study.

At the end of the session, the participants should have built a standalone Git repository containing the same research study but managing the software environment with Guix. The ultimate goal is to reproduce the study from A to Z by redoing all the benchmarks, post-processing the results and finally producing the associated article in PDF. In the rest of this document, we refer to this version of the study as the Study using Guix.

The rest of the document is organized as follows. In Section 2, we describe the workspace used during the hands-on session. The instructions for the session itself are given in Section 3. Finally, we provide some additional useful pointers in Section 4.

2. Workspace

For the needs of this session, a dedicated project group, namely Tutoriel Guix - Compas 2023, has been created on the GitLab of Inria with the following structure:

Tutoriel Guix - Compas 2023
  │
  ├── test_FEMBEM
  │   ├── Reference study
  │   └── Study using Guix
  ├── Tutorial
  ├── Résumé
  └── Slides

The test_FEMBEM subgroup contains two versions of the same research study based on the open-source edition of the test_FEMBEM solver [7]:

  1. Reference study, the software environment of which is not managed with Guix and
  2. Study using Guix, the software environment of which is managed with Guix.

The Tutorial, Résumé and Slides repositories contain the sources and the Guix environment specifications corresponding to the present document, the abstract of the tutorial and the introductory presentation, respectively. We do not report further on these repositories.

2.1. Reference study

This repository contains an experimental study featuring the test_FEMBEM solver with the following structure:

Reference study
  │
  ├── benchmarks
  │   ├── definitions.csv
  │   └── run.sh
  ├── figures
  │   └── short-pipe.png
  ├── public
  │   └── .gitignore
  ├── .gitignore
  ├── .gitlab-ci.yml
  ├── Dockerfile
  ├── README.md
  ├── plot.R
  ├── references.bib
  └── study.tex

In this case, we do not rely on Guix to manage the software environment of the research study. One can either combine the native system package manager with manual builds in order to create a software environment close to the original, or use the accompanying pre-built Docker container defined in Dockerfile, as detailed in README.md.

The README.md file further provides guidelines for the three steps of the study. The experiments defined in definitions.csv are run using the dedicated run.sh shell script. The results are post-processed by means of the plot.R R script, the output of which is placed into the figures folder. Finally, the PDF of the study manuscript is produced from its LaTeX source study.tex and the associated bibliography file references.bib.

Note that the public folder is used by the continuous integration engine for publishing the repository's static web page hosted on GitLab Pages.

2.2. Study using Guix

This repository contains the same research study as the Reference study repository. However, here we do rely on Guix to manage the software environment of the study. See the structure of the repository below:

Study using Guix
  │
  ├── .guix
  │   ├── channels.scm
  │   └── manifests
  │       ├── benchmarks-openblas.scm
  │       ├── benchmarks-mkl.scm
  │       └── post-processing.scm
  ├── benchmarks
  │   ├── definitions.csv
  │   └── run.sh
  ├── figures
  │   └── short-pipe.png
  ├── public
  │   └── .gitignore
  ├── .gitignore
  ├── .gitlab-ci.yml
  ├── README.md
  ├── plot.R
  ├── references.bib
  └── study.tex

Compared to Reference study, this repository contains some extra files in the .guix directory. The channels.scm file and the files in .guix/manifests represent the specification of the Guix software environment of the study (see Section 3.3). All the other files remain the same as in Reference study.

The second part of the hands-on session will be based on this repository. The master branch contains the complete configuration we should have built by the end of the session. The level0 branch is the starting point, which the participants will complete during the session. For those joining us later or wanting to skip one or more phases, the other levelX branches correspond to different levels of completion of the tutorial.

3. Hands-on session

First, we will put the Study using Guix repository aside and familiarize ourselves with Guix.

3.1. Installing Guix

If you plan to use Guix on the PlaFRIM cluster, connect over secure shell (SSH) with:

ssh -Y NAME@plafrim -J NAME@formation.plafrim.fr

… where NAME is your login name, typically compas-LASTNAME.

Now you can skip this section and jump directly to Section 3.2.

Otherwise, we assume that we are running a third-party Linux distribution such as Debian, Fedora or Manjaro. We can install the Guix package manager on top of that distribution without interfering with our primary package manager. To do so, we use the official installation shell script, which needs to be run with superuser privileges.

cd /tmp
wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
chmod +x guix-install.sh
sudo ./guix-install.sh

Then, we just need to follow the on-screen instructions.

3.2. Running Guix for the first time

If you’re using the PlaFRIM cluster, you only need to run two things:

guix build hello
guix pull

… and you can skip the remainder of this section.

After the installation, we proceed with a short sequence of commands to ensure a smooth user experience with Guix onward. At the beginning, we install our first package using Guix, i.e. glibc-locales to allow the system to switch locales.

guix install glibc-locales

Then, to be able to acquire new versions of installed packages, we will need to pull the latest version of Guix first. The following command can take a while to execute, especially when run for the first time.

guix pull

Once the process finishes, we need to follow the hint the command gives us and add the following lines to our .bash_profile or .bashrc to always get access to the most recent Guix built by guix pull.

GUIX_PROFILE="$HOME/.config/guix/current"
. "$GUIX_PROFILE/etc/profile"
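
If the profile file does not exist yet (for instance on a machine where guix pull has never been run), sourcing it produces an error each time a shell starts. A slightly more defensive variant of the two lines above could look like the following sketch; the guard is our own addition, not part of the hint printed by guix pull.

```shell
# Sketch for ~/.bash_profile or ~/.bashrc: the same two lines as above,
# guarded so that the shell still starts cleanly when the profile
# created by `guix pull` is not there yet.
GUIX_PROFILE="$HOME/.config/guix/current"
if [ -f "$GUIX_PROFILE/etc/profile" ]; then
    . "$GUIX_PROFILE/etc/profile"
fi
```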

We also have to tell our shell to use this new Guix.

hash guix

Finally, we can update our installed packages.

guix upgrade

To get information on the generation and the revision (commit) of Guix being used, we can use:

guix describe

3.3. Familiarization with Guix

Let us enter our first Guix environment containing two packages, bash and cowsay, using the guix shell command and launch a shell inside of that environment. Here, we use the --container or -C switch to spawn the new environment within an isolated container. By default, we don't have access to the host filesystem (except for the current working directory), the host network or the host environment variables. See guix shell --help for more details.

guix shell --container bash cowsay -- bash

To test the cowsay package within the Guix shell, try to type cowsay "Hello world!", for example.

On some systems, using --container fails with an error along these lines:

$ guix shell --container coreutils
guix shell: error: clone: 2114060305: Invalid argument

This indicates that the system lacks support for Linux’s unprivileged user namespaces. Worry not: you can fall back to --pure, which is weaker, but still gives good control over the environment.
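
For scripts that must run on both kinds of systems, the fallback can be automated. The sketch below defines a hypothetical helper, guix_isolated (our own name, not a Guix command), which first probes whether --container works by running true from coreutils, then uses --pure if the probe fails. It assumes that the coreutils package can be downloaded or built on the machine; any other reason for the probe to fail also triggers the fallback.

```shell
# Hypothetical helper (not part of Guix): use --container when the system
# supports it, and fall back to the weaker --pure isolation otherwise.
guix_isolated() {
    # Probe: try to run `true` in a throw-away container.
    if guix shell --container coreutils -- true >/dev/null 2>&1; then
        guix shell --container "$@"
    else
        guix shell --pure "$@"
    fi
}
```

It takes the same arguments as guix shell, e.g. guix_isolated cowsay -- cowsay "Hello world!".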

We can simply type exit to get back to our original shell. Also, we do not have to run an interactive shell inside of the environment. We can directly execute a given command, for example:

guix shell --container cowsay -- cowsay "Hello world!"

Note that we did not include bash this time because we did not need it. The above command line should give us the following output:

 ______________ 
< Hello world! >
 -------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

3.3.1. Manifests

The guix shell command seems very convenient. However, let us imagine that we need not only two but 26 packages in our environment. The command line would become quite long, right? The good news is that we can instead put our list of packages into a file, referred to as a manifest, then use the --manifest or -m option to pass the manifest to our guix shell command line.

Manifest files [8] use the Scheme language [9] syntax which can be intimidating in the beginning. Fortunately, guix shell has recently got the --export-manifest option allowing one to automatically generate the manifest file corresponding to the environment specified on the command line. Let us thus create the manifest corresponding to our latest single-package environment and save it to a Scheme file named cowsay.scm.

guix shell --export-manifest cowsay > cowsay.scm

Our manifest should look like this. Not so scary in the end, is it?

(specifications->manifest
  (list "cowsay"))

Finally, we can enter the target environment using the manifest file and retry a cowsay command.

guix shell --pure -m cowsay.scm -- cowsay "Hello from the manifest!"
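
For larger environments, the manifest can also be written by hand by adding one string per package to the list. As a minimal sketch, the following creates a manifest with five packages that all appear elsewhere in this tutorial; the file name tools.scm is an arbitrary choice made for this example.

```shell
# Write a hand-made manifest listing several packages.
# The file name tools.scm is an arbitrary choice for this sketch.
cat > tools.scm <<'EOF'
;; One string per package; add as many lines as needed.
(specifications->manifest
  (list "bash"
        "coreutils"
        "cowsay"
        "grep"
        "sed"))
EOF
```

The resulting file is passed to guix shell exactly like cowsay.scm above, i.e. with -m tools.scm.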

3.3.2. Channels

Note that, by default, guix shell considers the latest versions of the specified packages available in our current revision of Guix. Maybe we are fine with that at this point, but what happens if we want to enter the exact same environment a couple of weeks, months or years later? Maybe the packages will not even be available anymore or, at least, not in the same version.

Software packages in Guix are provided through dedicated git repositories called channels [10]. The official Guix channel guix, automatically set up in our Guix installation, currently provides 21,921 packages (status on June 30, 2023 at 3:35PM) [11]. However, many other channels are available, e.g. for scientific HPC software and so on. We will discuss the usage of multiple channels later. For the moment, let us focus on the default guix channel.

To ensure the same revision of Guix providing the same packages in the same versions, we can accompany our manifest with a channel file, also written in Scheme. In the latter, we can specify the channel or channels to use together with the desired revision, i.e. the commit. We can obtain the currently used commit of the guix channel (and other channels, if any) by typing guix describe. The output should look as follows, modulo the language, date and time ;-).

Pokolenie 4	22. juin 2023 11:50:18	(súčasné)
  guix 7f3c6d3
    zdroj repozitára: https://git.savannah.gnu.org/git/guix.git
    vetva: master
    úprava: 7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1

Using the above information, we can create the corresponding channel file my-channels.scm.

(list
 (channel
  (name 'guix)
  (url "https://git.savannah.gnu.org/git/guix.git")
  (commit "7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1")))

Note that we can directly obtain the list of channels currently used by the system in Scheme by typing guix describe -f channels. It gives us a handy starting point for building our own channel file, e.g. by changing commits or branches, or by adding channels to or removing them from the list.

Finally, to execute the guix shell command using our channel file, we can use the guix time-machine command with the --channels or -C switch.

guix time-machine -C my-channels.scm -- shell --container \
     -m cowsay.scm -- cowsay "Great, a channel and a manifest file!"

3.4. Building a reproducible study

We are now ready for the core part of the hands-on session. We are going to improve the reproducibility of the research study from the Reference study thanks to Guix.

Before we begin, we need to clone the Study using Guix repository we are going to work with. It already contains all of the files required for it to work, but some of them need to be completed. Note that if you want to keep a copy of the repository with everything you do during the session, feel free to fork it first!

git clone https://gitlab.inria.fr/tutoriel-guix-compas-2023/test_fembem/study-using-guix.git

Navigate to the root of the freshly cloned repository

cd study-using-guix

and, depending on where in the hands-on session you want to join, check out the right branch to start from:

  • git checkout level0: the channel definition (the very beginning),
  • git checkout level1: the manifest definition,
  • git checkout level2: the reproduction of the study in a Guix environment.

3.4.1. Channels

The study relies on the test_FEMBEM solver suite which in turn depends on packages that are not available through the official guix channel. We will thus need two extra channels.

Fill the Scheme file .guix/channels.scm with the following list of channels while respecting the syntax seen in Section 3.3.2.

  1. guix, the official Guix channel
    • link: https://git.savannah.gnu.org/git/guix.git
    • commit: 7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1
  2. guix-hpc, the channel of the GuixHPC effort providing some commonly used HPC applications, e.g. solvers, runtimes, …
    • link: https://gitlab.inria.fr/guix-hpc/guix-hpc.git
    • commit: 356a3b4200a5c7a15cf85fc0e8680b74444c689d
  3. guix-hpc-non-free, a companion channel to guix-hpc providing non-free libraries (MKL, CUDA, …) and non-free versions of some of the packages provided by guix-hpc (PaStiX with MKL, …)
    • link: https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git
    • commit: e9b113ddadc69fd21026777f5f893ba8ae3183aa
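
One possible way to fill the file, shown here as a sketch rather than as the reference solution from the master branch, is to repeat the channel form from Section 3.3.2 once per channel. The names, links and commits are those listed above. Beware that the command below overwrites any existing .guix/channels.scm in the current directory.

```shell
# Sketch: write the three channels listed above into .guix/channels.scm.
# Run from the root of the repository; this overwrites the existing file.
mkdir -p .guix
cat > .guix/channels.scm <<'EOF'
(list
 (channel
  (name 'guix)
  (url "https://git.savannah.gnu.org/git/guix.git")
  (commit "7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1"))
 (channel
  (name 'guix-hpc)
  (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
  (commit "356a3b4200a5c7a15cf85fc0e8680b74444c689d"))
 (channel
  (name 'guix-hpc-non-free)
  (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
  (commit "e9b113ddadc69fd21026777f5f893ba8ae3183aa")))
EOF
```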

3.4.2. Manifests

Some packages are dedicated exclusively to the execution of benchmarks, and others to the post-processing of results and the publication of the study manuscript. We will thus have more than one manifest file in this study. For now, these will be .guix/manifests/benchmarks-openblas.scm and .guix/manifests/post-processing.scm. The manifest for post-processing results has been pre-filled. Therefore, we will focus only on the most important one, the software environment for running benchmarks.

At first, we will need to compose the guix shell command allowing us to enter the correct environment. We begin by verifying whether a test_FEMBEM package is provided by one of the channels we specified earlier. For this, we can use the guix search command.

guix time-machine -C .guix/channels.scm -- search "test_FEMBEM"

In addition to this core package, we will need the following packages as well: openmpi, openssh, sed, which, grep, coreutils and bash. We can observe that the list of requested packages here is substantially shorter than the one in the README.md file in the Reference study repository. This is because we do not need to include the packages required to build test_FEMBEM: Guix will build the package in the appropriate environment automatically.

Once we have composed our guix shell command, we can verify whether it is working by running a quick test_FEMBEM test inside of the target environment:

guix time-machine -C .guix/channels.scm -- shell <list-of-packages> -- \
     test_FEMBEM --fembem -nbpts 1000 -solvehmat

By default, all the packages in the environment that depend on a BLAS² library use the OpenBLAS implementation. However, in this study, we also want to evaluate the vendor-specific Intel(R) MKL. Before going further, let us export the manifest specifying the experimental environment with OpenBLAS to the dedicated file .guix/manifests/benchmarks-openblas.scm:

guix time-machine -C .guix/channels.scm -- shell \
     --export-manifest <list-of-packages> \
     > .guix/manifests/benchmarks-openblas.scm

To replace a dependency (input in Guix terminology) in a package tree, it is possible to use the --with-input switch of guix shell. Modify your Guix command line like so. It may take some time to complete as some of the core packages need to be rebuilt with Intel(R) MKL instead of OpenBLAS.

guix time-machine -C .guix/channels.scm -- shell --container \
     --with-input=openblas=mkl <list-of-packages> -- \
     test_FEMBEM --fembem -nbpts 1000 -solvehmat

Once everything is working, we export the corresponding manifest into .guix/manifests/benchmarks-mkl.scm using:

guix time-machine -C .guix/channels.scm -- shell --with-input=openblas=mkl \
     --export-manifest <list-of-packages> > .guix/manifests/benchmarks-mkl.scm

We can now use the following commands to enter the benchmark execution environment:

  1. with the OpenBLAS implementation of BLAS,

    guix time-machine -C .guix/channels.scm -- shell --container \
         -m .guix/manifests/benchmarks-openblas.scm -- ...
    
  2. with the vendor-specific Intel(R) MKL implementation of BLAS.

    guix time-machine -C .guix/channels.scm -- shell --container \
         -m .guix/manifests/benchmarks-mkl.scm -- ...
    

Finally, to enter the post-processing and publishing environment, we can use:

guix time-machine -C .guix/channels.scm -- shell --container \
     -m .guix/manifests/post-processing.scm -- ...
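
Since the command lines above differ only in the manifest they use, they can be wrapped into a small shell function to avoid retyping them. The following is a minimal sketch; the function name study_env is hypothetical, and it has to be called from the root of the repository so that the relative .guix paths resolve.

```shell
# Hypothetical helper: run a command in one of the study's environments.
# Usage: study_env MANIFEST COMMAND [ARGS...]
# where MANIFEST is benchmarks-openblas, benchmarks-mkl or post-processing.
study_env() {
    manifest="$1"
    shift
    guix time-machine -C .guix/channels.scm -- shell --container \
         -m ".guix/manifests/${manifest}.scm" -- "$@"
}
```

For instance, study_env post-processing COMMAND runs COMMAND in the post-processing and publishing environment. On systems without support for --container, the switch would have to be replaced by --pure, as discussed in Section 3.3.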

3.4.3. Reproducing the study

We are now getting to the most important challenge of the day. We are about to reproduce the study using Guix. Explore the README.md in the Reference study repository and try to identify the commands for running experiments involving the run.sh script as well as the command for post-processing results involving the plot.R script. Then, execute them from within the root of the Study using Guix repository and in the right Guix environment, i.e. using .guix/channels.scm together with .guix/manifests/benchmarks-openblas.scm or .guix/manifests/benchmarks-mkl.scm for running benchmarks and .guix/manifests/post-processing.scm for post-processing the results. We have seen the associated guix command lines at the end of Section 3.4.2.

Note that after the execution of benchmarks we should obtain a file named results.csv in benchmarks/results-openblas as well as a second file named results.csv in benchmarks/results-mkl. After post-processing the results, we should obtain three vector-graphics figures in PDF format within the figures folder, i.e. chameleon.pdf, hmat-chameleon.pdf and hmat-chameleon-error.pdf.
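
To quickly verify that both steps produced what they should, the presence of the five files named above can be checked with a few lines of shell. This is only a sketch: the function name check_outputs is our own, and it tests for existence only, not for the validity of the contents.

```shell
# Sketch: report which of the expected output files are missing.
# Run from the root of the Study using Guix repository.
check_outputs() {
    missing=0
    for f in benchmarks/results-openblas/results.csv \
             benchmarks/results-mkl/results.csv \
             figures/chameleon.pdf \
             figures/hmat-chameleon.pdf \
             figures/hmat-chameleon-error.pdf
    do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything is there, 1 otherwise.
    return "$missing"
}
```

Calling check_outputs prints one line per missing file on the standard error output and returns a non-zero status if at least one file is absent.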

Finally, to publish the study manuscript featuring our results, we can use this command.

guix time-machine -C .guix/channels.scm -- shell --container \
    -m .guix/manifests/post-processing.scm -- \
    latexmk --shell-escape -f -pdf -bibtex -interaction=nonstopmode study

3.4.4. Reproduction guidelines

The aim of the README.md document is to provide all the information necessary to reproduce our study.

Complete the two empty shell code blocks in the dedicated section of the document with the instructions for running experiments and post-processing results used in the previous step.

We're officially done! Congratulations!

4. Pointers

In addition to the bibliography at the end of the document, the following pointers may be of interest for those who would like to learn further about Guix, literate programming or Org mode:

5. References

[1] J. L. Furlani, “Providing a flexible user environment,” in Proceedings of the Fifth Large Installation Systems Administration (LISA V), 1991, pp. 141–152 [Online]. Available: http://modules.sourceforge.net/docs/Modules-Paper.pdf
[2] T. Gamblin et al., “The Spack Package Manager: Bringing Order to HPC Software Chaos,” 2015, doi: 10.1145/2807591.2807623 [Online]. Available: https://github.com/spack/spack
[3] G. M. Kurtzer, V. Sochat, and M. W. Bauer, “Singularity: Scientific containers for mobility of compute,” PLOS ONE, vol. 12, no. 5, pp. 1–20, 2017, doi: 10.1371/journal.pone.0177459 [Online]. Available: https://doi.org/10.1371/journal.pone.0177459
[4] D. Merkel, “Docker: Lightweight Linux Containers for Consistent Development and Deployment,” Linux Journal, vol. 2014, no. 239, Mar. 2014.
[5] L. Courtès and R. Wurmus, “Reproducible and user-controlled software environments in HPC with Guix,” in Euro-Par 2015: Parallel Processing Workshops, 2015, pp. 579–591 [Online]. Available: https://hal.inria.fr/hal-01161771/en
[6] N. Vallet, D. Michonneau, and S. Tournier, “Toward practical transparent verifiable and long-term reproducible research using Guix,” Nature Scientific Data, vol. 9, 2022, doi: 10.1038/s41597-022-01720-9.
[7] “test_FEMBEM, a simple application for testing dense and sparse solvers with pseudo-FEM or pseudo-BEM matrices.” https://gitlab.inria.fr/solverstack/test_fembem.
[8] “GNU Guix Cookbook: Basic setup with manifests.” https://guix.gnu.org/cookbook/en/html_node/Basic-setup-with-manifests.html.
[9] R. Dybvig and J. Hébert, The Scheme Programming Language, Fourth Edition. 2009 [Online]. Available: https://www.scheme.com/tspl4/
[10] “GNU Guix Reference Manual: Channels.” https://guix.gnu.org/manual/en/html_node/Channels.html.
[11] “Packages - GNU Guix.” https://packages.guix.gnu.org/.

Footnotes:

1. Message Passing Interface - a message-passing library interface specification addressing primarily the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process.

2. Basic Linear Algebra Subprograms - routines that provide standard building blocks for performing basic vector and matrix operations.

Date: July 04, 2023 | 14:31:25

Author: Ludovic Courtès, Marek Felšöci

Email: ludovic.courtes@inria.fr, marek.felsoci@inria.fr
