Tutoriel Guix - Compas 2023
Tutorial
1. Foreword, goals and motivations
Reproducibility of a research study in computer science has always been a complex matter. One of the biggest challenges is to recreate the same software environment. The latter is often built manually or using modules [1], especially on high-performance computing (HPC) platforms. The main issue with this approach is that modules and build instructions are likely to vary from one machine to another depending on the system configuration and the available software. Some package managers such as Spack [2] allow users to define their own package variants and leave them the choice of the configuration, the dependencies or the compiler. However, as the package manager still depends on compilers and other components provided by the underlying system, the reproducibility of software environments remains threatened. From this point of view, container solutions such as Singularity [3] or Docker [4] are more robust, but they do not make it easy to update or manage multiple variants of an environment. For example, if we want to use a different version of one or more packages, we either have to modify the container interactively, which makes it even less reproducible, or rebuild it, which can take a lot of time if performed regularly. Moreover, because the container is often based on an existing Linux distribution, we are limited to the versions of core packages (compilers, MPI¹, BLAS², …) provided by that particular release of that particular distribution, unless we are willing to manually rebuild and reconfigure a good chunk of the software environment.
Our goal is to overcome these limitations and take full control over our software environments so that we can adapt and reproduce them easily. To this end, we propose to explore the use of Guix [5], [6]. In this tutorial, after a short tour of Guix, Nix and other possible solutions, we will start from an existing experimental research study whose software environment can be recreated either manually or by means of a pre-built Docker container. In the rest of the document, we refer to this version of the study as the Reference study.
The participants will learn the basics of Guix and be able to observe its advantages over the aforementioned approaches. Then, we will make use of Guix to improve the reproducibility of this research study.
At the end of the session, the participants should have built a standalone git repository containing the same research study but managing the software environment with Guix. The ultimate goal is to reproduce the study from A to Z by redoing all the benchmarks, post-processing the results and finally producing the associated article in PDF. In the rest of this document, we refer to this version of the study as the Study using Guix.
The rest of the document is organized as follows. In Section 2, we describe the workspace used during the hands-on session, whose instructions are given in Section 3. Finally, we provide some additional useful pointers in Section 4.
2. Workspace
For the needs of this session, a dedicated project group, namely Tutoriel Guix - Compas 2023, has been created on the GitLab of Inria with the following structure:
Tutoriel Guix - Compas 2023
│
├── test_FEMBEM
│   ├── Reference study
│   └── Study using Guix
├── Tutorial
├── Résumé
└── Slides
The test_FEMBEM subgroup contains two versions of the same research study based on the open-source edition of the test_FEMBEM solver [7]:
- Reference study, the software environment of which is not managed with Guix and
- Study using Guix, the software environment of which is managed with Guix.
The Tutorial, Résumé and Slides repositories contain the sources and the Guix environment specifications corresponding to the present document, the abstract and the introductory presentation of the tutorial, respectively. We do not report further on these repositories.
2.1. Reference study
This repository contains an experimental study featuring the test_FEMBEM solver with the following structure:
Reference study
│
├── benchmarks
│   ├── definitions.csv
│   └── run.sh
├── figures
│   └── short-pipe.png
├── public
│   └── .gitignore
├── .gitignore
├── .gitlab-ci.yml
├── Dockerfile
├── README.md
├── plot.R
├── references.bib
└── study.tex
In this case, we do not rely on Guix to manage the software environment of the research study. One can either rely on a combination of the native system package manager and manual builds to create a software environment close to the original one, or use the accompanying pre-built Docker container defined in Dockerfile, as detailed in README.md.
The README.md file further provides guidelines for redoing the experiments defined in definitions.csv using the dedicated run.sh shell script, post-processing the results by means of the plot.R R script, whose output is placed into the figures folder, and producing the PDF of the study manuscript from its LaTeX source study.tex and the associated bibliography file references.bib.
Note that the public folder is used by the continuous integration engine to publish the repository's static web page hosted on GitLab Pages.
2.2. Study using Guix
This repository contains the same research study as the Reference study repository. However, here we do rely on Guix to manage the software environment of the study. See the structure of the repository below:
Study using Guix
│
├── .guix
│   ├── channels.scm
│   └── manifests
│       ├── benchmarks-openblas.scm
│       ├── benchmarks-mkl.scm
│       └── post-processing.scm
├── benchmarks
│   ├── definitions.csv
│   └── run.sh
├── figures
│   └── short-pipe.png
├── public
│   └── .gitignore
├── .gitignore
├── .gitlab-ci.yml
├── README.md
├── plot.R
├── references.bib
└── study.tex
Compared to Reference study, this repository contains extra files in the .guix folder: channels.scm and the files in .guix/manifests represent the specification of the Guix software environment of the study (see Section 3.3). Apart from the Dockerfile, which this repository does not include, all the other files remain the same as in Reference study.
The second part of the hands-on session will be based on this repository. The master branch contains the complete configuration we should have built by the end of the session. The level0 branch is the starting point that participants complete during the session. For those joining us later or wanting to skip one or more phases, the other levelX branches correspond to different levels of completion of the tutorial.
3. Hands-on session
First, we will put the Study using Guix repository aside and familiarize ourselves with Guix.
3.1. Installing Guix
If you plan to use Guix on the PlaFRIM cluster, connect over secure shell (SSH) with:
ssh -Y NAME@plafrim -J NAME@formation.plafrim.fr
… where NAME is your login name, typically compas-LASTNAME.
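If you connect often, the same connection can be described once in the OpenSSH client configuration. The helper below is only a sketch: the function name plafrim_ssh_config and the host alias plafrim-compas are our own, hypothetical choices, while ProxyJump, ForwardX11 and ForwardX11Trusted are the ssh_config counterparts of the -J and -Y options used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an OpenSSH client configuration entry equivalent to
#   ssh -Y NAME@plafrim -J NAME@formation.plafrim.fr
# ProxyJump corresponds to -J; ForwardX11 plus ForwardX11Trusted correspond to -Y.
# The host alias "plafrim-compas" is an arbitrary choice.
plafrim_ssh_config() {
  local login="$1"   # your PlaFRIM login, e.g. compas-LASTNAME
  if [ -z "$login" ]; then
    echo "usage: plafrim_ssh_config LOGIN" >&2
    return 1
  fi
  cat <<EOF
Host plafrim-compas
    HostName plafrim
    User ${login}
    ProxyJump ${login}@formation.plafrim.fr
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
}
```

For example, plafrim_ssh_config compas-LASTNAME >> ~/.ssh/config followed by ssh plafrim-compas should behave like the command above. As with -J, the name plafrim is resolved from the jump host formation.plafrim.fr.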
Now you can skip this section and jump directly to Section 3.2.
Here, we assume that we are running a third-party Linux distribution such as Debian, Fedora or Manjaro. We can install the Guix package manager on top of that distribution without interfering with our primary package manager. To do so, we use the official installation shell script, which needs to be run with superuser privileges.
cd /tmp
wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
chmod +x guix-install.sh
sudo ./guix-install.sh
Then, we just need to follow the on-screen instructions.
3.2. Running Guix for the first time
If you’re using the PlaFRIM cluster, you only need to run two commands:
guix build hello
guix pull
… and you can skip the remainder of this section.
After the installation, we proceed with a short sequence of commands to ensure a smooth user experience with Guix from then on. First, we install our first package using Guix, glibc-locales, which provides the locale data that programs installed with Guix need in order to switch locales.
guix install glibc-locales
Then, to be able to acquire new versions of installed packages, we will need to pull the latest version of Guix first. The following command can take a while to execute, especially when run for the first time.
guix pull
Once the process finishes, we need to follow the hint the command gives us and add the following lines to our .bash_profile or .bashrc to always get access to the most recent Guix built by guix pull.
GUIX_PROFILE="$HOME/.config/guix/current"
. "$GUIX_PROFILE/etc/profile"
We also have to tell our shell to use this new Guix:
hash guix
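Regarding the two lines added to .bash_profile or .bashrc above: on a machine where guix pull has not completed yet, sourcing a profile that does not exist makes the shell print an error at every login. A slightly more defensive variant, our own suggestion rather than what guix pull prints, only sources the profile when it is present:

```shell
#!/usr/bin/env bash
# Defensive variant of the lines suggested by `guix pull` (our assumption, not
# the official hint): source the profile only once `guix pull` has created it.
GUIX_PROFILE="$HOME/.config/guix/current"
if [ -f "$GUIX_PROFILE/etc/profile" ]; then
  # The profile script reads GUIX_PROFILE to locate the current Guix.
  . "$GUIX_PROFILE/etc/profile"
fi
```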
Finally, we can update our installed packages.
guix upgrade
To get information on the revision (generation in Guix terminology) of Guix being used, we can use:
guix describe
3.3. Familiarization with Guix
Let us enter our first Guix environment containing two packages, bash and cowsay, using the guix shell command, and launch a shell inside of that environment. Here, we use the --container or -C switch to spawn the new environment within an isolated container. By default, the container has no access to the host file system (except for the current working directory), the host network or the host environment variables. See guix shell --help for more details.
guix shell --container bash cowsay -- bash
To test the cowsay package within the Guix shell, try typing cowsay "Hello world!", for example.
On some systems, using --container fails with an error along these lines:
$ guix shell --container coreutils
guix shell: error: clone: 2114060305: Invalid argument
This indicates that the system lacks support for Linux’s unprivileged user namespaces. Worry not: you can fall back to --pure, which is weaker, but still gives good control over the environment.
We can simply type exit to get back to our original shell. Also, we do not have to run an interactive shell inside of the environment; we can directly execute a given command, for example:
guix shell --container cowsay -- cowsay "Hello world!"
Note that we did not include bash this time, since we do not need it. The above command line should give us the following output:
 ______________
< Hello world! >
 --------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
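When the same commands must work both on machines with and without user namespace support, the choice between --container and --pure mentioned earlier can be automated. The sketch below is a heuristic of our own: it assumes that a successful unshare --user --map-root-user true (from util-linux) means guix shell --container will work as well. The function names isolation_flag and isolated_shell are hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic (our assumption): if an unprivileged user namespace can be created,
# `guix shell --container` should work; otherwise fall back to `--pure`.
isolation_flag() {
  if unshare --user --map-root-user true 2>/dev/null; then
    echo "--container"
  else
    echo "--pure"
  fi
}

# Hypothetical wrapper: isolated_shell PACKAGE... -- COMMAND [ARGS...]
isolated_shell() {
  guix shell "$(isolation_flag)" "$@"
}
```

For example, isolated_shell cowsay -- cowsay "Hello world!" uses a container where possible and a pure environment otherwise.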
3.3.1. Manifests
The guix shell command seems very convenient. However, let us imagine that we need not two but 26 packages in our environment. The command line would become quite long, right? The good news is that we can instead put our list of packages into a file, referred to as a manifest, and use the --manifest or -m option to pass the manifest to our guix shell command line.
Manifest files [8] use the syntax of the Scheme language [9], which can be intimidating at first. Fortunately, guix shell recently gained the --export-manifest option, which automatically generates the manifest file corresponding to the environment specified on the command line. Let us thus create the manifest corresponding to our latest single-package environment and save it to a Scheme file named cowsay.scm.
guix shell --export-manifest cowsay > cowsay.scm
Our manifest should look like this. Not so scary in the end, is it?
(specifications->manifest
(list "cowsay"))
Finally, we can enter the target environment using the manifest file and retry a cowsay command.
guix shell --pure -m cowsay.scm -- cowsay "Hello from the manifest!"
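For the 26-package scenario, guix shell --export-manifest remains the simplest option. If you prefer to generate the manifest from a script, for instance from a package list kept elsewhere, a minimal sketch producing the same specifications->manifest form could look like this. write_manifest is a hypothetical helper; it does not check that the packages exist, which guix shell -m will report anyway.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a manifest of the same form as the one produced
# by `guix shell --export-manifest`, from a list of package specifications.
# Usage: write_manifest FILE PACKAGE...
write_manifest() {
  local out="$1"
  shift
  {
    printf '(specifications->manifest\n  (list'
    for spec in "$@"; do
      # Each specification becomes a Scheme string, e.g. "cowsay".
      printf ' "%s"' "$spec"
    done
    printf '))\n'
  } > "$out"
}
```

For example, write_manifest tools.scm bash coreutils cowsay followed by guix shell -m tools.scm -- cowsay "Hi!" enters an environment with those three packages.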
3.3.2. Channels
Note that, by default, guix shell considers the latest versions of the specified packages available in our current revision of Guix. Maybe we are fine with that at this point, but what happens if we want to enter the exact same environment a couple of weeks, months or years later? The packages may not even be available anymore or, at least, not in the same version.
Software packages in Guix are provided through dedicated git repositories called channels [10]. The official Guix channel guix, automatically set up in our Guix installation, currently provides 21,921 packages (status on June 30, 2023 at 3:35PM) [11]. However, many other channels are available, e.g. for scientific and HPC software. We will discuss the usage of multiple channels later. For the moment, let us focus on the default guix channel.
To make sure we use the same revision of Guix, providing the same packages in the same versions, we can accompany our manifest with a channel file, also written in Scheme. In the latter, we can specify the channel or channels to use together with the desired revision, i.e. commit. We can obtain the commit of the guix channel (and of other channels, if any) currently in use by typing guix describe. The output should look as follows, modulo the language, date and time ;-).
Generation 4  Jun 22 2023 11:50:18  (current)
  guix 7f3c6d3
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1
Using the above information, we can create the corresponding channel file my-channels.scm.
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1")))
Note that we can directly obtain the list of channels currently used by the system in Scheme by typing guix describe -f channels. It gives us a handy starting point for building our own channel file, e.g. by changing commit hashes or branches, or by adding or removing channels to or from the list.
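When only the guix channel is involved, the channel file can also be generated from a commit hash obtained elsewhere, for example from a colleague's guix describe output. The helper below is a hypothetical sketch using the same syntax as my-channels.scm; it deliberately accepts only full 40-character commit hashes, as printed by guix describe, to avoid any ambiguity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a channel file pinning the official `guix`
# channel to a given commit, with the same syntax as my-channels.scm.
# Usage: pin_guix_channel COMMIT FILE
pin_guix_channel() {
  local commit="$1" out="$2"
  # A full Git commit hash is 40 lowercase hexadecimal characters.
  if ! [[ "$commit" =~ ^[0-9a-f]{40}$ ]]; then
    echo "pin_guix_channel: expected a full 40-character commit hash" >&2
    return 1
  fi
  cat > "$out" <<EOF
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "${commit}")))
EOF
}
```

For example, pin_guix_channel 7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1 my-channels.scm recreates the channel file shown earlier.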
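Finally, to execute the guix shell command using our channel file, we can use the guix time-machine command with the --channels or -C switch. Beware that -C stands for --channels when passed to guix time-machine but for --container when passed to guix shell, as in the command below.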
guix time-machine -C my-channels.scm -- shell --container \
  -m cowsay.scm -- cowsay "Great, a channel and a manifest file!"
3.4. Building a reproducible study
We are now ready for the core part of the hands-on session. We are going to improve the reproducibility of the research study from the Reference study thanks to Guix.
Before we begin, we need to clone the Study using Guix repository we are going to work with. It already contains all of the files required for it to work, but some of them need to be completed. Note that, if you want to keep a copy of the repository with everything you do during the session, feel free to fork it first!
git clone https://gitlab.inria.fr/tutoriel-guix-compas-2023/test_fembem/study-using-guix.git
Navigate to the root of the freshly cloned repository
cd study-using-guix
and, depending on where in the hands-on session you want to join, check out the right branch to start from:
- git checkout level0: the channel definition (the very beginning),
- git checkout level1: the manifest definition,
- git checkout level2: the reproduction of the study in a Guix environment.
3.4.1. Channels
The study relies on the test_FEMBEM solver suite, which in turn depends on packages that are not available through the official guix channel. We will thus need two extra channels.
Fill the Scheme file .guix/channels.scm with the following list of channels while respecting the syntax seen in Section 3.3.2.
- guix, the official Guix channel
  - link: https://git.savannah.gnu.org/git/guix.git
  - commit: 7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1
- guix-hpc, the channel of the Guix-HPC effort providing some commonly used HPC applications, e.g. solvers, runtimes, …
  - link: https://gitlab.inria.fr/guix-hpc/guix-hpc.git
  - commit: 356a3b4200a5c7a15cf85fc0e8680b74444c689d
- guix-hpc-non-free, a companion channel to guix-hpc providing non-free libraries (MKL, CUDA, …) and non-free versions of some of the packages provided by guix-hpc (PaStiX with MKL, …)
  - link: https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git
  - commit: e9b113ddadc69fd21026777f5f893ba8ae3183aa
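To check your answer, one possible content of .guix/channels.scm matching the list above is sketched below as a shell here-document, to be run from the root of the repository (it overwrites the file). The symbols 'guix-hpc and 'guix-hpc-non-free are assumed to match the names these channels declare; the master branch holds the authoritative version.

```shell
#!/usr/bin/env bash
# Sketch: write the three pinned channels listed above into .guix/channels.scm.
mkdir -p .guix
cat > .guix/channels.scm <<'EOF'
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "7f3c6d3b3ba86a8051e394e4ec9a6f6089753cb1"))
      (channel
        (name 'guix-hpc)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
        (commit "356a3b4200a5c7a15cf85fc0e8680b74444c689d"))
      (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
        (commit "e9b113ddadc69fd21026777f5f893ba8ae3183aa")))
EOF
```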
3.4.2. Manifests
Some packages are dedicated exclusively to the execution of benchmarks and some others to the post-processing of results and the publication of the study manuscript. We will thus have more than one manifest file in this study. For now, these are .guix/manifests/benchmarks-openblas.scm and .guix/manifests/post-processing.scm. The manifest for post-processing results has been pre-filled. Therefore, we will focus only on the most important one: the software environment for running benchmarks.
First, we need to compose the guix shell command allowing us to enter the correct environment. We begin by verifying whether a test_FEMBEM package is provided by one of the channels we specified earlier. For this, we can use the guix search command.
guix time-machine -C .guix/channels.scm -- search "test_FEMBEM"
In addition to this core package, we will need the following packages as well: openmpi, openssh, sed, which, grep, coreutils and bash. We can observe that the list of requested packages here is substantially shorter than the one in the README.md file in the Reference study repository. This is because we do not need to include the packages required to build test_FEMBEM: Guix automatically builds the package in the appropriate environment.
Once we have composed our guix shell command, we can verify whether it works by running a quick test_FEMBEM test inside of the target environment:
guix time-machine -C .guix/channels.scm -- shell <list-of-packages> -- \
test_FEMBEM --fembem -nbpts 1000 -solvehmat
By default, all the packages in the environment that depend on a BLAS² library use the OpenBLAS implementation. However, in this study, we also want to evaluate the vendor-specific Intel(R) MKL. Before going further, let us export the manifest specifying the experimental environment with OpenBLAS to the dedicated file .guix/manifests/benchmarks-openblas.scm:
guix time-machine -C .guix/channels.scm -- shell \
  --export-manifest <list-of-packages> \
  > .guix/manifests/benchmarks-openblas.scm
To replace a dependency (input in Guix terminology) in a package tree, we can use the --with-input switch of guix shell. Modify your Guix command line as follows. It may take some time to complete, as some of the core packages need to be rebuilt with Intel(R) MKL instead of OpenBLAS.
guix time-machine -C .guix/channels.scm -- shell --container \
  --with-input=openblas=mkl <list-of-packages> -- \
  test_FEMBEM --fembem -nbpts 1000 -solvehmat
Once everything is working, we export the corresponding manifest into .guix/manifests/benchmarks-mkl.scm using:
guix time-machine -C .guix/channels.scm -- shell --with-input=openblas=mkl \
  --export-manifest <list-of-packages> > .guix/manifests/benchmarks-mkl.scm
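Since both benchmark manifests must be exported from the same package list, the two export commands can be wrapped together so that the OpenBLAS and MKL environments cannot drift apart. In this hypothetical sketch, export_benchmark_manifests is not part of the repository and takes the package list you composed for <list-of-packages> as its arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export both benchmark manifests from one package list.
# Usage (from the repository root): export_benchmark_manifests PACKAGE...
export_benchmark_manifests() {
  if [ "$#" -eq 0 ]; then
    echo "export_benchmark_manifests: no packages given" >&2
    return 1
  fi
  mkdir -p .guix/manifests
  # OpenBLAS variant: the default BLAS implementation.
  guix time-machine -C .guix/channels.scm -- shell \
    --export-manifest "$@" > .guix/manifests/benchmarks-openblas.scm || return 1
  # MKL variant: same packages, with openblas replaced by mkl.
  guix time-machine -C .guix/channels.scm -- shell --with-input=openblas=mkl \
    --export-manifest "$@" > .guix/manifests/benchmarks-mkl.scm
}
```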
We can now use the following commands to enter the benchmark execution environment:
with the OpenBLAS implementation of BLAS,
guix time-machine -C .guix/channels.scm -- shell --container \
  -m .guix/manifests/benchmarks-openblas.scm -- ...
with the vendor-specific Intel(R) MKL implementation of BLAS.
guix time-machine -C .guix/channels.scm -- shell --container \
  -m .guix/manifests/benchmarks-mkl.scm -- ...
Finally, to enter the post-processing and publishing environment, we can use:
guix time-machine -C .guix/channels.scm -- shell --container \
  -m .guix/manifests/post-processing.scm -- ...
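To avoid retyping these long command lines, they can be wrapped in a small shell function. study_env below is a hypothetical helper, not part of the repository. It assumes it is called from the root of the Study using Guix repository, and it passes everything after the environment name, including the -- separator, on to guix shell.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three command lines above.
# Usage: study_env openblas|mkl|post-processing -- COMMAND [ARGS...]
study_env() {
  local manifest
  case "$1" in
    openblas)        manifest=.guix/manifests/benchmarks-openblas.scm ;;
    mkl)             manifest=.guix/manifests/benchmarks-mkl.scm ;;
    post-processing) manifest=.guix/manifests/post-processing.scm ;;
    *) echo "study_env: unknown environment '$1'" >&2; return 1 ;;
  esac
  shift
  guix time-machine -C .guix/channels.scm -- shell --container \
    -m "$manifest" "$@"
}
```

For instance, study_env mkl -- test_FEMBEM --fembem -nbpts 1000 -solvehmat runs the quick test from this section in the MKL environment.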
3.4.3. Reproducing the study
We are now getting to the most important challenge of the day: reproducing the study using Guix. Explore the README.md in the Reference study repository and try to identify the commands for running experiments involving the run.sh script as well as the command for post-processing results involving the plot.R script. Then, execute them from within the root of the Study using Guix repository and in the right Guix environment, i.e. using .guix/channels.scm together with .guix/manifests/benchmarks-openblas.scm or .guix/manifests/benchmarks-mkl.scm for running benchmarks and .guix/manifests/post-processing.scm for post-processing the results. We have seen the associated guix command lines at the end of Section 3.4.2.
Note that after the execution of benchmarks we should obtain a file named results.csv in benchmarks/results-openblas as well as a second file named results.csv in benchmarks/results-mkl. After post-processing the results, we should obtain three vector graphics figures in PDF format within the figures folder, i.e. chameleon.pdf, hmat-chameleon.pdf and hmat-chameleon-error.pdf.
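A quick way to confirm that the benchmarks and the post-processing produced every file listed above is a small check script. check_study_outputs is a hypothetical helper; it only tests that each expected file exists and is non-empty, and it does not validate the contents.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check, run from the root of the Study using Guix
# repository: report expected result files that are missing or empty.
check_study_outputs() {
  local missing=0 f
  for f in benchmarks/results-openblas/results.csv \
           benchmarks/results-mkl/results.csv \
           figures/chameleon.pdf \
           figures/hmat-chameleon.pdf \
           figures/hmat-chameleon-error.pdf; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every expected file is present.
  [ "$missing" -eq 0 ]
}
```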
Finally, to publish the study manuscript featuring our results, we can use this command.
guix time-machine -C .guix/channels.scm -- shell --container \
  -m .guix/manifests/post-processing.scm -- \
  latexmk --shell-escape -f -pdf -bibtex -interaction=nonstopmode study
3.4.4. Reproducing guidelines
The aim of the README.md document is to provide all the information necessary to reproduce our study.
Complete the two empty shell code blocks in the dedicated section of the document with the instructions for running experiments used in Section 3.4.2.
We're officially done! Congratulations!
4. Pointers
In addition to the bibliography at the end of the document, the following pointers may be of interest for those who would like to learn further about Guix, literate programming or Org mode:
- https://tuto-techno-guix-hpc.gitlabpages.inria.fr/guidelines/ (our first tutorial on Guix covering also literate programming in Org mode and repository archival on Software Heritage, self-contained)
- https://hpc.guix.info/ (Guix-HPC, reproducible software deployment for high-performance computing: channels, packages, events, …)
- https://cours-mf.gitlabpages.inria.fr/is328/tuto-chameleon.html (tutorial on how to use Guix or Singularity images produced by Guix on HPC platforms such as PlaFRIM)
- https://felsoci.sk/blog/posts.html (blog of Marek Felšöci with posts on Guix and Org mode usage - for work and for home)
5. References
Footnotes:
¹ Message Passing Interface - a message-passing library interface specification addressing primarily the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process.
² Basic Linear Algebra Subprograms - routines that provide standard building blocks for performing basic vector and matrix operations.