Ph.D. thesis of Marek Felšöci: Experimental environment
Table of Contents
- 1. Literate programming
- 2. Building reproducible software environments
- 3. Performing benchmarks
- 3.1. gcvb
- 3.2. sbatch templates
- 3.3. Ensuring filesystem
- 3.4. Configuration file
- 3.5. Definition file
- 3.6. Resource monitoring
- 3.7. Energy consumption monitoring
- 3.8. Result parsing
- 3.9. Database injecting
- 3.10. Extracting additional results
- 3.11. Wrapper scripts
- 3.12. Generate benchmark runs
- 3.13. Job submission
- 4. Post-processing results
- 4.1. Benchmarks
- 4.1.1. Prerequisites
- 4.1.2. Importing data
- 4.1.3. Formatting data
- 4.1.4. Style presets
- 4.1.5. Plot functions
- 4.1.5.1. times_by_nbpts
- 4.1.5.2. multisolve_times_by_nbrhs_and_nbpts
- 4.1.5.3. multisolve_rss_by_nbrhs_and_nbpts
- 4.1.5.4. multisolve_memory_aware
- 4.1.5.5. multifacto_memory_aware
- 4.1.5.6. multifacto_times_by_nbpts_and_schur_size
- 4.1.5.7. multifacto_rss_by_nbpts_and_schur_size
- 4.1.5.8. compare_coupled
- 4.1.5.9. compare_rss_coupled
- 4.1.5.10. accuracy_by_nbpts
- 4.1.5.11. rss_peaks_by_nbpts
- 4.1.5.12. rss_by_time
- 4.1.5.13. eprofile
- 4.1.5.14. es_comparison
- 4.1.5.15. scalability
- 4.1.5.16. scalability_ram
- 4.1.5.17. efficiency
- 4.2. StarPU execution traces
- 5. Appendix
Reproducibility of numerical experiments has always been a complex matter. Building exactly the same software environment on multiple computing platforms can be long, tedious and sometimes virtually impossible to do manually.
Within our research work, we put strong emphasis on ensuring the reproducibility of the experimental environment. On one hand, we make use of the GNU Guix transactional package manager allowing for a complete reproducibility of software environments across different machines. On the other hand, we rely on the principles of literate programming in an effort to maintain an exhaustive, clear and accessible documentation of our experiments and the associated environment.
1. Literate programming
We write the source code of the scripts and various configuration files allowing us to design and automate numerical experiments following the paradigm known as literate programming Knuth84. The idea of this approach is to associate source code with an explanation of its purpose written in a natural language.
There are numerous software tools designed for literate programming. We rely on
Org mode for Emacs Dominik18,emacs which defines the Org markup language for
combining formatted text, images and figures with traditional source code.
Files containing documents written in Org mode end with the .org extension.
Extracting a compilable or interpretable source code from an Org document is called tangling OrgTangle. It is also possible to evaluate a particular source code block directly from the Emacs editor OrgEval while editing. For example, this is particularly useful for the visualization of experimental results.
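As a hypothetical illustration (the file names hello.sh and document.org are ours, not part of the thesis sources), a source block in an Org document carries a :tangle header argument naming the file it is extracted to:

```org
#+BEGIN_SRC shell :tangle hello.sh
echo "This line ends up in hello.sh after tangling."
#+END_SRC
```

In Emacs, C-c C-v t (org-babel-tangle) extracts all such blocks from the current document; the same can be done non-interactively with emacs --batch -l org --eval '(org-babel-tangle-file "document.org")'.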
Eventually, an Org document can be exported to various output formats OrgExport such as LaTeX or Beamer, HTML and so on.
The .org
source this document is based on 1 features the source
code blocks of multiple scripts and configuration files that need to be tangled
into separate source files before constructing our software environment and
realizing the experiments.
2. Building reproducible software environments
We are likely to work on various computing platforms and need to manage multiple software packages and their dependencies across different machines. To keep our software environment reproducible, we rely on the GNU Guix guix package manager.
2.1. GNU Guix
Guix is a transactional package manager and a stand-alone GNU Linux distribution where each user can install their own packages without any impact on the others, with the possibility to switch between multiple system or package generations. An environment created using Guix is fully reproducible across different computing platforms. In other words, a package built on a computer from a given Guix commit remains the same when rebuilt on another machine.
2.2. Channels
Software packages in Guix are available through dedicated Git repositories, called channels, containing package definitions.
The first and default channel is the system channel providing Guix itself as
well as the definitions of some commonly used packages such as system libraries,
compilers, text editors and so on. Afterwards, we need to include additional
channels using a custom channel file channels.scm.
For each channel, we specify the commit of the associated repository to acquire. This way, we make sure to always build the environment using the exact same versions of every single package in the system and guarantee a better reproducibility.
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "9235dd136e70dfa97684aff4e9af4c0ce366ad68"))
Following the system channel, we include guix-hpc
and guix-hpc-non-free
providing various open-source and non-free scientific software and libraries.
      (channel
        (name 'guix-hpc)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
        (commit "7e7fdfb641f5009dcf21cd8f8edc0d41d1faa7f5"))
      (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
        (commit "1305692c778810747c903cfa7e1d729d74ff2fb4"))
We also need the private channel guix-hpc-airbus
defining the proprietary Airbus
packages providing the benchmark software we use. As the channel is private, one
must have access to the corresponding repository and a duly configured SSH key
in $HOME/.ssh
on the target machine.
      (channel
        (name 'guix-hpc-airbus)
        (url "git@gitlab.inria.fr:mfelsoci/guix-hpc-airbus.git")
        (commit "014bccdfebc4aa4ec4cc3f523301e55759e7a9ac"))
Finally, we import the guix-extra
channel providing some utility packages
we use mainly during the post-processing of benchmark results.
      (channel
        (name 'guix-extra)
        (url "https://gitlab.inria.fr/mfelsoci/guix-extra.git")
        (commit "35cdd19f80516ad7b89a68be5184910719324aca")))
The channel file must be saved to $HOME/.config/guix/channels.scm. Then, to make
the system aware of the changes, it is necessary to construct a new system
generation using guix pull.
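A typical sequence then looks as follows (a usage sketch based on standard Guix commands, not tangled from this document):

```shell
# Install the pinned channel list and build a new Guix generation from it.
cp channels.scm $HOME/.config/guix/channels.scm
guix pull
# Check which channel commits the new generation was built from.
guix describe
```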
See the complete source file 2.
2.3. Environments
To enter a particular software environment using Guix, we use the command
guix shell. Its options allow us to specify the packages to include together
with the desired version, commit or branch.
To avoid typing a lot of command line options every time we want to enter the
environment, we use the setenv.sh
shell script to automate the
process. It also allows us to adjust the environment using specific options.
We rely on the benchmark suite test_FEMBEM
from Airbus provided in one of
their proprietary software packages, namely scab
. test_FEMBEM
allows us to
evaluate numerous solver configurations. There is also an open-source version of
test_FEMBEM
connected exclusively to open-source solvers. It is available in
Guix as a separate package.
scab
as well as its direct dependencies, mpf
and hmat
, are proprietary
packages with closed source code residing in private Git repositories. Before
setting up the environment, one has to acquire their local copies so that Guix
can build the environment.
The setenv.sh
script begins with a help message function accessible through
the -h
option.
function help() {
  echo "Set up the experimental environment for the Ph.D. thesis of Marek" \
    "Felšöci." >&2
  echo "Usage: $0 [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -A             Instead of the mainstream version of 'hmat', use" \
    "its derived version containing the developments made by Aurélien" \
    "Falco during his Ph.D. thesis." >&2
  echo "  -e ENVIRONMENT Switch the software environment to ENVIRONMENT." \
    "Available environments are: 'benchmark' (includes packages for" \
    "performing benchmarks, default choice), 'gcvb' (includes packages" \
    "only for generating and launching benchmark job scripts, does not" \
    "allow for running benchmark executables), 'ppp' (includes packages" \
    "for pre-processing and post-processing actions)." >&2
  echo "  -E             Instead of the mainstream versions of 'mpf' and" \
    "'scab', use their derived versions supporting the energy_scope" \
    "tool." >&2
  echo "  -g             Get the final 'guix shell' command line" \
    "that can be followed by an arbitrary command to be executed inside" \
    "the environment. The trailing '--' should be added manually!" >&2
  echo "  -h             Show this help message." >&2
  echo "  -M             Instead of the mainstream versions of 'hmat'," \
    "'mpf' and 'scab', use their derived versions containing our work," \
    "i.e. the 'mf/devel' branches of the associated Git repositories." >&2
  echo "  -o             Use exclusively open-source software" \
    "(open-source 'test_FEMBEM' with the sequential version of 'hmat')." >&2
  echo "  -O             Use OpenBLAS instead of Intel(R) MKL." >&2
  echo "  -p             Prepare the source tarballs of the Airbus" \
    "packages. If combined with '-r', the tarballs shall be placed into" \
    "the directory specified by the latter. Also, the option can be used" \
    "together with the '-A' or the '-M' option." >&2
  echo "  -r ROOT        Search for Airbus sources at ROOT in lieu of the" \
    "default location at '$HOME/src'." >&2
  echo "  -s PATH        Instead of entering the specified environment," \
    "create the corresponding Singularity image at PATH." >&2
  echo "  -S             If the '-s' option is set, build the Singularity" \
    "image using the same version of Open MPI as the one available on the" \
    "Curta super-computer, i.e. replace the 'openmpi' and the" \
    "'openmpi-mpi1-compat' packages by 'openmpi-curta'." >&2
  echo "  -x             Use StarPU with the FXT support." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
The variable WITH_INPUT_STARPU_FXT
contains the --with-input
option for the
guix shell
command necessary to replace the ordinary StarPU
Starpu2010 package by the one compiled with FXT trace generation support
whenever the -x
option is passed on the command line.
WITH_INPUT_STARPU_FXT=""
Similarly, the WITH_INPUT_MKL
variable contains the --with-input
options
necessary to replace OpenBLAS OpenBLAS by the Intel(R) Math Kernel
Library MKL in the concerned packages.
WITH_INPUT_MKL="--with-input=pastix-5=pastix-5-mkl \
--with-input=mumps-scotch-openmpi=mumps-mkl-scotch-openmpi \
--with-input=openblas=mkl"
OPEN_SOURCE
handles the -o
option allowing us to use exclusively open-source
solvers and the open-source version of the test_FEMBEM
suite. By default, the
option is disabled.
OPEN_SOURCE=0
AIRBUS_ROOT
specifies the location where to search for the source tarballs of
the Airbus packages. This can be modified using the -r
option. The default
value is $HOME/src
.
AIRBUS_ROOT="$HOME/src"
By default, the script assumes the tarballs already exist and tries to set up
the environment directly. PREPARE_TARBALLS
is a boolean switch indicating
whether the source tarballs of the Airbus packages should be generated instead
of setting up the environment. For machines without access to the concerned
private Airbus repositories, the tarballs can be generated on another machine
using the -p
option.
PREPARE_TARBALLS=0
Normally, we want to set up the environment using the mainstream release of
hmat
. The -A
option and the associated boolean AIRBUS_AF
allow switching
to the version containing the developments made by Aurélien Falco Falco2019
that have not been merged into the mainstream version of the package yet.
AIRBUS_AF=0
The -M
option and the associated boolean AIRBUS_MF
allow switching to our
development versions of the Airbus packages.
AIRBUS_MF=0
The -E
option and the associated boolean AIRBUS_ES
allow switching to the
versions of mpf
and scab
supporting the energy_scope
tool (see
Section 3.7).
AIRBUS_ES=0
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
GET_COMMAND=0
CURTA=0
ENVIRONMENT="benchmark"

while getopts ":Ae:EgMoOhpr:s:Sx" option; do
  case $option in
    A)
      AIRBUS_AF=1 ;;
The -e
option allows to choose among multiple software environments available.
    e)
      ENVIRONMENT=$OPTARG ;;
    E)
      AIRBUS_ES=1 ;;
The -g
option allows to print out the final guix shell
command instead of
directly entering the environment. This is useful for writing one-line commands,
for example, in the GitLab continuous integration configuration.
    g)
      GET_COMMAND=1 ;;
    M)
      AIRBUS_MF=1 ;;
    o)
      OPEN_SOURCE=1 ;;
    O)
      WITH_INPUT_MKL="" ;;
    p)
      PREPARE_TARBALLS=1 ;;
    r)
      AIRBUS_ROOT=$OPTARG
      if test ! -d "$AIRBUS_ROOT"; then
        error "'$AIRBUS_ROOT' is not a valid directory!"
        exit 1
      fi ;;
The -s
option allows to create a Singularity container corresponding to the
selected environment instead of directly entering it. If the container is
intended to be used on the Curta supercomputer curta, the -S
option
allows to ensure the same version of Open MPI during the build as the one
available on Curta.
    s)
      SINGULARITY=$OPTARG
      SINGULARITY_DIR=$(dirname "$SINGULARITY")
      if test ! -d $SINGULARITY_DIR || test ! -w $SINGULARITY_DIR; then
        error "'$SINGULARITY_DIR' is not a valid and writable directory!"
        exit 1
      fi ;;
    S)
      CURTA=1 ;;
    x)
      WITH_INPUT_STARPU_FXT="--with-input=starpu=starpu-fxt" ;;
We must also take into account unknown options, missing option arguments, syntax
mismatches as well as the case when the -h
option is specified.
    \?)
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1 ;;
    :)
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1 ;;
    h | *)
      help
      exit 0 ;;
  esac
done
The following variables indicate the commit numbers, branch names and archive locations to use by default for the generation of the Airbus source tarballs.
HMAT_BASENAME="hmat-2021.1.0-21.52c4f6c"
HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz"
HMAT_COMMIT="52c4f6c9dfa084ad67bb9876c30a049edbd82b07"
HMAT_BRANCH="master"

MPF_BASENAME="mpf-2021.1.0-26.a158c8c"
MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
MPF_COMMIT="a158c8c65c223b732479ff3f321c9b0cb5dd3549"
MPF_BRANCH="master"

SCAB_BASENAME="scab-2021.1.0-30.6f669fe"
SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
SCAB_COMMIT="6f669fe98773013265240cdca4ce274f4dd85237"
SCAB_BRANCH="master"

ACTIPOLE_BASENAME="actipole-2020.0.1-105.c84cd1a"
ACTIPOLE_TARBALL="$AIRBUS_ROOT/$ACTIPOLE_BASENAME.tar.gz"
ACTIPOLE_COMMIT="c84cd1a84224ea6eb34c1a4f4ed59c818d66c3fe"
ACTIPOLE_BRANCH="master"
However, when the -A
, the -M
or the -E
option is used, we need to update
these specifications accordingly.
if test $AIRBUS_AF -ne 0; then
  HMAT_BASENAME="hmat-af-1.6.0-382.60743b1"
  HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz"
  HMAT_COMMIT="60743b119afb966b6987c4421b949482dfbbf04f"
  HMAT_BRANCH="mf/af/bcsf"
  MPF_BASENAME="mpf-af-1.25.0-134.6fbcc2e"
  MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
  MPF_COMMIT="6fbcc2e67713205b1c3bbebdda1f6377d9698070"
  MPF_BRANCH="af/devel"
  SCAB_BASENAME="scab-af-1.9.0-175.2971244"
  SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
  SCAB_COMMIT="29712447d805d957d715a1f5ab7d8e6903997652"
  SCAB_BRANCH="af/ND"
elif test $AIRBUS_MF -ne 0; then
  HMAT_BASENAME="hmat-mf-2021.1.0-23.db095d4"
  HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz"
  HMAT_COMMIT="db095d49d6282582701e84ab64a9a03b65780271"
  HMAT_BRANCH="mf/devel"
  MPF_BASENAME="mpf-mf-2021.1.0-27.54cd92b"
  MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
  MPF_COMMIT="54cd92b1976f634053771a98f4af8afe0bc71c90"
  MPF_BRANCH="mf/devel"
  SCAB_BASENAME="scab-mf-2021.1.0-30.6f669fe"
  SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
  SCAB_COMMIT="6f669fe98773013265240cdca4ce274f4dd85237"
  SCAB_BRANCH="mf/devel"
elif test $AIRBUS_ES -ne 0; then
  MPF_BASENAME="mpf-energy_scope-2021.1.0-27.9aba318"
  MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
  MPF_COMMIT="9aba3184bb806dc0a05e2a933bfe858472bb207a"
  MPF_BRANCH="mf/energy_scope"
  SCAB_BASENAME="scab-energy_scope-2021.1.0-32.3056f3c"
  SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
  SCAB_COMMIT="3056f3c7a986cc2fabb6663707272ccf16fbb614"
  SCAB_BRANCH="mf/energy_scope"
  ACTIPOLE_BASENAME="actipole-energy_scope-2020.0.1-105.c84cd1a"
  ACTIPOLE_TARBALL="$AIRBUS_ROOT/$ACTIPOLE_BASENAME.tar.gz"
  ACTIPOLE_COMMIT="c84cd1a84224ea6eb34c1a4f4ed59c818d66c3fe"
  ACTIPOLE_BRANCH="master"
fi
If the -p
option is specified, we clone the Airbus repositories and
create the source tarballs using the specified commit and branch
instead of trying to set up the environment.
if test $PREPARE_TARBALLS -ne 0; then
We begin by removing any previous clones of the Airbus repositories in
AIRBUS_ROOT
.
rm -rf $AIRBUS_ROOT/$HMAT_BASENAME $AIRBUS_ROOT/$MPF_BASENAME \
  $AIRBUS_ROOT/$SCAB_BASENAME $AIRBUS_ROOT/$ACTIPOLE_BASENAME $HMAT_TARBALL \
  $MPF_TARBALL $SCAB_TARBALL $ACTIPOLE_TARBALL
Then, we make fresh clones and check out the required revisions
git clone --recurse-submodules --single-branch --branch $HMAT_BRANCH \
  https://private-server.com/airbus/hmat $AIRBUS_ROOT/$HMAT_BASENAME
cd $AIRBUS_ROOT/$HMAT_BASENAME
git checkout $HMAT_COMMIT
git submodule update

git clone --single-branch --branch $MPF_BRANCH \
  https://private-server.com/airbus/mpf $AIRBUS_ROOT/$MPF_BASENAME
cd $AIRBUS_ROOT/$MPF_BASENAME
git checkout $MPF_COMMIT

git clone --single-branch --branch $SCAB_BRANCH \
  https://private-server.com/airbus/scab $AIRBUS_ROOT/$SCAB_BASENAME
cd $AIRBUS_ROOT/$SCAB_BASENAME
git checkout $SCAB_COMMIT

git clone --single-branch --branch $ACTIPOLE_BRANCH \
  https://private-server.com/airbus/actipole $AIRBUS_ROOT/$ACTIPOLE_BASENAME
cd $AIRBUS_ROOT/$ACTIPOLE_BASENAME
git checkout $ACTIPOLE_COMMIT
and verify that the cloned repositories are valid directories.
if test ! -d $AIRBUS_ROOT/$HMAT_BASENAME || \
   test ! -d $AIRBUS_ROOT/$MPF_BASENAME || \
   test ! -d $AIRBUS_ROOT/$SCAB_BASENAME || \
   test ! -d $AIRBUS_ROOT/$ACTIPOLE_BASENAME; then
  error "Failed to clone the Airbus repositories!"
  exit 1
fi
We remove the .git
folders from inside the clones to shrink the size of the
final tarball created using the tar
utility.
rm -rf $AIRBUS_ROOT/$HMAT_BASENAME/.git \
  $AIRBUS_ROOT/$HMAT_BASENAME/hmat-oss/.git $AIRBUS_ROOT/$MPF_BASENAME/.git \
  $AIRBUS_ROOT/$SCAB_BASENAME/.git $AIRBUS_ROOT/$ACTIPOLE_BASENAME/.git

tar -czf $HMAT_TARBALL -C $AIRBUS_ROOT $HMAT_BASENAME
tar -czf $MPF_TARBALL -C $AIRBUS_ROOT $MPF_BASENAME
tar -czf $SCAB_TARBALL -C $AIRBUS_ROOT $SCAB_BASENAME
tar -czf $ACTIPOLE_TARBALL -C $AIRBUS_ROOT $ACTIPOLE_BASENAME
At the end of the procedure, we check if the tarballs were created and remove the clones.
if test ! -f $HMAT_TARBALL || test ! -f $MPF_TARBALL || \
   test ! -f $SCAB_TARBALL || test ! -f $ACTIPOLE_TARBALL; then
  error "Failed to create tarballs!"
  exit 1
fi

rm -rf $AIRBUS_ROOT/$HMAT_BASENAME $AIRBUS_ROOT/$MPF_BASENAME \
  $AIRBUS_ROOT/$SCAB_BASENAME $AIRBUS_ROOT/$ACTIPOLE_BASENAME

exit 0
fi
Eventually comes the guix shell
command itself. We use a variable to hold the
name of the package containing the solver test suite test_FEMBEM
. Indeed, by
default we use its mainstream proprietary version from the scab
package. When
the -A
option is used, we rely on scab-af
. When the -M
option is used, we
rely on scab-mf
. When the -E
option is used, we rely on scab-energy_scope
and actipole-energy_scope
. Finally, if the -o
option is used, we rely on the
open-source version of test_FEMBEM
available directly as a standalone Guix
package of the same name.
SCAB="scab"
ACTIPOLE="actipole"

if test $AIRBUS_AF -ne 0; then
  SCAB="scab-af"
elif test $AIRBUS_MF -ne 0; then
  SCAB="scab-mf"
elif test $AIRBUS_ES -ne 0; then
  SCAB="scab-energy_scope"
  ACTIPOLE="actipole-energy_scope"
elif test $OPEN_SOURCE -ne 0; then
  SCAB="test_FEMBEM"
fi
In order to access the additional features we implemented in gcvb
(see
Section 3), we switch to our fork of the package's
repository, namely gcvb-felsocim
. Sometimes, a local clone of the latter is
necessary: being hosted on GitHub, it cannot be acquired online by Guix on
computing platforms with overly restrictive proxy settings.
if test ! -d $AIRBUS_ROOT/gcvb; then
  git clone https://github.com/felsocim/gcvb.git $AIRBUS_ROOT/gcvb
fi
The list of packages to include into the resulting environment as well as the
package modifiers and options to pass to the guix shell
command or to the
guix pack
command, if the -s
option is set, are based on the environment
switch -e
and the -x
option. Also, if the -S
option is used together with
-s
, the version of Open MPI should match the one available on the Curta
cluster.
WITH_INPUT_OPENMPI=""

if test $CURTA -ne 0 && test "$SINGULARITY" != ""; then
  WITH_INPUT_OPENMPI="--with-input=hwloc=hwloc@1 \
--with-input=openmpi=openmpi-curta \
--with-input=openmpi-mpi1-compat=openmpi-curta"
fi
The available environments are listed below. Note that the --preserve
option
allows us to inherit selected environment variables from the parent environment.
benchmark
: environment for performing benchmarks,
MODIFIERS_BENCHMARK="$WITH_INPUT_OPENMPI $WITH_INPUT_MKL $WITH_INPUT_STARPU_FXT \
--with-git-url=gcvb-minimal-felsocim=$AIRBUS_ROOT/gcvb"
OPTIONS_BENCHMARK="--preserve=^SLURM"
PACKAGES_BENCHMARK="bash coreutils inetutils findutils grep sed bc jq openssh \
tar gzip likwid python python-psutil python-numpy python-matplotlib \
gcvb-minimal-felsocim $SCAB $ACTIPOLE"

if test "$SINGULARITY" == ""; then
  OPTIONS_BENCHMARK="$OPTIONS_BENCHMARK --with-input=slurm=slurm@19"
  PACKAGES_BENCHMARK="$PACKAGES_BENCHMARK slurm openmpi"
fi

if test "$WITH_INPUT_STARPU_FXT" != ""; then
  PACKAGES_BENCHMARK="$PACKAGES_BENCHMARK r r-starvz"
fi
gcvb
: environment for generating and launching benchmark job scripts, but not for running the benchmark jobs themselves, which are expected to run in a Singularity container,
MODIFIERS_GCVB="--with-input=slurm=slurm@19 \
--with-git-url=gcvb-minimal-felsocim=$AIRBUS_ROOT/gcvb"
OPTIONS_GCVB="--preserve=SINGULARITY_EXEC --preserve=SINGULARITY_IMAGE"
PACKAGES_GCVB="bash coreutils inetutils findutils grep sed bc jq openssh tar \
gzip likwid openmpi slurm python python-psutil gcvb-minimal-felsocim"
ppp
: environment for pre-processing and post-processing actions, e.g. tangling source code from Org documents, HTML and LaTeX export, …
MODIFIERS_PPP="--with-input=r-ggplot2=r-ggplot2@git.bd50a551"
OPTIONS_PPP="--preserve=TZDIR"
PACKAGES_PPP="bash coreutils sed which emacs emacs-org2web emacs-org \
emacs-htmlize emacs-biblio emacs-org-ref emacs-ess python-pygments texlive r \
r-dbi r-rsqlite r-plyr r-dplyr r-readr r-tidyr r-ggplot2@git.bd50a551 r-scales \
r-cowplot r-stringr r-gridextra r-ggrepel r-rjson r-starvz r-ascii r-r-utils \
inkscape svgfix graphviz"
Based on the value of $ENVIRONMENT
, we select the environment to set up.
MODIFIERS=""
OPTIONS=""
PACKAGES=""

case $ENVIRONMENT in
  benchmark)
    MODIFIERS="$MODIFIERS_BENCHMARK"
    OPTIONS="$OPTIONS_BENCHMARK"
    PACKAGES="$PACKAGES_BENCHMARK" ;;
  gcvb)
    MODIFIERS="$MODIFIERS_GCVB"
    OPTIONS="$OPTIONS_GCVB"
    PACKAGES="$PACKAGES_GCVB" ;;
  ppp)
    MODIFIERS="$MODIFIERS_PPP"
    OPTIONS="$OPTIONS_PPP"
    PACKAGES="$PACKAGES_PPP" ;;
  *)
    error "'$ENVIRONMENT' is not a valid software environment switch!"
    exit 1 ;;
esac
Now it is possible to assemble the guix shell
command, its options and the
list of packages to include in the resulting environment. To unset any existing
environment variables of the current environment, we use the --pure
option.
ENVIRONMENT_COMMAND="guix shell --pure $MODIFIERS $OPTIONS $PACKAGES"
If the -g
option is set, we only print the command on the standard output.
If the -s
option is set, we generate the corresponding Singularity image using
the guix pack
command and exit. Otherwise, we directly enter the new
environment and launch a shell interpreter. The --norc
option of bash prevents
the sourcing of the current user's .bashrc
file which could compromise the
final environment with unwanted environment variables.
if test $GET_COMMAND -ne 0 && test "$SINGULARITY" == ""; then
  echo $ENVIRONMENT_COMMAND
  exit 0
fi
Note that the gcvb
environment is intended for running benchmarks in a
Singularity container. Therefore generating a container for the environment
itself makes no sense!
if test "$SINGULARITY" != "" && test "$ENVIRONMENT" == "gcvb"; then
  error "Generating a container for the 'gcvb' environment makes no sense!"
  exit 1
fi

if test "$SINGULARITY" != ""; then
  PACK_COMMAND="guix pack -f squashfs -S /usr/bin/env=/bin/env $MODIFIERS $PACKAGES"
  PACK_COMMAND=$(echo $PACK_COMMAND | sed 's/\n/ /g')

  if test $GET_COMMAND -ne 0; then
    echo $PACK_COMMAND
    exit 0
  fi

  echo "Building Singularity container using '$PACK_COMMAND'..."
  $PACK_COMMAND > .packing

  if test $? -ne 0; then
    error "Failed to build the Singularity container!"
    exit 1
  fi

  echo "Done"
  echo "Copying the container to '$SINGULARITY'..."
  cp $(tail -n 1 .packing) $SINGULARITY

  if test $? -ne 0; then
    error "Failed to copy the Singularity container to '$SINGULARITY'!"
    exit 1
  fi

  echo "Done"
  echo "Making the container readable..."
  chmod 644 $SINGULARITY

  if test $? -ne 0; then
    error "Failed to make the Singularity container '$SINGULARITY' readable!"
    exit 1
  fi

  echo "Done"
  exit 0
fi

$ENVIRONMENT_COMMAND -- bash --norc
See the complete source file 3.
3. Performing benchmarks
To automate the generation and execution of benchmarks, we use gcvb gcvb, an open-source tool developed at Airbus. gcvb allows us to define benchmarks, generate corresponding shell job scripts for every benchmark or a selected group of benchmarks, submit these job scripts for execution, then gather and optionally validate or post-process the results. To generate multiple variants of the same benchmark, gcvb supports templates.
3.1. gcvb
gcvb uses a specific file and directory structure. A minimal setup requires a
configuration file and a benchmark definition YAML file. These files must be
placed in the same folder. Furthermore, the name of the configuration file must
be config.yaml
. On the other hand, the benchmark definition file may have an
arbitrary name. The folder we place these files in represents the root of our
gcvb filesystem:
- benchmarks/ represents the root of the benchmark filesystem.
  - data/ contains data necessary to generate and perform benchmarks.
    - all/ represents one of possibly more folders containing benchmark data. For the sake of simplicity, we use one single folder for all benchmarks.
      - input holds any input file necessary to generate benchmarks.
      - references holds any reference file needed for benchmark results validation.
      - templates provides file templates for template-based benchmarks. There is one subfolder for each file template. Note that we use templates to produce specific batch job script header directives for the workload manager on the target computing platform (see Section 3.2) as well as to generate wrapper scripts for launching benchmarks (see Section 3.11).
  - results/ contains benchmark results. Here, one sub-folder is produced every time a new session of benchmarks is generated based on the definition file. It contains job scripts and one folder per generated benchmark. These folders may hold any template-based input file as well as the result of the corresponding benchmark execution.
    - 1/
    - …
  - config.yaml represents the configuration file.
  - gcvb.db represents an auto-generated NoSQL database which can be used to store benchmark results.
  - benchmarks.yaml represents the benchmark definition file.
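The skeleton of this layout can be created by hand; the following is a minimal sketch in shell (the directory and file names follow the listing above, the relative root benchmarks is an arbitrary choice of ours):

```shell
#!/usr/bin/env bash
# Create a minimal gcvb filesystem skeleton as described above.
root="benchmarks"
mkdir -p "$root/data/all/input" "$root/data/all/references" \
  "$root/data/all/templates" "$root/results"
# The configuration file must be named config.yaml; the definition file
# name is arbitrary (benchmarks.yaml here).
touch "$root/config.yaml" "$root/benchmarks.yaml"
```

gcvb then fills results/ with one sub-folder per generated session and creates gcvb.db itself.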
3.2. sbatch templates
We use slurm slurm to schedule and execute our experiments on the target
high-performance computing platforms. gcvb
produces a job script for each
benchmark described in the definition file. This script is then passed to
slurm for scheduling on a computation node or nodes.
Each job script produced by gcvb
is prepended with a header containing the
configuration statements for the sbatch
command of slurm slurmGuide
used to submit jobs for computation. We take advantage of the template feature
in order to be able to dynamically generate #SBATCH
headers specific to a
given set of benchmarks.
An sbatch
template begins as a standard shell script.
#! /usr/bin/env bash
#
# /slurm/ batch job script
#
We use multiple template files but most of the #SBATCH
directives are common
to all of them:
count of computational nodes to reserve,
#SBATCH -N {scheduler[nodes]}
count of task slots per node to reserve,
#SBATCH -n {scheduler[tasks]}
exclusion of the other users from the usage of the reserved resources,
#SBATCH --exclusive
reservation time,
#SBATCH --time={scheduler[time]}
location to place the slurm log files in, where %x is the corresponding job name and %j the identifier of the job,
#SBATCH -o slurm/%x-%j.log
we exclude selected nodes because of inconsistent configuration (less RAM than expected).
#SBATCH --exclude=miriel019,miriel023,miriel030,miriel056
Note that {scheduler[nodes]}
and the like are placeholders replaced by actual
values from the YAML definition file during template expansion (see
Section 3.5).
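The substitution itself is performed by gcvb; as a purely hypothetical illustration of the mechanism (the value 2 and the use of sed are ours), filling a placeholder amounts to:

```shell
#!/usr/bin/env bash
# Illustration only: substitute a {scheduler[nodes]} placeholder with a
# concrete value, the way template expansion fills in the YAML-defined values.
template='#SBATCH -N {scheduler[nodes]}'
nodes=2
expanded=$(echo "$template" | sed "s/{scheduler\[nodes\]}/$nodes/")
echo "$expanded"   # prints: #SBATCH -N 2
```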
One of the #SBATCH
directives specific to each template file is the job name.
Based on the job name, we distinguish different sets of benchmarks. Grouping
individual benchmarks into a single job script allows us to submit fewer jobs.
This way, they are more balanced in terms of the computation time we need to
reserve for them on the target computing platform. For example, instead of
submitting 12 jobs with a time limit of 10 minutes each, we submit two jobs
with a time limit of 1 hour each. Benchmarks to be placed into a given common
job script are identified by matching their job name against a regular
expression. Finally, the --constraint
switch allows us to specify the node
family to rely on.
In the sbatch
header template monobatch
, used for benchmarks whose jobs all run on a
single computational node, the job name is simply composed of a
prefix which typically corresponds to the constant part of a benchmark name (see
Section 3.5):
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
When a template-based benchmark definition yields a large amount of benchmarks,
we prefer to group them into multiple job scripts and launch the latter in
parallel. The value of {job[batch]}
in the polybatch
header determines which
benchmark belongs to which job script:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
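The effect of such a batch index can be sketched with a tiny shell loop (the counts below are hypothetical, not taken from our benchmark definitions): assigning each of 12 benchmarks a batch index in round-robin fashion yields two equally sized job scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: distribute 12 benchmarks over 2 job scripts by
# computing a batch index for each, as the {job[batch]} value does.
benchmarks=12
batches=2
for (( i = 0; i < benchmarks; i++ )); do
  echo "benchmark-$i -> batch-$(( i % batches ))"
done
```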
polybatch-distributed
is a variation of polybatch
for distributed parallel
jobs which require specifying additional scheduling constraints.
<<sbatch-beginning>>
#SBATCH -N {scheduler[nodes]}
#SBATCH --ntasks-per-node {scheduler[nt]}
#SBATCH --exclusive
#SBATCH --time={scheduler[time]}
#SBATCH -o slurm/%x-%j.log
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}
#SBATCH --constraint={scheduler[family]}{scheduler[constraint]}

ulimit -c 0
For coupled solver benchmarks, we need a finer-grained distribution of the
latter among slurm jobs. So, in the coupled
sbatch
header, we also add the names of the sparse and dense solvers involved in the
benchmark to the job name:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{sparse[name]}-{dense[name]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
However, in some cases, the sparse and dense solver names are provided through a single placeholder:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{solver[name]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
For coupled solver benchmarks relying on out-of-core computation, we need to be able to specify extra constraints in addition to the node family name, as well as to exclude specific nodes from the list of available computational nodes, because not all nodes of a given family have the desired hard disk type.
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{sparse[name]}-{dense[name]}-{dense[ooc]}
#SBATCH --constraint={scheduler[family]}{scheduler[constraint]}
<<sbatch-end>>
In case of multi-node parallel distributed benchmarks, we specify the task count per node instead of the total number of tasks. Furthermore, just like for out-of-core benchmarks, we need to be able to specify additional node constraints.
<<sbatch-beginning>>
#SBATCH -N {scheduler[nodes]}
#SBATCH --ntasks-per-node {scheduler[nt]}
#SBATCH --exclusive
#SBATCH --time={scheduler[time]}
#SBATCH -o slurm/%x-%j.log
#SBATCH --job-name={scheduler[prefix]}-{sparse[name]}-{dense[name]}-{scheduler[nodes]}-{job[nbpts]}
#SBATCH --constraint={scheduler[family]}{scheduler[constraint]}

ulimit -c 0
The same applies to scalability benchmarks. In scalability
, we add the name of the solver used to the job name:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{solver[name]}-{parallel[map]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
At the end of the header, we record the date and time when the job was scheduled, as well as the node it was scheduled on.
echo "Job scheduled on $(hostname), on $(date)"
echo
Then, we disable the creation of core dump files in case of a memory error. Even though they can be particularly useful, in some cases they consume too much disk space and prevent other jobs from running.
ulimit -c 0
See the complete source files monobatch
4 and polybatch.
3.3. Ensuring filesystem
3.3.1. Initialization script
We wrote the shell script mkgcvbfs.sh
to automate the initialization of a
gcvb
filesystem or to check if a specific gcvb
filesystem is valid.
Traditionally, the script begins with a help message function that can be
triggered using the -h
option.
function help() {
    echo "Initialize a gcvb file system described in FSTAB at FSPATH." >&2
    echo "Usage: ./$(basename $0) [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h        Show this help message." >&2
    echo "  -c        Check if a valid gcvb filesystem is present in PATH." >&2
    echo "  -f FSTAB  Initialize the gcvb filesystem specified in FSTAB." >&2
    echo "  -o FSPATH Set the output path for the filesystem to create." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
    if test $# -lt 1; then
        echo "An unknown error occurred!" >&2
    else
        echo "Error: $1" >&2
    fi
}
The script requires an .fstab
file describing the filesystem to create (see
Section 3.3.2), i.e. the entries to initialize the filesystem
with and the destination path of the latter.
FSTAB
holds the path to an .fstab
description file provided using the -f
option.
FSTAB=""
FSPATH
holds the destination path to create the filesystem in specified using
the -o
option.
FSPATH=""
The -c
option, corresponding to the CHECK_ONLY
boolean variable, makes it possible to
check an existing gcvb
filesystem against an .fstab
description instead of
creating it.
CHECK_ONLY=0
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
while getopts ":hcf:o:" option; do
    case $option in
        c)
            CHECK_ONLY=1
            ;;
        f)
            FSTAB=$OPTARG
            if test ! -f $FSTAB; then
                error "'$FSTAB' is not a valid file!"
                exit 1
            fi
            ;;
        o)
            FSPATH=$OPTARG
            ;;
We must also take into account unknown options, missing option arguments, syntax
mismatches as well as the case when the -h
option is specified.
        \?) # Unknown option
            error "Arguments mismatch! Invalid option '-$OPTARG'."
            echo
            help
            exit 1
            ;;
        :) # Missing option argument
            error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
            echo
            help
            exit 1
            ;;
        h | *)
            help
            exit 0
            ;;
    esac
done
Next, we have to check if the user has provided the path to the .fstab
file
if test "$FSTAB" == ""; then
    error "No filesystem description file was specified!"
    exit 1
fi
as well as the destination path of the gcvb
filesystem to create.
if test "$FSPATH" == ""; then
    error "No output location for the filesystem was specified!"
    exit 1
fi
Eventually, we process all of the entries in the .fstab
description file. Each
line represents the specification of an entry in the gcvb
filesystem to initialize
(see Section 3.3.2). Notice that the fields of an entry
specification are separated by colons.
for entry in $(cat $FSTAB); do
The first field tells us whether a file or a directory should be initialized.
    ACTION=$(echo $entry | cut -d':' -f 1)
    case $ACTION in
If it is a file, its source path and its destination in the target filesystem follow.
        F|f)
            SOURCE=$(echo $entry | cut -d':' -f 2)
            DESTINATION=$(echo $entry | cut -d':' -f 3)
If the -c
option is passed (see variable CHECK_ONLY
), we only check that the
target filesystem contains the file.
            if test $CHECK_ONLY -ne 0; then
                if test ! -f $FSPATH/$DESTINATION; then
                    error "Filesystem is incomplete! Missing '$FSPATH/$DESTINATION'."
                    exit 1
                fi
                continue
            fi
Otherwise, we need to check if the source file exists
            if test ! -f $SOURCE; then
                error "Failed to initialize file '$SOURCE'!"
                exit 1
            fi
before creating it at the desired path in the destination filesystem.
            mkdir -p $FSPATH/$(dirname $DESTINATION) && \
                cp $SOURCE $FSPATH/$DESTINATION
            if test $? -ne 0; then
                error "Failed to initialize file '$FSPATH/$DESTINATION'!"
                exit 1
            fi
            ;;
If the entry specifies a directory, its destination path in the filesystem being initialized follows.
        D|d)
            DESTINATION=$(echo $entry | cut -d':' -f 2)
If the -c
option is passed (see variable CHECK_ONLY
), we only check that the
target filesystem contains the directory.
            if test $CHECK_ONLY -ne 0; then
                if test ! -d $FSPATH/$DESTINATION; then
                    error "Filesystem is incomplete! Missing '$FSPATH/$DESTINATION'."
                    exit 1
                fi
                continue
            fi
Otherwise, we create the directory at the specified path.
            mkdir -p $FSPATH/$DESTINATION
            if test $? -ne 0; then
                error "Failed to initialize directory '$FSPATH/$DESTINATION'!"
                exit 1
            fi
            ;;
We also need to take care of the case where the action specified in the description file is not known.
        *)
            error "Failed to initialize filesystem! '$ACTION' is not a valid action."
            exit 1
            ;;
    esac
done
We finish by printing a message about the successful filesystem initialization,
or about the successful check in case the -c
option is passed.
if test $CHECK_ONLY -ne 0; then
    echo "Successfully checked the filesystem '$FSPATH'."
else
    echo "Successfully initialized a fresh gcvb filesystem at '$FSPATH'."
fi
See the complete source file mkgcvbfs.sh
5.
3.3.2. Description file
The format of an .fstab
description file is very straightforward. Each line
must begin with either a D
or d
, or an F
or f
, indicating whether a directory or
a file should be initialized. In case of a directory, the letter is followed by
a colon and the destination path of the directory. In case of a file, the letter
is followed by a colon, the source path of the file, another colon and the
destination path in the target filesystem.
To describe the filesystem from Section 3.1, we use the benchmarks.fstab
description file:
D:data/all/input
D:data/all/references
D:data/all/templates
D:results
D:slurm
F:monobatch:data/all/templates/monobatch/sbatch
F:polybatch:data/all/templates/polybatch/sbatch
F:polybatch-distributed:data/all/templates/polybatch-distributed/sbatch
F:coupled:data/all/templates/coupled/sbatch
F:coupled-simple:data/all/templates/coupled-simple/sbatch
F:coupled-ooc:data/all/templates/coupled-ooc/sbatch
F:coupled-distributed:data/all/templates/coupled-distributed/sbatch
F:scalability:data/all/templates/scalability/sbatch
F:wrapper-in-core.sh:data/all/templates/wrapper-in-core/wrapper.sh
F:wrapper-ooc.sh:data/all/templates/wrapper-ooc/wrapper.sh
F:wrapper-fxt.sh:data/all/templates/wrapper-fxt/wrapper.sh
F:wrapper-in-core-distributed.sh:data/all/templates/wrapper-in-core-distributed/wrapper.sh
F:es_config.json:data/all/templates/es/es_config.json
F:setenv.sh:scripts/setenv.sh
F:submit.sh:scripts/submit.sh
F:rss.py:scripts/rss.py
F:inject.py:scripts/inject.py
F:parse.sh:scripts/parse.sh
F:config.yaml:config.yaml
F:benchmarks.yaml:benchmarks.yaml
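To make the entry processing concrete, here is a minimal, self-contained sketch (not mkgcvbfs.sh itself) that builds a hypothetical two-entry description file and processes it with the same colon-separated field extraction as the script above:

```shell
# Sketch: process a tiny, made-up .fstab with one directory and one file entry.
WORK=$(mktemp -d)
printf 'hello\n' > $WORK/source.txt
printf 'D:results\nF:%s:inputs/source.txt\n' "$WORK/source.txt" > $WORK/demo.fstab

for entry in $(cat $WORK/demo.fstab); do
    case $(echo $entry | cut -d':' -f 1) in
        D|d) # Directory entry: field 2 is the destination path.
            mkdir -p $WORK/fs/$(echo $entry | cut -d':' -f 2) ;;
        F|f) # File entry: field 2 is the source, field 3 the destination.
            SRC=$(echo $entry | cut -d':' -f 2)
            DST=$(echo $entry | cut -d':' -f 3)
            mkdir -p $WORK/fs/$(dirname $DST) && cp $SRC $WORK/fs/$DST ;;
    esac
done
echo "initialized $WORK/fs"
```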
See the complete description file benchmarks.fstab
6.
3.4. Configuration file
The configuration file is designed to provide machine-specific information for
a gcvb
benchmark collection, such as the submit command for job scripts.
Nevertheless, our configuration does not vary from machine to machine, so we
use the same config.yaml
everywhere.
The configuration of a gcvb
benchmark collection is simple. It usually fits in
a few lines of code beginning with a machine identifier.
machine_id: generic
The most important setting is the path to the executable used to submit job
scripts produced by gcvb
. We rely on slurm as workload manager and we use its
sbatch
command to submit job scripts.
submit_command: sbatch
Sometimes, we need to re-run the validation phase (see Section
3.5) of benchmarks without repeating the computation itself,
e.g. in case of a change in the result parsing script (see Section
3.8). This phase does not have to be performed as a Slurm job on
a separate node. Therefore, we do not want to use sbatch
as submit command but
simply execute the job script produced by gcvb
in a Bash shell on the current
node.
va_submit_command: bash
Eventually, an associative list of executables can be defined for handy access
from the definition file. However, as the executables are not available in the
validation phase (see Section 3.5), we cannot make use of this
mechanism and initialize executables
as an empty list.
executables: []
See the complete configuration file config.yaml
7.
3.5. Definition file
The benchmark definition file begins with a set of default values automatically set for each benchmark defined in the file.
At first, we make all the benchmarks use the same data folder (see Section 3.1). Defining the benchmarks as of type template allows gcvb to automatically generate benchmarks for different sets of parameters (see Section 3). We address this functionality further in this section too. Also, we want to keep resource usage and energy consumption logs in the gcvb SQLite database.
default_values:
  test:
    description: "A test_FEMBEM benchmark."
    data: "all"
    type: "template"
    keep: ["rss*.log", "hdd*.log", "likwid*.csv", "tmp_energy_scope*/energy_scope_eprofile*.txt"]
For each task, we define the MPI parallel configuration options using the
nprocs
key. The placeholders {parallel[np]}
, {parallel[map]}
,
{parallel[rank]}
and {parallel[bind]}
shall be replaced during the
generation of the benchmark job script (see below).
    task:
      nprocs: "-np {parallel[np]} -map-by ppr:1:{parallel[map]} -rank-by {parallel[rank]} -bind-to {parallel[bind]}"
The nthreads
key can be used to specify the number of threads to use.
Nevertheless, we set these values in the wrapper script used to launch
benchmark tasks (see Section 3.11). As nthreads
must be
initialized even if it is not used, we initialize it with an empty string.
nthreads: ""
The main executable to run benchmarks is given by the executable
key. Although
we use the test_FEMBEM solver test suite, we do not launch the corresponding
executable directly. We use a Shell wrapper to perform some preparation and
completion actions (see Section 3.11) as well as a Python
wrapper to trace memory and storage resource consumption (see Section
3.6). At the end, all of this is run using mpirun
.
executable: mpirun
executable
and nprocs
are then reused in the final launch_command
of a
given benchmark. Below, we define the global launch command for all the
benchmark tasks based on the keys from the @job_creation
alias allowing us to
access task attributes potentially specific to each benchmark, i.e.
executable
, options
and nprocs
, from within launch_command
.
Note that commands are run from within benchmark-specific folders under the
results/<session>
directories (see Section 3.1). Also, we redirect
standard and error outputs to dedicated log files, i.e. stdout.log
and
stderr.log
respectively. stdout.log
is used later for parsing benchmark
results (see Section 3.8). Another copy of the standard output
is saved to slurm-$SLURM_JOBID.out
where SLURM_JOBID
is an environment
variable set by the job scheduler slurm
. This copy is required by the power
consumption monitoring tool energy_scope
we rely on (see Section
3.7).
The wrapper script generated by gcvb
based on the corresponding template is
not executable. Thus, we need to make it executable before running the
launch_command
.
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]} {@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash ./wrapper.sh python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4 {@job_creation[options]} 2>&1 | tee stdout.log | tee slurm-$SLURM_JOBID.out"
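In isolation, the double tee pipeline in the launch command merges the standard and error streams, then duplicates the result into two log files while still printing it. A minimal sketch, with an inline command standing in for the actual solver run:

```shell
# Sketch of the redirection pattern: stderr is merged into stdout, then the
# combined stream is copied into both log files by the chained tee calls.
cd $(mktemp -d)
{ echo "result: ok"; echo "warning" >&2; } 2>&1 | tee stdout.log | tee slurm-demo.out
```

Both files end up with identical contents, which is what lets stdout.log feed the parsing script while slurm-demo.out feeds energy_scope.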
In our case, we do not perform any result validation in terms of value checks.
However, we take advantage of the validation phase in gcvb to gather data from
log files into a separate data.csv
file per benchmark and inject them into the
gcvb SQLite database (see Section 3.1) using a Python script (see Section
3.9). The latter calls our parsing script (see Section
3.8) to extract data from the output logs.
We begin by defining the type of each validation. The type best suited to our
needs is the generic configuration_independent
type of gcvb.
    validation:
      type: "configuration_independent"
Then, we set the validation executable
. The executable
, i.e. inject.py
,
takes at least two arguments. The first one is the *.csv
file containing
parsed benchmark results. The call to the parsing script, together with its
arguments, follows. In this case, we specify the output log file to be
stdout.log
(see above). The -r .
parameter tells the parsing script to look
for the resource monitoring logs produced by the associated Python script (see
Section 3.6) in the current working directory, and the -o data.csv
parameter defines the output file for the parsing result. We shall define the
validation launch_command
individually for each benchmark.
executable: "../../../scripts/inject.py data.csv ../../../scripts/parse.sh -s stdout.log -r . -o data.csv"
At this stage, we can define the actual benchmarks. They are structured in packs. We define four packs: in-core, out-of-core, multi-node parallel distributed and test benchmarks. Each pack contains a list of benchmarks, each benchmark may be composed of one or more tasks, and each task may have one or more validation tasks.
3.5.1. In-core benchmarks
Packs:
  - pack_id: "in-core"
    description: "In-core benchmarks."
    Tests:
Firstly, we want to benchmark the SPIDO solver on dense linear systems resulting
from BEM discretization for various unknown counts. Under
template_instantiation
there are four maps later expanded by gcvb to generate
multiple variants of the benchmark, e.g. for various problem sizes.
The solver
map provides the name of the solver being evaluated. This
information is not uniformly present in the standard output of test_FEMBEM.
Therefore, for the sake of clarity and simplicity, we manually specify the
solver
map in all benchmarks, not only when we evaluate multiple solvers
within one benchmark.
scheduler
holds the common job name prefix and the scheduling information used
for the generation of the associated sbatch
header file monobatch
(see
Section 3.2) as well as the wrapper script (see Section
3.11).
parallel
specifies the parallel configuration. The parallel[np]
key gives
the number of MPI processes. The parallel[nt]
key gives the number of threads
per MPI process. The parallel[map]
, parallel[rank]
and parallel[bind]
keys
indicate the mapping, the ranking and the binding of the MPI processes,
respectively.
The nbpts
array defines the problem sizes to generate the benchmark for. Note
that {scheduler[prefix]}
, {scheduler[platform]}
, {nbpts}
and so on are the
placeholders for the values defined in template_instantiation
.
The Cartesian product of all the map tuples under template_instantiation
gives the total number of generated benchmarks. Here, we generate
1 \(\times\) 1 \(\times\) 3 = 3 variants grouped into a single job script with a
time limit of 2 hours.
Note that &MONO
and &PARALLEL_DEFAULT
are YAML aliases to the corresponding
data allowing us to reuse them later in the document using *MONO
and
*PARALLEL_DEFAULT
, respectively.
- id: "spido-{nbpts}"
  template_files: &MONO [ "wrapper-in-core", "monobatch" ]
  template_instantiation:
    scheduler:
      - { prefix: "spido", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-02:00:00" }
    parallel:
      - &PARALLEL_DEFAULT { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
    nbpts: [ 25000, 50000, 100000 ]
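The variant count stated above can be checked with a short calculation. The following sketch mirrors, rather than reuses, gcvb's template expansion over the three template_instantiation maps of this benchmark:

```python
from itertools import product

# The three maps of the spido benchmark: one scheduler tuple, one parallel
# tuple and three problem sizes (values abridged for illustration).
scheduler = [{"prefix": "spido", "family": "miriel"}]
parallel = [{"np": 1, "nt": 24}]
nbpts = [25000, 50000, 100000]

# One benchmark variant per element of the Cartesian product.
variants = list(product(scheduler, parallel, nbpts))
print(len(variants))  # 1 x 1 x 3 = 3 generated benchmarks
```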
The task corresponding to this benchmark follows. In this case, we only have to
indicate the options
of test_FEMBEM specific to this set of benchmarks.
  Tasks:
    - options: "-z -nbrhs 50 --bem -withmpf -nbpts {nbpts}"
For the validation phase we only need to specify an id
and the corresponding
launch_command
. Here it consists of the validation executable
obtained
through the {@job_creation[va_executable]}
placeholder followed by an option
of the parsing script specific to this benchmark. The -K
option generally
allows us to include custom key-value pairs in the result output. Here, we use
it to specify the solver being evaluated.
      Validations:
        - id: "va-spido-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=spido"
In the next definition, we benchmark the HMAT solver on dense BEM systems and
under similar conditions as SPIDO. However, here we also vary the precision
parameter \(\epsilon\) (see the epsilon
map) in template_instantiation
.
- id: "hmat-bem-{epsilon}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "hmat-bem", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    epsilon: [ "1e-3", "1e-6" ]
    nbpts: [ 25000, 50000, 100000, 200000, 400000, 800000, 1000000 ]
  Tasks:
    - options: "-z -nbrhs 50 --bem -withmpf --hmat --hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
      Validations:
        - id: "va-hmat-bem-{epsilon}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=hmat-bem"
As for sparse solvers, we begin by benchmarking MUMPS on sparse linear systems
arising from FEM discretization. Besides the unknown count (see the nbpts
map), we vary
the value of the precision parameter \(\epsilon\) when the low-rank compression
mechanism is enabled (see the epsilon
key in the compression
map). The
options
key in the compression
map defines the options for test_FEMBEM to
enable or disable the compression.
- id: "mumps-{compression[epsilon]}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "mumps", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-04:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    compression:
      - { epsilon: "1e-3", options: "--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy" }
      - { epsilon: "1e-6", options: "--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy" }
      - { epsilon: "", options: "--no-mumps-blr" }
    nbpts: [ 250000, 500000, 1000000, 2000000, 4000000, 8000000, 10000000 ]
  Tasks:
    - options: "-z -nbrhs 50 --fem -withmpf {compression[options]} {compression[epsilon]} --mumps-verbose -nbpts {nbpts}"
      Validations:
        - id: "va-mumps-{compression[epsilon]}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=mumps"
For illustrative purposes, we define a couple of additional benchmarks of MUMPS
on the same kind of linear systems. In this case, the problem size is fixed to
962,831 which matches the count of unknowns related to the FEM-discretized
domain in case of a coupled FEM/BEM system counting 1,000,000 unknowns. Moreover, we
consider both symmetric and non-symmetric systems (see the symmetry
map). The
precision parameter \(\epsilon\) is fixed to \(10^{-3}\).
- id: "additional-mumps-{symmetry[label]}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "additional-mumps", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-00:20:00" }
    parallel:
      - *PARALLEL_DEFAULT
    symmetry:
      - { label: "symmetric", options: "" }
      - { label: "non-symmetric", options: "-nosym" }
  Tasks:
    - options: "-z -nbrhs 50 --fem -withmpf --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-verbose -nbpts 962831 {symmetry[options]}"
      Validations:
        - id: "va-additional-mumps-{symmetry[label]}"
          launch_command: "{@job_creation[va_executable]} -K solver=mumps"
For HMAT on FEM systems, we evaluate the impact of the same parameters as in the case of MUMPS. In the benchmark prefix, we specify that we are working with symmetric matrices as we later define further HMAT benchmarks involving non-symmetric matrices.
- id: "hmat-fem-symmetric-{epsilon}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "hmat-fem-symmetric", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-02:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    epsilon: [ "1e-3", "1e-6" ]
    nbpts: [ 250000, 500000, 1000000, 2000000 ]
  Tasks:
    - options: "-z -nbrhs 50 --fem -withmpf --hmat --hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
      Validations:
        - id: "va-hmat-fem-symmetric-{epsilon}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=hmat-fem"
We also want to evaluate the performance of the prototype implementation of the
Nested Dissection (ND) ordering technique in HMAT for the solution of sparse
systems (see the variant
map). The current implementation limits the
application of the algorithm to non-symmetric matrices. To be able to compare it
to runs without using the Nested Dissection, we must redo the latter using
non-symmetric matrices as well (see the variant
map). Moreover, when the ND is
enabled, the HMAT solver uses a significantly higher amount of memory.
Consequently, cases counting more than 250,000 unknowns cause a memory overflow.
Therefore, we benchmark the algorithm on smaller systems.
- id: "hmat-fem-non-symmetric{variant[ND]}-{epsilon}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "hmat-fem-non-symmetric", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-01:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    variant:
      - { ND: "", options: "--hmat-lu -nosym" }
      - { ND: "-nd", options: "--hmat-nd" }
    epsilon: [ "1e-3", "1e-6" ]
    nbpts: [ 25000, 50000, 100000, 200000, 250000 ]
  Tasks:
    - options: "-z -nbrhs 50 --fem {variant[options]} -withmpf --hmat --hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
      Validations:
        - id: "va-hmat-fem-non-symmetric{variant[ND]}-{epsilon}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=hmat-fem{variant[ND]}"
In the next part, we benchmark the solvers for coupled sparse/dense FEM/BEM
systems. For solvers allowing data compression, we set the precision
parameter \(\epsilon\) to \(10^{-3}\). However, as
of the sparse part of the system, we consider both the case when it is
compressed (by MUMPS) and when it is not (see the BLR
and option
keys in the
sparse
map).
At first, we consider the two-stage implementation scheme multi-solve using
MUMPS as sparse solver and SPIDO as dense solver. We vary problem's unknown
count (see the nbpts
key in the job
map) as well as the count of right-hand
sides to be processed at once by MUMPS during the Schur complement computation
(see the nbrhs
key in the job
map). The maps sparse
and dense
define the
options for the sparse and the dense solver respectively. Finally, the track
key in the job
map allows us to enable execution timeline tracing for selected
runs.
- id: "multi-solve-{job[batch]}-{sparse[BLR]}{sparse[name]}-\
    {dense[name]}-{job[nbrhs]}-{job[nbpts]}"
  template_files: &COUPLED [ "wrapper-in-core", "coupled" ]
  template_instantiation:
    scheduler:
      - { prefix: "multi-solve", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "1-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, nbrhs: 32, batch: 1, track: "" }
      - { nbpts: 1000000, nbrhs: 64, batch: 1, track: "" }
      - { nbpts: 1000000, nbrhs: 128, batch: 1, track: "" }
      - { nbpts: 1000000, nbrhs: 256, batch: 1, track: "--timeline-trace-calls" }
      - { nbpts: 1000000, nbrhs: 512, batch: 1, track: "" }
      # N = 2M
      - { nbpts: 2000000, nbrhs: 32, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 64, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 128, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 256, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 512, batch: 3, track: "" }
      - { nbpts: 2000000, nbrhs: 1024, batch: 3, track: "" }
      # N = 4M
      - { nbpts: 4000000, nbrhs: 32, batch: 4, track: "" }
      - { nbpts: 4000000, nbrhs: 64, batch: 4, track: "" }
      - { nbpts: 4000000, nbrhs: 128, batch: 5, track: "" }
      - { nbpts: 4000000, nbrhs: 256, batch: 6, track: "" }
      - { nbpts: 4000000, nbrhs: 512, batch: 6, track: "" }
      # N = 7M
      - { nbpts: 7000000, nbrhs: 32, batch: 7, track: "" }
      - { nbpts: 7000000, nbrhs: 64, batch: 8, track: "" }
      - { nbpts: 7000000, nbrhs: 128, batch: 9, track: "" }
      - { nbpts: 7000000, nbrhs: 256, batch: 10, track: "" }
    sparse:
      - { name: "mumps", BLR: "blr-", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbrhsmumps" }
      - { name: "mumps", BLR: "", options: "--mumps-verbose --no-mumps-blr --mumps-multi-solve -nbrhsmumps" }
    dense:
      - { name: "spido", options: "" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} {job[nbrhs]} {dense[options]} -nbpts {job[nbpts]} {job[track]}"
      Validations:
        - id: "va-multi-solve-{job[batch]}-{sparse[BLR]}{sparse[name]}-\
            {dense[name]}-{job[nbrhs]}-{job[nbpts]}"
          launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]}"
Due to differences in parameter specification, we define a separate benchmark
for the two-stage multi-solve implementation relying on HMAT as dense solver.
Here, nbrhs
does not vary except for the 2,000,000 unknowns problem. For all
the other runs, we set nbrhs
to 256 which seems to be the optimal value based
on the runs involving SPIDO as dense solver. However, when using HMAT, the
parameter we are interested in is referred to as schur
here (see the job
map). The value of schur
must be at least as high as nbrhs
. In the 2,000,000
unknowns case, we want to demonstrate the effect of too low values of schur
.
That is why we lower schur
(together with nbrhs
) to 32 in this case.
- id: "multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
    {job[nbrhs]}-{job[schur]}-{job[nbpts]}"
  template_files: *COUPLED
  template_instantiation:
    scheduler:
      - { prefix: "multi-solve", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "1-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 1024, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 2048, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 4096, nbrhs: 256, batch: 1 }
      # N = 2M
      - { nbpts: 2000000, schur: 32, nbrhs: 32, batch: 2 }
      - { nbpts: 2000000, schur: 64, nbrhs: 64, batch: 2 }
      - { nbpts: 2000000, schur: 128, nbrhs: 128, batch: 2 }
      - { nbpts: 2000000, schur: 256, nbrhs: 256, batch: 2 }
      - { nbpts: 2000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 2000000, schur: 1024, nbrhs: 256, batch: 1 }
      - { nbpts: 2000000, schur: 2048, nbrhs: 256, batch: 1 }
      - { nbpts: 2000000, schur: 4096, nbrhs: 256, batch: 1 }
      # N = 4M
      - { nbpts: 4000000, schur: 512, nbrhs: 256, batch: 3 }
      - { nbpts: 4000000, schur: 1024, nbrhs: 256, batch: 3 }
      - { nbpts: 4000000, schur: 2048, nbrhs: 256, batch: 4 }
      - { nbpts: 4000000, schur: 4096, nbrhs: 256, batch: 3 }
      # N = 7M
      - { nbpts: 7000000, schur: 512, nbrhs: 256, batch: 5 }
      - { nbpts: 7000000, schur: 1024, nbrhs: 256, batch: 6 }
      - { nbpts: 7000000, schur: 2048, nbrhs: 256, batch: 7 }
      - { nbpts: 7000000, schur: 4096, nbrhs: 256, batch: 8 }
      # N = 9M
      - { nbpts: 9000000, schur: 512, nbrhs: 256, batch: 9 }
      - { nbpts: 9000000, schur: 1024, nbrhs: 256, batch: 10 }
      - { nbpts: 9000000, schur: 2048, nbrhs: 256, batch: 11 }
      - { nbpts: 9000000, schur: 4096, nbrhs: 256, batch: 12 }
    sparse:
      - { name: "mumps", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve" }
    dense:
      - { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} -nbrhsmumps {job[nbrhs]} --mumps-nbcols-schur-hmat {job[schur]} {dense[options]} -nbpts {job[nbpts]}"
      Validations:
        - id: "va-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
            {job[nbrhs]}-{job[schur]}-{job[nbpts]}"
          launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]}"
Then, we benchmark the two-stage implementation scheme multi-factorization using
MUMPS as sparse solver and either SPIDO or HMAT as dense solver. We vary
problem's unknown count (see the nbpts
key in the job
map) as well as the
size of row and column blocks of the Schur complement matrix during the
computation (see the schur
key in the job
map). The nb
key in job
indicates the corresponding count of blocks per block row or block column of the
Schur complement matrix. We use the IOblock
key under job
to set the size of
disk blocks when using SPIDO as dense solver. This ensures that the chosen
Schur complement block size is a multiple of the disk block size. As only three
combinations are possible, we use here the common solver
map to provide both
sparse and dense solver specifications.
- id: "multi-factorization-{job[batch]}-{solver[BLR]}{solver[name]}-\
    {job[nbpts]}-{job[schur]}"
  template_files: [ "wrapper-in-core", "coupled-simple" ]
  template_instantiation:
    scheduler:
      - { prefix: "multi-factorization", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-05:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1.0M, n_BEM = 37,169
      - { nbpts: 1000000, schur: 9296, IOblock: 2324, nb: 4, batch: 1 }
      - { nbpts: 1000000, schur: 12390, IOblock: 2478, nb: 3, batch: 1 }
      - { nbpts: 1000000, schur: 18585, IOblock: 3717, nb: 2, batch: 1 }
      - { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1, batch: 1 }
      # N = 1.5M, n_BEM = 48,750
      - { nbpts: 1500000, schur: 12188, IOblock: 3047, nb: 4, batch: 3 }
      - { nbpts: 1500000, schur: 16250, IOblock: 3250, nb: 3, batch: 3 }
      - { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2, batch: 2 }
      - { nbpts: 1500000, schur: 48750, IOblock: 4875, nb: 1, batch: 2 }
      # N = 2.0M, n_BEM = 58,910
      - { nbpts: 2000000, schur: 11784, IOblock: 1964, nb: 5, batch: 4 }
      - { nbpts: 2000000, schur: 14728, IOblock: 2104, nb: 4, batch: 4 }
      - { nbpts: 2000000, schur: 19638, IOblock: 3273, nb: 3, batch: 5 }
      - { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2, batch: 6 }
      # N = 2.5M, n_BEM = 68,524
      - { nbpts: 2500000, schur: 11421, IOblock: 1269, nb: 6, batch: 7 }
      - { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5, batch: 8 }
      - { nbpts: 2500000, schur: 17135, IOblock: 3427, nb: 4, batch: 9 }
      - { nbpts: 2500000, schur: 22842, IOblock: 3807, nb: 3, batch: 10 }
      - { nbpts: 2500000, schur: 34264, IOblock: 4283, nb: 2, batch: 11 }
    solver:
      - { name: "mumps-spido", BLR: "blr-", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-facto" }
      - { name: "mumps-spido", BLR: "", options: "--mumps-verbose --no-mumps-blr --mumps-multi-facto" }
      - { name: "mumps-hmat", BLR: "blr-", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-facto --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {solver[options]} --mumps-sizeschur {job[schur]} -diskblock {job[IOblock]} -nbpts {job[nbpts]}"
      Validations:
        - id: "va-multi-factorization-{job[batch]}-\
            {solver[BLR]}{solver[name]}-{job[nbpts]}-{job[schur]}"
          launch_command: "{@job_creation[va_executable]} -K solver={solver[name]},n_blocks={job[nb]}"
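The relation between the schur and IOblock values can be verified directly on the job entries above: in every pair, the Schur complement block size is an integer multiple of the disk block size. A short check over the first few pairs, assuming exactly this invariant:

```python
# (schur, IOblock) pairs copied from the multi-factorization job entries above.
jobs = [
    (9296, 2324), (12390, 2478), (18585, 3717), (37170, 3717),
    (12188, 3047), (16250, 3250), (24375, 4875), (48750, 4875),
]

for schur, ioblock in jobs:
    # Each Schur block size is evenly divisible by its disk block size.
    assert schur % ioblock == 0, (schur, ioblock)
print("all Schur block sizes are multiples of their IOblock")
```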
As for the partially sparse-aware single-stage scheme, we benchmark the
implementation using HMAT on both sparse and dense parts. Here, we vary solely
the problem's unknown count (see the nbpts
map). The track
key in the
nbpts
map allows us to enable execution timeline tracing for selected runs.
- id: "full-hmat-{nbpts[count]}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "full-hmat", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    nbpts:
      - { count: 1000000, track: "--timeline-trace-calls" }
      - { count: 2000000, track: "" }
      - { count: 4000000, track: "" }
      - { count: 7000000, track: "" }
      - { count: 9000000, track: "" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled --hmat
                --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                -nbpts {nbpts[count]} {nbpts[track]}"
  Validations:
    - id: "va-full-hmat-{nbpts[count]}"
      launch_command: "{@job_creation[va_executable]} -K solver=hmat/hmat"
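The placeholders in the test identifiers and option strings follow Python's str.format element-access syntax, where a field such as {nbpts[count]} resolves against the corresponding instantiation map. The helper below is an illustrative sketch of this expansion, not gcvb's actual implementation (it also does not cover the special {@job_creation[...]} fields):

```python
# Sketch of how gcvb-style "{map[key]}" placeholders expand: Python's
# str.format accepts keyword dicts and resolves "{nbpts[count]}" as
# nbpts["count"]. Illustration only, not the actual gcvb code.

def instantiate(template: str, **maps) -> str:
    """Expand a gcvb-like template string against instantiation maps."""
    return template.format(**maps)

ids = [
    instantiate("full-hmat-{nbpts[count]}", nbpts={"count": c, "track": t})
    for c, t in [(1000000, "--timeline-trace-calls"), (2000000, "")]
]
print(ids)  # ['full-hmat-1000000', 'full-hmat-2000000']
```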
To evaluate the scalability of the solvers, we consider a large dense BEM or
sparse FEM linear system (see the nbpts key in the solver map) and vary the
parallel configuration. Note that for solvers allowing data compression, we set
the precision parameter ε to \(10^{-3}\).
- id: "scalability-{solver[name]}-{parallel[map]}-\
       {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
  template_files: &SCALA [ "wrapper-in-core", "scalability" ]
  template_instantiation:
    scheduler:
      - { prefix: "scalability", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-20:00:00" }
    solver:
      - { name: "spido", nbpts: 100000, options: "--bem" }
      - { name: "mumps", nbpts: 2000000,
          options: "--fem --mumps-verbose --mumps-blr --mumps-blr-variant 1
                    --mumps-blr-accuracy 1e-3" }
      - { name: "mumps", nbpts: 4000000,
          options: "--fem --mumps-verbose --mumps-blr --mumps-blr-variant 1
                    --mumps-blr-accuracy 1e-3" }
We consider four kinds of parallel configurations:
- 1 MPI process mapped and ranked by node and 1 to 24 OpenMP and MKL threads;
  mpirun configuration example:

  OMP_NUM_THREADS=24 MKL_NUM_THREADS=24 \
  mpirun -np 1 -map-by ppr:1:node -rank-by node -bind-to none test_FEMBEM...
parallel:
  - { np: 1, nt: 1, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 6, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 12, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 18, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
- 2 MPI processes mapped by, ranked by and bound to sockets and 1 to 12
  OpenMP and MKL threads; mpirun configuration example:

  OMP_NUM_THREADS=12 MKL_NUM_THREADS=12 \
  mpirun -np 2 -map-by ppr:1:socket -rank-by socket -bind-to socket
  test_FEMBEM...
  - { np: 2, nt: 1, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 2, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 4, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 8, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 12, map: "socket", rank: "socket", bind: "socket" }
- 4 MPI processes mapped by, ranked by and bound to numa sub-nodes and 1 to 6
  OpenMP and MKL threads; mpirun configuration example:

  OMP_NUM_THREADS=6 MKL_NUM_THREADS=6 \
  mpirun -np 4 -map-by ppr:1:numa -rank-by numa -bind-to numa test_FEMBEM...
  - { np: 4, nt: 1, map: "numa", rank: "numa", bind: "numa" }
  - { np: 4, nt: 2, map: "numa", rank: "numa", bind: "numa" }
  - { np: 4, nt: 4, map: "numa", rank: "numa", bind: "numa" }
  - { np: 4, nt: 6, map: "numa", rank: "numa", bind: "numa" }
- 1 to 24 MPI processes mapped by, ranked by and bound to cores and 1 OpenMP
  and MKL thread; mpirun configuration example:

  OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 \
  mpirun -np 24 -map-by ppr:1:core -rank-by core -bind-to core test_FEMBEM...
  - { np: 1, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 6, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 12, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 18, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 24, nt: 1, map: "core", rank: "core", bind: "core" }
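All four configuration families keep the product np × nt within the 24 cores of a miriel node. A short illustrative sketch enumerating the configurations listed above and checking this core budget:

```python
# Enumerate the four parallel configuration families above and check that
# each (np, nt) pair fits within the 24 cores of a miriel node.
configs = {
    "node":   [(1, nt) for nt in (1, 6, 12, 18, 24)],
    "socket": [(2, nt) for nt in (1, 2, 4, 8, 12)],
    "numa":   [(4, nt) for nt in (1, 2, 4, 6)],
    "core":   [(np, 1) for np in (1, 6, 12, 18, 24)],
}
for mapping, pairs in configs.items():
    assert all(np * nt <= 24 for np, nt in pairs), mapping
print(sum(len(pairs) for pairs in configs.values()))  # 19 configurations
```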
Finally, we have to set the solver options (see the options value) and
configure the validation phase.
Tasks:
  - options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts {solver[nbpts]}"
Validations:
  - id: "va-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "{@job_creation[va_executable]} -K solver={solver[name]}"
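From the timings gathered by these runs, speedup and parallel efficiency can be derived during post-processing (see the scalability and efficiency plot functions in Section 4.1.5). A generic sketch of the computation, with made-up timings:

```python
# Generic speedup/efficiency computation from scalability timings.
# The timing values below are hypothetical, for illustration only.
def efficiency(t_seq: float, t_par: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores."""
    return (t_seq / t_par) / cores

timings = {1: 240.0, 6: 48.0, 12: 30.0, 24: 20.0}  # cores -> seconds (made up)
for cores, t in timings.items():
    print(cores, round(efficiency(timings[1], t, cores), 2))
```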
We define two further separate scalability benchmarks for HMAT, as its MPI
parallelization is not suited to execution on a single node and is
consequently not studied here.
- id: "scalability-{solver[name]}-{parallel[map]}-\
       {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
  template_files: *SCALA
  template_instantiation:
    scheduler:
      - { prefix: "scalability", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-14:00:00" }
    solver:
      - { name: "hmat-bem", nbpts: 100000,
          options: "--bem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3" }
      - { name: "hmat-bem", nbpts: 1000000,
          options: "--bem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3" }
      - { name: "hmat-fem", nbpts: 2000000,
          options: "--fem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3 --no-hmat-nd" }
We consider only one kind of parallel configuration here: 1 MPI process mapped
and ranked by node with 1 to 24 OpenMP and MKL threads and StarPU workers.
parallel:
  - { np: 1, nt: 1, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 6, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 12, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 18, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
Tasks:
  - options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts {solver[nbpts]}"
Validations:
  - id: "va-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "{@job_creation[va_executable]} -K solver={solver[name]}"
By means of another HMAT scalability benchmark, we run the solver on one BEM and one FEM system in three different parallel configurations and generate an FXT execution trace for each of them. Here, we consider smaller systems (25,000 unknowns for BEM and 50,000 unknowns for FEM) on fewer cores (4 at most) than in the previous benchmark. The goal is to show the difference in performance between an exclusively StarPU-parallelized execution and an execution involving multiple MPI processes on a single node.
- id: "fxt-scalability-{solver[name]}-{parallel[map]}-\
       {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
  template_files: [ "wrapper-fxt", "scalability" ]
  template_instantiation:
    scheduler:
      - { prefix: "fxt-scalability", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 4, time: "0-00:30:00" }
    solver:
      - { name: "hmat-bem", nbpts: 25000,
          options: "--bem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3" }
      - { name: "hmat-fem", nbpts: 50000,
          options: "--fem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3 --no-hmat-nd" }
We consider three parallel configurations here:
- 1 MPI process mapped and ranked by node and 4 OpenMP and MKL threads and
  StarPU workers;
- 2 MPI processes mapped by, ranked by and bound to sockets and 2 OpenMP and
  MKL threads and StarPU workers;
- 4 MPI processes mapped by, ranked by and bound to cores.
parallel:
  - { np: 1, nt: 4, map: "node", rank: "node", bind: "none" }
  - { np: 2, nt: 2, map: "socket", rank: "socket", bind: "socket" }
  - { np: 4, nt: 1, map: "core", rank: "core", bind: "core" }
In this case, we have an extra validation task performing the analysis of the
produced FXT traces using the StarVZ library for R (see Section 4.2).
Tasks:
  - options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts {solver[nbpts]}"
Validations:
  - id: "va-1-fxt-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "{@job_creation[va_executable]} -K solver={solver[name]}"
  - id: "va-2-fxt-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "phase1-workflow.sh $(pwd)/ && phase2-workflow.R $(pwd)/
                     $GUIX_ENVIRONMENT/site-library/starvz/etc/default.yaml"
3.5.2. energy_scope benchmarks
To experiment with energy profiling of the solvers, we use the energy_scope
tool (see Section 3.7). In the first place, we want to measure the potential
overhead energy_scope may have on the performance of test_FEMBEM.
We consider a coupled FEM/BEM system with a fixed size of 1,000,000 unknowns.
We use MUMPS as sparse and either SPIDO or HMAT as dense solver while applying
the multi-solve two-stage implementation scheme. Low-rank compression is
enabled and the precision parameter ε is set to \(10^{-3}\). The number of
right-hand sides in the system is set to 150 for a more representative final
solve phase. We execute the benchmarks both with and without energy_scope.
- pack_id: "energy_scope"
  description: "energy_scope benchmarks."
  Tests:
    - id: "es-overhead-{es[switch]}-{es[frequency]}-{es[interval]}-\
           {es[profile]}-{dense[solver]}-{iteration}"
      template_files: ["wrapper-in-core", "monobatch", "es"]
      template_instantiation:
        scheduler:
          - { prefix: "es-overhead", platform: "plafrim", family: "miriel",
              nodes: 1, tasks: 24, time: "0-12:00:00" }
        parallel:
          - *PARALLEL_DEFAULT
        dense: &ES_DENSE
          - { solver: "spido", options: "" }
          - { solver: "hmat",
              options: "--mumps-nbcols-schur-hmat 512 --hmat
                        --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                        --coupled-mumps-hmat" }
        es:
          - { switch: "on", frequency: "1Hz", interval: 1000,
              idle_time: 0.99, profile: "e",
              executable: "~/energy_scope/energy_scope_slurm.sh mpirun
                           --report-bindings" }
          - { switch: "on", frequency: "2Hz", interval: 500,
              idle_time: 0.49, profile: "et",
              executable: "~/energy_scope/energy_scope_slurm.sh mpirun
                           --report-bindings" }
          - { switch: "off", frequency: "1Hz", interval: 1000,
              idle_time: 0.99, profile: "e", executable: "mpirun" }
        iteration: [ 1, 2, 3 ]
      Tasks:
        - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json
                       && export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       {es[executable]}"
          options: "-d -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose
                    --mumps-blr --mumps-blr-variant 1
                    --mumps-blr-accuracy 1e-3 --mumps-multi-solve
                    -nbrhsmumps 256 -nbpts 1000000 {dense[options]}
                    --energy-scope-tags"
      Validations:
        - id: "va-es-overhead-{es[switch]}-{es[frequency]}-\
               {es[interval]}-{es[profile]}-{dense[solver]}-{iteration}"
          launch_command: "{@job_creation[va_executable]}
                           -K solver=mumps/{dense[solver]},energy_scope={es[switch]}
                           -K es_frequency={es[frequency]},es_profile={es[profile]}"
        - id: "va-es-overhead-eprofile-{es[switch]}-{es[frequency]}-\
               {es[interval]}-{es[profile]}-{dense[solver]}-{iteration}"
          launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                           ~/energy_scope/energy_scope_run_analysis_all.sh"
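The iterations executed with and without energy_scope can then be compared to quantify its overhead. A sketch of the computation, with made-up timings:

```python
# Relative overhead of running under energy_scope, computed from paired
# wall-clock times. The sample timings below are made up for illustration.
from statistics import mean

def overhead(with_es, without_es):
    """Relative overhead of the instrumented runs over the reference runs."""
    return (mean(with_es) - mean(without_es)) / mean(without_es)

print(f"{overhead([101.0, 102.0, 103.0], [100.0, 100.0, 100.0]):.1%}")  # 2.0%
```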
The next stage consists of creating the energetic profile of test_FEMBEM when
solving purely sparse FEM and purely dense BEM systems, respectively. In this
case, we consider a FEM and a BEM system of fixed size proportional to the
corresponding coupled FEM/BEM test cases with \(1,000,000 \leq N \leq 5,000,000\).
We use MUMPS, both with and without compression, to solve the FEM systems and
SPIDO or HMAT to solve the BEM systems. When compression is enabled, the
precision parameter ε is set to \(10^{-3}\). The number of right-hand sides
is set to 150 to ensure a more representative solve phase.
- id: "es-fem-{job[batch]}-{nbpts}"
  template_files: ["wrapper-in-core", "polybatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-fem", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-08:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: &ES_DEFAULT
      - { interval: 1000, idle_time: 0.99, profile: "e",
          executable: "~/energy_scope/energy_scope_slurm.sh mpirun
                       --report-bindings" }
    job:
      - { batch: 1, options: "--no-mumps-blr" }
      - { batch: 2, options: "--mumps-blr --mumps-blr-variant 1
                              --mumps-blr-accuracy 1e-3" }
    nbpts: [ 962831, 2922756, 4891562, 6864384 ]
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 150 -nbrhsmumps 150 -withmpf --fem --mumps-verbose
                {job[options]} -nbpts {nbpts} --energy-scope-tags"
  Validations:
    - id: "va-es-fem-{job[batch]}-{nbpts}"
      launch_command: "{@job_creation[va_executable]} -K solver=mumps"
    - id: "va-es-fem-eprofile-{job[batch]}-{nbpts}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-bem-{job[batch]}-{nbpts}"
  template_files: ["wrapper-in-core", "polybatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-bem", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-08:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { batch: 1, solver: "spido", options: "" }
      - { batch: 2, solver: "hmat", options: "--hmat --hmat-eps-assemb 1e-3
                                              --hmat-eps-recompr 1e-3" }
    nbpts: [ 37169, 77244, 108438, 135616 ]
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 150 -withmpf --bem {job[options]} -nbpts {nbpts}
                --energy-scope-tags"
  Validations:
    - id: "va-es-bem-{job[batch]}-{nbpts}"
      launch_command: "{@job_creation[va_executable]} -K solver={job[solver]}"
    - id: "va-es-bem-eprofile-{job[batch]}-{nbpts}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
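The nbpts lists of the es-fem and es-bem tests are not arbitrary: each FEM size pairs with a BEM size so that together they reconstitute the unknown count of one of the coupled FEM/BEM test cases. A quick check:

```python
# Each FEM system size pairs with a BEM system size so that their sum equals
# the unknown count N of one of the coupled FEM/BEM test cases.
fem_sizes = [962831, 2922756, 4891562, 6864384]
bem_sizes = [37169, 77244, 108438, 135616]
coupled = [f + b for f, b in zip(fem_sizes, bem_sizes)]
print(coupled)  # [1000000, 3000000, 5000000, 7000000]
```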
For the energetic profiling of test_FEMBEM when solving coupled systems, we
consider both the multi-solve and the multi-factorization two-stage
implementation schemes. We profile the schemes on a system with
\(N\) = 1,000,000. The number of right-hand sides is set to 150 for a more
representative final solve phase. We measure the impact of low-rank
compression on the dense part, whereas compression is always enabled for the
sparse part. When applicable, the precision parameter ε is set to \(10^{-3}\).
- id: "es-coupled-single-node-{job[scheme]}-{job[solver]}-{job[config]}"
  template_files: ["wrapper-in-core", "monobatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-single-node", platform: "plafrim",
          family: "miriel", nodes: 1, tasks: 24, time: "0-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { scheme: "multi-solve", solver: "spido", config: "opti",
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { scheme: "multi-solve", solver: "hmat", config: "ssize",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 256 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { scheme: "multi-solve", solver: "hmat", config: "opti",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { scheme: "multi-facto", solver: "spido", config: "opti",
          options: "--mumps-multi-facto --mumps-sizeschur 12390" }
      - { scheme: "multi-facto", solver: "hmat", config: "opti",
          options: "--mumps-multi-facto --mumps-sizeschur 12390 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts 1000000 -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
      launch_command: &ES_LC_TIMELINE
        "chmod +x wrapper.sh && {@job_creation[executable]}
         {@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
         ./wrapper.sh likwid-perfctr -g FLOPS_AVX -g MEM -o likwid-%r.csv
         -O -t 500ms python3 ../../../scripts/rss.py test_FEMBEM -radius 2
         -height 4 {@job_creation[options]} 2>&1 | tee stdout.log | tee
         slurm-$SLURM_JOBID.out"
  Validations:
    - id: "va-es-coupled-single-node-{job[scheme]}-{job[solver]}-\
           {job[config]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-coupled-single-node-eprofile-{job[scheme]}-\
           {job[solver]}-{job[config]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-coupled-gc-single-node-{job[batch]}-{job[scheme]}-\
       {job[solver]}-{job[nbpts]}"
  template_files: ["wrapper-in-core", "polybatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-gc-single-node", platform: "plafrim",
          family: "miriel", nodes: 1, tasks: 24, time: "0-18:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      # Multi-solve
      # MUMPS/SPIDO
      - { batch: 1, scheme: "multi-solve", solver: "spido", nbpts: 1000000,
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 1, scheme: "multi-solve", solver: "spido", nbpts: 3000000,
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 1, scheme: "multi-solve", solver: "spido", nbpts: 5000000,
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      # MUMPS/HMAT
      - { batch: 1, scheme: "multi-solve", solver: "hmat", nbpts: 1000000,
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 1, scheme: "multi-solve", solver: "hmat", nbpts: 3000000,
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 1, scheme: "multi-solve", solver: "hmat", nbpts: 5000000,
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      # Multi-factorization
      # MUMPS/SPIDO
      - { batch: 2, scheme: "multi-facto", solver: "spido", nbpts: 1000000,
          options: "--mumps-multi-facto --mumps-sizeschur 12390" }
      - { batch: 2, scheme: "multi-facto", solver: "spido", nbpts: 1500000,
          options: "--mumps-multi-facto --mumps-sizeschur 16250" }
      - { batch: 2, scheme: "multi-facto", solver: "spido", nbpts: 2000000,
          options: "--mumps-multi-facto --mumps-sizeschur 19683" }
      # MUMPS/HMAT
      - { batch: 2, scheme: "multi-facto", solver: "hmat", nbpts: 1000000,
          options: "--mumps-multi-facto --mumps-sizeschur 12390 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 2, scheme: "multi-facto", solver: "hmat", nbpts: 1500000,
          options: "--mumps-multi-facto --mumps-sizeschur 16250 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 2, scheme: "multi-facto", solver: "hmat", nbpts: 2000000,
          options: "--mumps-multi-facto --mumps-sizeschur 19683 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts {job[nbpts]} -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
  Validations:
    - id: "va-es-coupled-gc-single-node-{job[batch]}-{job[scheme]}-\
           {job[solver]}-{job[nbpts]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-coupled-gc-single-node-eprofile-{job[batch]}-\
           {job[scheme]}-{job[solver]}-{job[nbpts]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-coupled-multi-node-{job[batch]}-{job[scheme]}-{job[solver]}"
  template_files: ["wrapper-in-core-distributed", "polybatch-distributed",
                   "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-multi-node", platform: "plafrim",
          family: "miriel", constraint: "&omnipath", nodes: 4, nt: 1, np: 1,
          time: "0-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { batch: 1, scheme: "multi-solve", solver: "spido",
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 1, scheme: "multi-solve", solver: "hmat",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 2, scheme: "multi-facto", solver: "spido",
          options: "--mumps-multi-facto --mumps-sizeschur 19638" }
      - { batch: 2, scheme: "multi-facto", solver: "hmat",
          options: "--mumps-multi-facto --mumps-sizeschur 19638 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - nprocs: "-np {scheduler[nodes]}
               -map-by ppr:{scheduler[np]}:{parallel[map]}
               -rank-by {parallel[rank]} -bind-to {parallel[bind]}"
      executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts 2000000 -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
      launch_command: *ES_LC_TIMELINE
  Validations:
    - id: "va-es-coupled-multi-node-{job[batch]}-{job[scheme]}-\
           {job[solver]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-coupled-multi-node-eprofile-{job[batch]}-\
           {job[scheme]}-{job[solver]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-coupled-debug-multi-facto-{parallel[np]}"
  template_files: ["wrapper-in-core", "monobatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-debug-multi-facto", platform: "plafrim",
          family: "miriel", nodes: 1, tasks: 24, time: "0-12:00:00" }
    parallel:
      - { np: 4, nt: 6, map: "numa", rank: "numa", bind: "numa" }
      - { np: 2, nt: 12, map: "socket", rank: "socket", bind: "socket" }
      - { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
    es: *ES_DEFAULT
    job:
      - { options: "--mumps-multi-facto --mumps-sizeschur 19638 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts 2000000 -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
      launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
                       {@job_creation[nprocs]} $SINGULARITY_EXEC
                       $SINGULARITY_IMAGE bash ./wrapper.sh likwid-perfctr
                       -g FLOPS_AVX -o likwid-%r.csv -O -t 1s python3
                       ../../../scripts/rss.py test_FEMBEM -radius 2
                       -height 4 {@job_creation[options]} 2>&1 |
                       tee stdout.log | tee slurm-$SLURM_JOBID.out"
  Validations:
    - id: "va-es-coupled-debug-multi-facto-{parallel[np]}"
      launch_command: "{@job_creation[va_executable]} -K solver=mumps/hmat"
    - id: "va-es-coupled-debug-multi-facto-eprofile-{parallel[np]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-industrial-{job[batch]}-{job[scheme]}-{job[solver]}"
  template_files: ["wrapper-in-core-distributed", "polybatch-distributed",
                   "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-industrial", platform: "plafrim", family: "miriel",
          constraint: "&omnipath", nodes: 4, nt: 24, np: 1,
          time: "1-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { batch: 1, scheme: "multi-solve", solver: "spido",
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 2, scheme: "multi-solve", solver: "hmat",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 3, scheme: "multi-facto", solver: "spido",
          options: "--mumps-multi-facto --mumps-sizeschur 16885" }
      - { batch: 4, scheme: "multi-facto", solver: "hmat",
          options: "--mumps-multi-facto --mumps-sizeschur 16885 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - nprocs: "-np {scheduler[nodes]}
               -map-by ppr:{scheduler[np]}:{parallel[map]}
               -rank-by {parallel[rank]} -bind-to {parallel[bind]}"
      executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
                       {@job_creation[nprocs]} $SINGULARITY_EXEC
                       $SINGULARITY_IMAGE bash ./wrapper.sh python3
                       ../../../scripts/rss.py actipole -p01
                       EPI_FAN_rwd_isolated_FEM_BEM.p01 -cprec -potential
                       {@job_creation[options]} 2>&1 | tee stdout.log |
                       tee slurm-$SLURM_JOBID.out"
      options: "-withmpf --fembem --coupled --mumps-verbose --mumps-blr
                --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                {job[options]} --energy-scope-tags
                --putenv HMAT_DISABLE_OOC=1 --putenv MPF_SPIDO_INCORE=1
                --mumpsic"
  Validations:
    - id: "va-es-industrial-{job[batch]}-{job[scheme]}-\
           {job[solver]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-industrial-eprofile-{job[batch]}-\
           {job[scheme]}-{job[solver]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
3.5.3. Out-of-core benchmarks
- pack_id: "ooc"
  description: "Out-of-core benchmarks."
  default_values:
    task:
      launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
                       {@job_creation[nprocs]} $SINGULARITY_EXEC
                       $SINGULARITY_IMAGE bash ./wrapper.sh python3
                       ../../../scripts/rss.py --with-hdd test_FEMBEM
                       -radius 2 -height 4 {@job_creation[options]} 2>&1 |
                       tee stdout.log | tee slurm-$SLURM_JOBID.out"
  Tests:
The goal is to analyze the impact of out-of-core computation on the two-stage
implementation schemes multi-solve and multi-factorization using MUMPS as
sparse solver and SPIDO or HMAT as dense solver. We vary the problem's unknown
count (see the nbpts key in the job map). For multi-solve, we also vary the
count of right-hand sides to be processed at once by MUMPS during the Schur
complement computation (see the nbrhs key in the job map). The sparse and
dense maps define the options for the sparse and the dense solver,
respectively.
- id: "ooc-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
       {dense[ooc]}-{job[nbrhs]}-{job[nbpts]}"
  template_files: &COUPLED_OOC [ "wrapper-ooc", "coupled-ooc" ]
  template_instantiation:
    scheduler:
      - { prefix: "ooc-multi-solve", platform: "plafrim", family: "miriel",
          constraint: "", exclude: "", nodes: 1, tasks: 24,
          time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, nbrhs: 128, batch: 1 }
      - { nbpts: 1000000, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, nbrhs: 512, batch: 1 }
      # N = 3M
      - { nbpts: 3000000, nbrhs: 128, batch: 1 }
      - { nbpts: 3000000, nbrhs: 256, batch: 1 }
      - { nbpts: 3000000, nbrhs: 512, batch: 1 }
      # N = 7M
      - { nbpts: 7000000, nbrhs: 128, batch: 2 }
      - { nbpts: 7000000, nbrhs: 256, batch: 3 }
      - { nbpts: 7000000, nbrhs: 512, batch: 4 }
      # N = 9M
      - { nbpts: 9000000, nbrhs: 128, batch: 5 }
      - { nbpts: 9000000, nbrhs: 256, batch: 6 }
      - { nbpts: 9000000, nbrhs: 512, batch: 7 }
      # N = 11M
      - { nbpts: 11000000, nbrhs: 128, batch: 8 }
      - { nbpts: 11000000, nbrhs: 256, batch: 9 }
      - { nbpts: 11000000, nbrhs: 512, batch: 10 }
    sparse:
      - { name: "mumps",
          options: "-mumpsooc --mumps-verbose --mumps-blr
                    --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                    --mumps-multi-solve -nbrhsmumps" }
    dense:
      - { name: "spido", ooc: 1 }
      - { name: "spido", ooc: 0 }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
                {job[nbrhs]} -nbpts {job[nbpts]}"
  Validations:
    - id: "va-ooc-multi-solve-{job[batch]}-{sparse[name]}-\
           {dense[name]}-{dense[ooc]}-{job[nbrhs]}-{job[nbpts]}"
      launch_command: "{@job_creation[va_executable]}
                       -K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}"
- id: "ooc-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
       {dense[ooc]}-{job[nbrhs]}-{job[schur]}-{job[nbpts]}"
  template_files: *COUPLED_OOC
  template_instantiation:
    scheduler:
      - { prefix: "ooc-multi-solve", platform: "plafrim", family: "miriel",
          constraint: "", exclude: "", nodes: 1, tasks: 24,
          time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, schur: 256, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 1024, nbrhs: 256, batch: 1 }
      # N = 3M
      - { nbpts: 3000000, schur: 256, nbrhs: 256, batch: 1 }
      - { nbpts: 3000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 3000000, schur: 1024, nbrhs: 256, batch: 1 }
      # N = 7M
      - { nbpts: 7000000, schur: 256, nbrhs: 256, batch: 2 }
      - { nbpts: 7000000, schur: 512, nbrhs: 256, batch: 3 }
      - { nbpts: 7000000, schur: 1024, nbrhs: 256, batch: 4 }
      # N = 9M
      - { nbpts: 9000000, schur: 256, nbrhs: 256, batch: 5 }
      - { nbpts: 9000000, schur: 512, nbrhs: 256, batch: 6 }
      - { nbpts: 9000000, schur: 1024, nbrhs: 256, batch: 7 }
      # N = 11M
      - { nbpts: 11000000, schur: 256, nbrhs: 256, batch: 8 }
      - { nbpts: 11000000, schur: 512, nbrhs: 256, batch: 9 }
      - { nbpts: 11000000, schur: 1024, nbrhs: 256, batch: 10 }
    sparse:
      - { name: "mumps",
          options: "-mumpsooc --mumps-verbose --mumps-blr
                    --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                    --mumps-multi-solve -nbrhsmumps" }
    dense:
      - { name: "hmat", ooc: 1 }
      - { name: "hmat", ooc: 0 }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
                {job[nbrhs]} --mumps-nbcols-schur-hmat {job[schur]} --hmat
                --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                --coupled-mumps-hmat -nbpts {job[nbpts]}"
  Validations:
    - id: "va-ooc-multi-solve-{job[batch]}-{sparse[name]}-\
           {dense[name]}-{dense[ooc]}-{job[nbrhs]}-{job[schur]}-\
           {job[nbpts]}"
      launch_command: "{@job_creation[va_executable]}
                       -K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}"
For multi-factorization, we vary the size of the blocks of the Schur
complement matrix during the computation (see the schur key in the job map)
and use the IOblock key under job to set the size of a disk block when using
SPIDO as dense solver. This is to ensure that the value is a multiple of the
chosen Schur complement block size.
- id: "ooc-multi-factorization-{job[batch]}-{sparse[name]}-{dense[name]}-\
       {dense[ooc]}-{job[nbpts]}-{job[schur]}"
  template_files: *COUPLED_OOC
  template_instantiation:
    scheduler:
      - { prefix: "ooc-multi-factorization", platform: "plafrim",
          family: "miriel", constraint: "", exclude: "", nodes: 1, tasks: 24,
          time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1.00M, n_BEM = 37,169
      - { nbpts: 1000000, schur: 3379, nb: 11, batch: 1 }
      - { nbpts: 1000000, schur: 5310, nb: 7, batch: 1 }
      - { nbpts: 1000000, schur: 12390, nb: 3, batch: 1 }
      # N = 1.75M, n_BEM = 53,841
      - { nbpts: 1750000, schur: 4895, nb: 11, batch: 2 }
      - { nbpts: 1750000, schur: 7692, nb: 7, batch: 2 }
      - { nbpts: 1750000, schur: 17943, nb: 3, batch: 3 }
      # N = 2.50M, n_BEM = 68,524
      - { nbpts: 2500000, schur: 6230, nb: 11, batch: 3 }
      - { nbpts: 2500000, schur: 9790, nb: 7, batch: 4 }
      - { nbpts: 2500000, schur: 22842, nb: 3, batch: 4 }
    sparse:
      - { name: "mumps",
          options: "-mumpsooc --mumps-verbose --mumps-blr
                    --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                    --mumps-multi-facto" }
    dense:
      - { name: "spido", ooc: 1, options: "" }
      - { name: "spido", ooc: 0, options: "" }
      - { name: "hmat", ooc: 1, options: "--hmat --hmat-eps-assemb 1e-3
                                          --hmat-eps-recompr 1e-3
                                          --coupled-mumps-hmat" }
      - { name: "hmat", ooc: 0, options: "--hmat --hmat-eps-assemb 1e-3
                                          --hmat-eps-recompr 1e-3
                                          --coupled-mumps-hmat" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
                {dense[options]} --mumps-sizeschur {job[schur]}
                -nbpts {job[nbpts]}"
  Validations:
    - id: "va-ooc-multi-factorization-{job[batch]}-{sparse[name]}-\
           {dense[name]}-{dense[ooc]}-{job[nbpts]}-{job[schur]}"
      launch_command: "{@job_creation[va_executable]}
                       -K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}
                       -K n_blocks={job[nb]}"
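In the in-core multi-factorization test at the beginning of this section, every Schur complement block size (schur) is paired with an IOblock value that divides it evenly. A quick check over a few of those pairs (values copied from that test):

```python
# Each Schur complement block size (schur) from the in-core
# multi-factorization test is an exact multiple of the associated
# disk block size (IOblock).
pairs = [(9296, 2324), (12390, 2478), (18585, 3717), (37170, 3717),
         (11784, 1964), (22842, 3807), (34264, 4283)]
assert all(schur % ioblock == 0 for schur, ioblock in pairs)
print([schur // ioblock for schur, ioblock in pairs])  # [4, 5, 5, 10, 6, 6, 8]
```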
3.5.4. Multi-node parallel distributed benchmarks
- pack_id: "distributed"
  description: "Multi-node parallel distributed benchmarks."
  Tests:
The goal here is to analyze the impact of computation in a distributed memory
environment on the two-stage implementation schemes multi-solve and
multi-factorization using MUMPS as sparse solver and SPIDO or HMAT as dense
solver. We vary the problem's unknown count (see the nbpts key in the job
map). For multi-solve, we also vary the count of right-hand sides to be
processed at once by MUMPS during the Schur complement computation (see the
nbrhs key in the job map). For multi-factorization, we vary the size of the
blocks of the Schur complement matrix during the computation (see the schur
key in the job map) and use the IOblock key under job to set the size of a
disk block when using SPIDO as dense solver. This is to ensure that the value
is a multiple of the chosen Schur complement block size. The sparse and dense
maps define the options for the sparse and the dense solver, respectively.
- id: "distributed-multi-solve-{sparse[name]}-{dense[name]}-\ {scheduler[nodes]}-{job[nbpts]}" template_files: &DISTRIBUTED [ "wrapper-in-core-distributed", "coupled-distributed" ] template_instantiation: scheduler: - { prefix: "distributed-multi-solve", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 2, nt: 24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-solve", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 4, nt: 24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-solve", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 8, nt: 24, np: 1, time: "1-12:00:00" } parallel: - *PARALLEL_DEFAULT job: - { nbpts: 1000000 } - { nbpts: 2000000 } - { nbpts: 4000000 } - { nbpts: 7000000 } - { nbpts: 9000000 } sparse: - { name: "mumps", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbrhsmumps 256" } dense: - { name: "spido", options: ""} - { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat 1024"} Tasks: - nprocs: "-np {scheduler[nodes]} -map-by ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]} -bind-to {parallel[bind]}" options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} {dense[options]} -nbpts {job[nbpts]}" Validations: - id: "va-distributed-multi-solve-{sparse[name]}-{dense[name]}-\ {scheduler[nodes]}-{job[nbpts]}" launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]}" - id: "distributed-multi-factorization-{sparse[name]}-{dense[name]}-\ {scheduler[nodes]}-{job[nbpts]}" template_files: *DISTRIBUTED template_instantiation: scheduler: - { prefix: "distributed-multi-factorization", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 2, nt: 24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-factorization", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 4, nt: 
24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-factorization", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 8, nt: 24, np: 1, time: "1-12:00:00" } parallel: - *PARALLEL_DEFAULT job: - { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1 } - { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2 } - { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2 } - { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5 } sparse: - { name: "mumps", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-facto" } dense: - { name: "spido", options: "" } - { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" } Tasks: - nprocs: "-np {scheduler[nodes]} -map-by ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]} -bind-to {parallel[bind]}" options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} {dense[options]} --mumps-sizeschur {job[schur]} -diskblock {job[IOblock]} -nbpts {job[nbpts]}" Validations: - id: "va-distributed-multi-factorization-{sparse[name]}-\ {dense[name]}-{scheduler[nodes]}-{job[nbpts]}" launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]},n_blocks={job[nb]}"
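As a quick consistency check (a sketch, not part of the framework), one can verify that in each multi-factorization job entry above the Schur complement block size is an exact multiple of the corresponding disk block size:

```python
# Job entries from the definition above (nbpts omitted for brevity).
jobs = [
    {"schur": 37170, "IOblock": 3717},
    {"schur": 24375, "IOblock": 4875},
    {"schur": 29456, "IOblock": 4208},
    {"schur": 13705, "IOblock": 2741},
]

# Each Schur complement block size must be divisible by the disk block size.
for job in jobs:
    assert job["schur"] % job["IOblock"] == 0
print("all consistent")
```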
3.5.5. Test benchmarks
- pack_id: "test" description: "Benchmarks for testing the benchmark framework." Tests:
In this section, we define a small number of testing benchmarks allowing us to check the correct operation of the benchmark framework itself for each category of benchmarks, i.e. in-core, out-of-core and parallel distributed.
- id: "test-ic" template_files: *MONO template_instantiation: scheduler: - { prefix: "test-ic", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-00:15:00" } parallel: - *PARALLEL_DEFAULT Tasks: - options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 200000" Validations: - id: "va-test-ic" launch_command: "{@job_creation[va_executable]}" - options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 200000 --mumps-nbcols-schur-hmat 512 --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" Validations: - id: "va-test-ic" launch_command: "{@job_creation[va_executable]}" - id: "test-ooc" template_files: ["wrapper-ooc", "monobatch"] template_instantiation: scheduler: - { prefix: "test-ooc", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-00:15:00" } parallel: - *PARALLEL_DEFAULT dense: - { ooc: 1 } Tasks: - options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 -mumpsooc --mumps-multi-solve -nbpts 200000" Validations: - id: "va-test-ooc" launch_command: "{@job_creation[va_executable]}" - options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 -mumpsooc --mumps-multi-solve -nbpts 200000 --mumps-nbcols-schur-hmat 512 --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" Validations: - id: "va-test-ooc" launch_command: "{@job_creation[va_executable]}" - id: "test-distributed" template_files: ["wrapper-in-core-distributed", "monobatch"] template_instantiation: scheduler: - { prefix: "test-distributed", platform: "plafrim", family: "miriel", nodes: 2, tasks: 24, nt: 24, np: 1, time: "0-00:15:00" } parallel: - *PARALLEL_DEFAULT Tasks: - nprocs: "-np 2 -map-by ppr:1:node -rank-by 
node -bind-to none" options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 400000" Validations: - id: "va-test-distributed" launch_command: "{@job_creation[va_executable]}" - nprocs: "-np 2 -map-by ppr:1:node -rank-by node -bind-to none" options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 400000 --mumps-nbcols-schur-hmat 512 --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" Validations: - id: "va-test-distributed" launch_command: "{@job_creation[va_executable]}"
See the complete definition file benchmarks.yaml.
3.6. Resource monitoring
To continuously measure the storage resources (RAM and optionally hard drive
space) consumed during each benchmark run, we launch test_FEMBEM through the
Python script rss.py, which produces as many memory and disk usage logs as
there are MPI processes created during benchmark execution. The peak values are
appended at the very end of these log files. Note that measurements are taken
regularly, once per second, and the output units are mebibytes [MiB].
import subprocess
import sys
import threading
import time
import os
import psutil
from datetime import datetime
After importing necessary Python modules, we begin by defining a help message
function that can be triggered with the -h
option.
def help():
    print("Monitor memory and disk usage of PROGRAM.\n",
          "Usage: ", sys.argv[0], " [options] [PROGRAM]\n\n",
          "Options:\n  -h  Show this help message.",
          file = sys.stderr, sep = "")
We must check that at least one argument has been passed to the script. The
script takes as arguments the executable whose resource consumption shall be
measured, followed by the arguments of that executable. Alternatively, one can
specify the -h option to make the script print a help message on how to use it.
Also, the --with-hdd switch allows one to enable hard drive consumption
monitoring.
if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)

with_hdd = False
if sys.argv[1] == "--with-hdd":
    with_hdd = True
Several function definitions follow. First, a function to determine the MPI rank of the currently monitored process.
def mpi_rank():
    rank = 0
    for env_name in ['MPI_RANKID', 'OMPI_COMM_WORLD_RANK']:
        try:
            rank = int(os.environ[env_name])
            break
        except:
            continue
    return rank
We also need a function to determine and format the current date and time.
Indeed, we include a timestamp in the output log files every time we record a
measurement. This helps us to synchronize the records with the other monitoring
tools we use, i.e. energy_scope (see Section 3.7).
def timestamp():
    now = datetime.now()
    return now.strftime("%Y/%m/%dT%H:%M:%S")
The script gathers real (resident) memory usage, i.e. the amount of data stored
in the random access memory (RAM). On Linux, the values can be obtained from
the /proc filesystem.
Memory usage statistics of a particular process are stored in
/proc/<pid>/statm, where <pid> is the process identifier (PID). In this file,
the second field, corresponding to VmRSS, holds the amount of real memory used
by the process at instant \(t\). See the associated function below.
def rss(pid):
    with open("/proc/%d/statm" % pid, "r") as f:
        line = f.readline().split()
        VmRSS = int(line[1]) * 4
    return timestamp(), (VmRSS / 1024.)
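As a self-contained illustration, reading the resident set size of the current process itself from /proc works the same way as the rss() function above does for the monitored process (note that the factor of 4 assumes 4 KiB memory pages, the common Linux default):

```python
import os

# Read this process's own resident set size from /proc/<pid>/statm. The second
# field holds the resident page count; multiplying by the 4 KiB page size
# (assumed) yields KiB, and dividing by 1024 yields MiB.
with open("/proc/%d/statm" % os.getpid(), "r") as f:
    resident_pages = int(f.readline().split()[1])

rss_mib = resident_pages * 4 / 1024.
print(rss_mib > 0)  # True on Linux
```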
To be able to monitor the disk space used during benchmarks, we assign to each
run its own temporary folder into which all the files used by test_FEMBEM are
put. Then, we monitor the evolution of the size of TMPDIR in our Python script
using the du Linux command.
def hdd(tmpdir):
    du = subprocess.Popen(["du", "-s", tmpdir], stdout = subprocess.PIPE)
    HddNow = int(du.stdout.read().decode("utf-8").split("\t")[0])
    return timestamp(), (HddNow / 1024.)
Before entering into the monitoring loop, we need to:
- initialize variables to store peak values;
VmHWM = int(0)
HddPeak = int(0)
- get the path of the temporary folder from the TMPDIR environment variable. If the variable is not set, /tmp is returned, allowing the script to detect that no separate temporary folder has been created and to exit with failure, preventing potentially incorrect measurements;
tmpdir = os.getenv('TMPDIR', '/tmp')
if tmpdir == '/tmp':
    print('Error: Temporary files were not redirected!', file = sys.stderr)
    sys.exit(1)
- launch the program to monitor and get the rank of the currently monitored process.
myargs = sys.argv[1:]
if with_hdd == True:
    myargs = sys.argv[2:]

p = subprocess.Popen(myargs)
rank = mpi_rank()
At this point, we begin the monitoring loop. The latter breaks when the monitored process exits. Note that the measured values are initially stored in memory. They are dumped into the corresponding log files at the end of the benchmark execution. This is to prevent frequent disk accesses that may impact the execution time of the monitored benchmark.
rss_log = []
hdd_log = []

while p.returncode == None:
The following tasks are performed to gather necessary results:
- get the current memory usage, update the peak memory usage if necessary and append the corresponding value, converted from pages to mebibytes, to the log;
    VmRSS = rss(p.pid)
    if VmRSS[1] > VmHWM:
        VmHWM = VmRSS[1]
    rss_log.append(VmRSS)
- collect disk usage statistics, update the peak disk space usage if necessary and append the corresponding value, converted from kibibytes to mebibytes, to the log;
    if with_hdd == True:
        HddNow = hdd(tmpdir)
        if HddNow[1] > HddPeak:
            HddPeak = HddNow[1]
        hdd_log.append(HddNow)
- sleep for one second and repeat.
    time.sleep(1)
    p.poll()
Eventually, before exiting, we append usage peaks to the logs and dump the latter to disk.
print("Writing storage resource usage logs...")

m = open("rss-%d.log" % rank, "w")
for entry in rss_log:
    m.write("%s\t%g\n" % (entry[0], entry[1]))
m.write("%g\n" % VmHWM)
m.flush()
m.close()

if with_hdd == True:
    d = open("hdd-%d.log" % rank, "w")
    for entry in hdd_log:
        d.write("%s\t%g\n" % (entry[0], entry[1]))
    d.write("%g\n" % HddPeak)
    d.flush()
    d.close()

sys.exit(p.returncode)
See the complete source file rss.py.
3.7. Energy consumption monitoring
energy_scope
is an energy consumption monitoring tool. To configure what and
how should be monitored, it uses a configuration in JSON format. Among other
parameters, it allows the choose the acquisition frequency and profile.
In this section, we define a gcvb
template (see Section 3.1) of the
configuration file.
{{ "parameter": {{ "comment":"configuration file template", "version":"2022-01", "owner":"energy_scope", "cpu_info":"/proc/cpuinfo", "nvidia_gpu_info":"/proc/driver/nvidia/gpus", "stop_file_prefix":"/tmp/es_read_rt_stop_flag_", "synchro_file_saved_prefix":"wrk/energy_scope_tmp_file_", "synchro_started_prefix":"wrk/energy_scope_tmp_started_", "msr_cmd":"rdmsr", "opt_enode":"node", "opt_estat":"shell", "opt_profile":"none", "opt_send":"none", "analyseafteracquisition":"no", "eprofile_period(ms)":{es[interval]}, "default_read_rt_profile":"{es[profile]}" }}, "data": {{ "intel": {{ "generic" : {{ "read_rt_idle_time":{es[idle_time]}, "registers": {{ "0x010": {{"when":"ae", "pct":"package"}}, "0x0ce": {{"when":"b", "pct":"package"}}, "0x0e7": {{"when":"N", "pct":"thread"}}, "0x0e8": {{"when":"a", "pct":"thread"}}, "0x198": {{"when":"N", "pct":"core"}}, "0x19c": {{"when":"at", "pct":"core"}}, "0x1A2": {{"when":"b", "pct":"package"}}, "0x606": {{"when":"b", "pct":"package"}}, "0x610": {{"when":"b", "pct":"package"}}, "0x611": {{"when":"ae", "pct":"package"}}, "0x613": {{"when":"a", "pct":"package"}}, "0x614": {{"when":"b", "pct":"package"}}, "0x619": {{"when":"ae", "pct":"package"}}, "0x620": {{"when":"b", "pct":"package"}}, "0x621": {{"when":"a", "pct":"package"}}, "0x639": {{"when":"ae", "pct":"package"}}, "0x641": {{"when":"ae", "pct":"package"}}, "0x64e": {{"when":"a", "pct":"thread"}}, "0x770": {{"when":"b", "pct":"package"}}, "0x771": {{"when":"b", "pct":"thread"}}, "0x774": {{"when":"b", "pct":"thread"}} }} }} }}, "amd":{{ "generic" : {{ "read_rt_idle_time":{es[idle_time]}, "registers": {{ "0x010": {{"when":"ae", "pct":"package"}}, "0x0e7": {{"when":"a", "pct":"thread"}}, "0x0e8": {{"when":"a", "pct":"thread"}}, "0xC0000104": {{"when":"b", "pct":"thread"}}, "0xC0010064": {{"when":"b", "pct":"core"}}, "0xC0010299": {{"when":"b", "pct":"package"}}, "0xC001029A": {{"when":"a", "pct":"core"}}, "0xC001029B": {{"when":"ae", "pct":"package"}} }} }} }} }} }}
See the complete energy_scope configuration file template es_config.json.
3.8. Result parsing
The Shell script parse.sh parses data from a test_FEMBEM benchmark and stores
the result in a comma-separated values (.csv) file.
We begin with a help message function that can be triggered using the -h
option.
function help() {
    echo -n "Parse data from a test_FEMBEM run and store the result to a " >&2
    echo "comma-separated (*.csv) file." >&2
    echo "Usage: ./parse.sh [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h       Show this help message." >&2
    echo -n "  -s FILE  Read the standard output of " >&2
    echo "test_FEMBEM from FILE rather than from the standard input." >&2
    echo -n "  -r PATH  Read memory and disk usage " >&2
    echo "logs provided by rss.py from the directory at PATH." >&2
    echo -n "  -l FILE=FUNCTION[,FUNCTION]  Read the trace log of the " >&2
    echo -n "test_FEMBEM run from FILE and parse execution time and " >&2
    echo "iteration count of one or more FUNCTION(s)." >&2
    echo -n "  -K KEY=VALUE[,KEY=VALUE]  Parse one or more additional " >&2
    echo "KEY=VALUE pairs to be added to the output (*.csv) file." >&2
    echo "  -H       Do not print header on output." >&2
    echo -n "  -o FILE  Specify the file to store the " >&2
    echo "output of the parsing to." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
    if test $# -lt 1; then
        echo "An unknown error occurred!" >&2
    else
        echo "Error: $1" >&2
    fi
}
There should be at least one argument provided on the command line.
if test $# -lt 1; then
    help
    exit 1
fi
STDOUT
holds the path to the input file containing test_FEMBEM standard
output.
STDOUT=""
RSS
contains path and name pattern of input files containing the log of the
resource monitoring script (see Section 3.6).
RSS=""
TRACE
is the input file containing the trace call log produced by
test_FEMBEM.
TRACE=""
OUTPUT
is the path to the output file to store the parsed values to.
OUTPUT=""
FUNCTIONS
contains names of functions to parse information about from the
trace call log produced by test_FEMBEM.
FUNCTIONS=""
CUSTOM_KV
holds custom key-value pairs to be included in the output *.csv
file.
CUSTOM_KV=""
DISABLE_HEADING
toggles heading in the output *.csv
file.
DISABLE_HEADING=0
At this point, we parse provided arguments and check the validity of option values.
while getopts ":hs:r:l:K:o:H" option; do
    case $option in
Standard output of the test_FEMBEM executable run can be passed to the script
either through its standard input or in a file specified using the -s
option.
        s)
            STDOUT=$OPTARG
            if test ! -f $STDOUT; then
                error "'$STDOUT' is not a valid file!"
                exit 1
            fi
            ;;
When the -r
option is provided, the script shall also parse resource
monitoring logs.
        r)
            RSS=$OPTARG
            if test ! -d $RSS; then
                error "'$RSS' is not a valid directory!"
                exit 1
            fi
            ;;
When the -l
option is provided, the script shall read the trace call log of
test_FEMBEM from the file passed as argument and parse execution time, iteration
count and floating-point operations (Flops) of one or more functions.
        l)
            TRACE=$(echo $OPTARG | cut -d '=' -f 1)
            FUNCTIONS=$(echo $OPTARG | cut -d '=' -f 2 | sed 's/,/\t/g')
            if test ! -f $TRACE; then
                error "'$TRACE' is not a valid file!"
                exit 1
            fi
            ;;
The -K option allows one to add one or more KEY=VALUE pairs into the output
*.csv file.
        K)
            if test "$CUSTOM_KV" == ""; then
                CUSTOM_KV="$OPTARG"
            else
                CUSTOM_KV="$CUSTOM_KV,$OPTARG"
            fi
            ;;
The -H
option toggles printing of the header in the output *.csv
file.
H)
DISABLE_HEADING=1
;;
Using the -o
option, one can specify the name of the output *.csv
file.
        o)
            OUTPUT=$OPTARG
            ;;
Eventually, we take care of unknown options or missing arguments and raise an error if any.
        \?) # Unknown option
            error "Arguments mismatch! Invalid option '-$OPTARG'."
            echo
            help
            exit 1
            ;;
        :) # Missing option argument
            error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
            echo
            help
            exit 1
            ;;
        h | *)
            help
            exit 0
            ;;
    esac
done
If the standard output of test_FEMBEM is not provided in a file, we try to read its content from the standard input of the script.
if test "$STDOUT" == ""; then
    rm -rf .input
    while read line; do
        echo $line >> .input
    done
    STDOUT=.input
fi
If no standard output from test_FEMBEM to parse is found, we abort the script and raise an error.
if test $(wc --bytes $STDOUT | cut -d' ' -f 1) -lt 1; then
    error "'$STDOUT' contains no 'test_FEMBEM' standard output to parse!"
    exit 1
fi
The treatment starts by separating custom key-value pairs, if any, by a
tabulation character, so they can be looped over using a for
loop.
CUSTOM_KV=$(echo $CUSTOM_KV | sed 's/,/\t/g')
We print the header line to the output file if it was not explicitly disabled
using the -H
option.
if test $DISABLE_HEADING -ne 1; then echo -n "processes,by,mapping,ranking,binding,omp_thread_num," > $OUTPUT echo -n "mkl_thread_num,hmat_ncpu,mpf_max_memory,nbpts,radius," >> $OUTPUT echo -n "height,nbrhs,step_mesh,nbem,nbpts_lambda,lambda," >> $OUTPUT echo -n "thread_block_size,proc_block_size,disk_block_size," >> $OUTPUT echo -n "tps_cpu_facto_mpf,tps_cpu_solve_mpf,error,symmetry," >> $OUTPUT echo -n "h_assembly_accuracy,h_recompression_accuracy," >> $OUTPUT echo -n "assembled_size_mb,tps_cpu_facto,factorized_size_mb," >> $OUTPUT echo -n "tps_cpu_solve,mumps_blr,mumps_blr_accuracy," >> $OUTPUT echo -n "mumps_blr_variant,coupled_method,coupled_nbrhs," >> $OUTPUT echo -n "size_schur,cols_schur,sparse_ooc,platform,node_family," >> $OUTPUT echo -n "singularity_container,slurm_jobid,system_kind,es_cpu," >> $OUTPUT echo -n "es_dram,es_duration,rm_peak,hdd_peak" >> $OUTPUT
The common header entries are followed by custom key-value pairs and additional details about selected functions acquired from the trace call logs produced by test_FEMBEM, if any.
    for kv in $CUSTOM_KV; do
        KEY=$(echo $kv | cut -d '=' -f 1)
        echo -n ",$KEY" >> $OUTPUT
    done

    for f in $FUNCTIONS; do
        echo -n ","$f"_exec_time,"$f"_nb_execs,"$f"_flops" >> $OUTPUT
    done

    echo >> $OUTPUT
fi
Next, we parse all the interesting information from the provided standard output
of test_FEMBEM. The regular expressions and parameters of grep
and sed
utilities used in the script are chosen based on the output format of
test_FEMBEM messages (see Section 5.1 in Appendix).
processes=$(cat $STDOUT | grep "<PERFTESTS> MPI process count" | uniq | \ cut -d '=' -f 2 | sed 's/[^0-9]//g') mapping=$(cat $STDOUT | grep "<PERFTESTS> MPI processes mapped by" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') ranking=$(cat $STDOUT | grep "<PERFTESTS> MPI processes ranked by" | uniq |\ cut -d '=' -f 2 | sed 's/[^a-z]//g') binding=$(cat $STDOUT | grep "<PERFTESTS> MPI processes bound to" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') omp_thread_num=$(cat $STDOUT | grep "OpenMP thread number" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') mkl_thread_num=$(cat $STDOUT | grep "MKL thread number" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') hmat_ncpu=$(cat $STDOUT | grep "HMAT_NCPU" | cut -d '=' -f 4 | \ sed 's/[^0-9]//g') mpf_max_memory=$(cat $STDOUT | grep "MPF_MAX_MEMORY" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') nbpts=$(cat $STDOUT | grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') radius=$(cat $STDOUT | grep "Reading radius" | cut -d '=' -f 2 | \ sed 's/[^0-9.]//g') height=$(cat $STDOUT | grep "Reading height" | cut -d '=' -f 2 | \ sed 's/[^0-9.]//g') nbrhs=$(cat $STDOUT | grep "<PERFTESTS> NbRhs" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') step_mesh=$(cat $STDOUT | grep "<PERFTESTS> StepMesh" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') nbem=$(cat $STDOUT | grep "<PERFTESTS> NbPtsBEM" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') nbpts_lambda=$(cat $STDOUT | grep "<PERFTESTS> nbPtLambda" | \ cut -d '=' -f 2 | sed 's/[^-0-9e.+]//g') lambda=$(cat $STDOUT | grep "<PERFTESTS> Lambda" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') thread_block_size=$(cat $STDOUT | grep "thread block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') proc_block_size=$(cat $STDOUT | grep "proc block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') disk_block_size=$(cat $STDOUT | grep "disk block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') tps_cpu_facto_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFactoMPF =" | \ cut -d '=' -f 2 | sed 
's/[^0-9.]//g') tps_cpu_solve_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolveMPF =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') error=$(cat $STDOUT | grep "<PERFTESTS> Error" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') symmetry=$(cat $STDOUT | grep "Testing: Non-Symmetric matrices and solvers.") if test "$symmetry" != ""; then symmetry="non-symmetric" else symmetry="symmetric" fi h_assembly_accuracy=$(cat $STDOUT | grep "\[HMat\] Compression epsilon" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') h_recompression_accuracy=$(cat $STDOUT | \ grep "\[HMat\] Recompression epsilon" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') assembled_size_mb=$(cat $STDOUT | grep "<PERFTESTS> AssembledSizeMb" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_facto=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFacto =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') factorized_size_mb=$(cat $STDOUT | grep "<PERFTESTS> FactorizedSizeMb" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_solve=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolve =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') mumps_blr="NA" if test "$(cat $STDOUT | grep 'NOT Using Block-Low-Rank compression')" != ""; then mumps_blr=0 elif test "$(cat $STDOUT | grep 'Using Block-Low-Rank compression')" != ""; then mumps_blr=1 fi mumps_blr_accuracy=$(cat $STDOUT | \ grep "Accuracy parameter for Block-Low-Rank" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') mumps_blr_variant=$(cat $STDOUT | grep "BLR Factorization variant" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') coupled_method="" if test "$(cat $STDOUT | grep 'Multi solve method.')" != ""; then coupled_method="multi-solve" elif test "$(cat $STDOUT | grep 'Multi facto method.')" != ""; then coupled_method="multi-facto" fi coupled_nbrhs=$(cat $STDOUT | grep "Number of simultaneous RHS" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') if test "$coupled_nbrhs" == ""; then coupled_nbrhs=32 # Default value fi size_schur=$(cat $STDOUT | grep "Size of the block Schur" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') 
cols_schur=$(cat $STDOUT | \ grep "Number of columns in the H-matrix Schur block" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') sparse_ooc=$(cat $STDOUT | grep "[mumps] Out-of-core.") if test "$sparse_ooc" != ""; then sparse_ooc=1 else sparse_ooc=0 fi platform=$(cat $STDOUT | grep "<PERFTESTS> Underlying platform" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') node_family=$(cat $STDOUT | grep "<PERFTESTS> Node family" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') singularity_container=$(cat $STDOUT | grep "<PERFTESTS> Singularity image") if test "$singularity_container" != ""; then singularity_container=$(echo $singularity_container | cut -d '=' -f 2 | \ sed 's/[[:space:]]*//g') else singularity_container="NA" fi slurm_jobid=$(cat $STDOUT | grep "<PERFTESTS> Slurm job identifier" | uniq | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') if test "$(cat $STDOUT | grep 'Testing : FEM-BEM')" != ""; then system_kind="fem-bem" elif test "$(cat $STDOUT | grep 'Testing : FEM')" != ""; then system_kind="fem" elif test "$(cat $STDOUT | grep 'Testing : BEM')" != ""; then system_kind="bem" else system_kind="unknown" fi es_cpu="NA" es_dram="NA" es_duration="NA" if test -f energy_scope_estat*; then es_estat=$(cat energy_scope_estat*) es_cpu=$(echo $es_estat | jq '.arch.data."joule(J)"."ecpu(J)"') es_dram=$(echo $es_estat | jq '.arch.data."joule(J)"."edram(J)"') es_duration=$(echo $es_estat | jq '."duration(sec)"') fi
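To illustrate the parsing logic above, here is a minimal self-contained sketch in Python of the grep | cut | sed pipeline used, for example, for the NbPts field (the sample line and its value are illustrative):

```python
import re

# A sample line in the test_FEMBEM output format (the value is illustrative).
line = "<PERFTESTS> NbPts = 2000000"

# Equivalent of: grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | sed 's/[^0-9]//g'
nbpts = re.sub(r"[^0-9]", "", line.split("=", 1)[1])
print(nbpts)  # 2000000
```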
If the -r option was provided, we also parse the storage resource monitoring
logs. Memory usage log files are named rss-x.log and disk usage log files are
named hdd-x.log, where \(x\) is the MPI process rank.
if test "$RSS" != ""; then
    rm_peak="0.0"
    hdd_peak="0.0"

    for rss in $(ls $RSS | grep -e "^rss"); do
        rm_peak="$rm_peak + $(cat $RSS/$rss | tail -n 1)"
    done

    for hdd in $(ls $RSS | grep -e "^hdd"); do
        hdd_peak="$hdd_peak + $(cat $RSS/$hdd | tail -n 1)"
    done

    rm_peak=$(echo $rm_peak | bc -l)
    hdd_peak=$(echo $hdd_peak | bc -l)
fi
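The aggregation above simply sums the per-process peaks found on the last line of each log. A self-contained Python sketch of the same computation, using synthetic log contents:

```python
# Synthetic per-rank log contents in the format written by rss.py: timestamped
# measurements followed by the peak value alone on the last line.
logs = {
    "rss-0.log": "2023/01/01T00:00:00\t10.5\n42.25\n",
    "rss-1.log": "2023/01/01T00:00:00\t11.0\n57.75\n",
}

# Sum the last line of each log, as the bc-based shell loop above does.
rm_peak = sum(float(content.rstrip("\n").split("\n")[-1])
              for content in logs.values())
print(rm_peak)  # 100.0
```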
Eventually, we print parsed values, values from custom key-value pairs, if any, and additional function information possibly parsed from trace call logs produced by test_FEMBEM.
echo -n "$processes,$by,$mapping,$ranking,$binding,$omp_thread_num," >> $OUTPUT echo -n "$mkl_thread_num,$hmat_ncpu,$mpf_max_memory,$nbpts," >> $OUTPUT echo -n "$radius,$height,$nbrhs,$step_mesh,$nbem,$nbpts_lambda," >> $OUTPUT echo -n "$lambda,$thread_block_size,$proc_block_size," >> $OUTPUT echo -n "$disk_block_size,$tps_cpu_facto_mpf,$tps_cpu_solve_mpf," >> $OUTPUT echo -n "$error,$symmetry,$h_assembly_accuracy," >> $OUTPUT echo -n "$h_recompression_accuracy,$assembled_size_mb," >> $OUTPUT echo -n "$tps_cpu_facto,$factorized_size_mb,$tps_cpu_solve," >> $OUTPUT echo -n "$mumps_blr,$mumps_blr_accuracy,$mumps_blr_variant," >> $OUTPUT echo -n "$coupled_method,$coupled_nbrhs,$size_schur,$cols_schur," >> $OUTPUT echo -n "$sparse_ooc,$platform,$node_family,$singularity_container," >> $OUTPUT echo -n "$slurm_jobid,$system_kind,$es_cpu,$es_dram,$es_duration," >> $OUTPUT echo -n "$rm_peak,$hdd_peak" >> $OUTPUT for kv in $CUSTOM_KV; do VALUE=$(echo $kv | cut -d '=' -f 2) echo -n ",$VALUE" >> $OUTPUT done for f in $FUNCTIONS; do exec_time=$(cat $TRACE | grep $f | cut -d '|' -f 2 | sed 's/[^0-9.]//g') nb_execs=$(cat $TRACE | grep $f | cut -d '|' -f 3 | sed 's/[^0-9]//g') flops=$(cat $TRACE | grep $f | cut -d '|' -f 4 | sed 's/[^0-9]//g') echo -n ",$exec_time,$nb_execs,$flops" >> $OUTPUT done
At the very end, we print the trailing new line character to the output file, perform cleaning and terminate.
echo >> $OUTPUT
rm -rf .input
exit 0
See the complete source file parse.sh.
3.9. Database injecting
The Python script inject.py allows one to call a custom parsing script (see
Section 3.8) producing a .csv file on output, gather the results exported in
the latter, and then inject the values into a gcvb NoSQL database (see Section
3.1) for a possible result visualization using the gcvb dashboard feature, even
though the latter is only experimental for now. See a concrete usage of this
script in Section 3.5.
At the beginning, we import necessary Python modules as well as the gcvb
module.
import gcvb
import subprocess
import csv
import sys
The script expects either the -h option or at least two arguments:
- DATASET, which stands for the .csv file to gather the data from,
- PARSER, which represents the path to the parsing script to use.
DATASET should be a .csv file containing two lines: a heading providing the
captions, followed by the associated values.
These two arguments may be followed by the arguments to call PARSER with.
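The expected DATASET format and the way Python's csv module deserializes it can be sketched as follows (the field names and values are illustrative):

```python
import csv
import io

# A minimal DATASET as described above: one header line with the captions
# followed by one line with the associated values.
dataset = "nbpts,solver\n1000000,mumps/hmat\n"

# csv.DictReader maps each value to its caption, yielding one dict per row.
rows = list(csv.DictReader(io.StringIO(dataset)))
print(rows[0]["nbpts"], rows[0]["solver"])  # 1000000 mumps/hmat
```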
The script continues with a help message function, which can be triggered with
the -h option, and an argument check.
def help():
    print("Launch PARSER (with ARGUMENTS if needed) instrumented to produce a ",
          "comma-separated values .csv output DATASET, deserialize it and ",
          "inject to the current gcvb database provided by the gcvb module.\n",
          "Usage: ", sys.argv[0], " DATASET PARSER [ARGUMENTS]\n\n",
          "Options:\n  -h  Show this help message.",
          file = sys.stderr, sep = "")

if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)
In the main function, we parse the arguments
def main():
    args = sys.argv
    args.pop(0) # Drop the script name.

    dataset = args.pop(0)
    parser = args.pop(0)
and launch the parsing script.
    subprocess.run([parser] + args)
Once it has finished, we open the produced data set, gather the data into a
dictionary and inject the key-value pairs one by one into the currently used
gcvb database provided by the imported gcvb module.
    with open(dataset, mode = 'r') as data:
        reader = csv.DictReader(data)
        for row in reader:
            for item in row:
                gcvb.add_metric(item, row[item])

if __name__ == '__main__':
    main()
See the complete source file inject.py.
3.10. Extracting additional results
In some cases, we need to extract more detailed information from the log files produced by one or more selected benchmarks. For example, regarding real memory (RAM) consumption, we gather only the peak usage values by default. However, we may also need to know how the consumption evolves during the entire execution time of a given benchmark. Furthermore, some benchmarks may produce additional data which are not taken into account by the parsing script (see Section 3.8).
Therefore, we define here the extract.sh shell script allowing us to extract
additional benchmark results for one or more user-specified benchmarks.
The script begins, traditionally, with a help message function that can be
triggered using the -h option.
function help() {
    echo "Extract additional results for selected benchmark or benchmarks." >&2
    echo "Usage: ./extract.sh [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h          Show this help message." >&2
    echo -n "  -B ID[,ID]  Select the benchmark matching ID. Multiple " >&2
    echo "identifiers may be specified to select multiple benchmarks." >&2
    echo "  -d PATH     Place the extracted files to PATH." >&2
    echo "  -r          Extract detailed real memory (RAM) consumption." >&2
    echo "  -s PATH     Search for benchmark results in PATH." >&2
    echo "  -t          Extract the execution timeline." >&2
    echo "  -T PATH     Search for the timeline extraction tool in PATH." \
         "By default, the script looks in './sources'." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
Follows the option parsing.
BENCHMARKS=""
DESTINATION=$(pwd)
RSS=0
SOURCE=""
TIMELINE=0
TIMELINE_PATH="./sources"

while getopts ":hB:d:rs:tT:" option; do
  case $option in
The -B option allows one to specify one or more benchmarks to extract
additional results for. If more than one identifier is present, they must be
separated by commas. The commas are then converted to tabulations for easier
post-processing.
    B)
      BENCHMARKS="$BENCHMARKS $(echo $OPTARG | sed 's/,/\t/g')"
      if test "$BENCHMARKS" == ""; then
        error "Bad usage of the '-B' option (no identifiers specified)!"
        exit 1
      fi
      ;;
By default, the extracted files shall be placed into the current working
directory of the script. One can use the -d option to specify another
destination directory.
    d)
      DESTINATION=$OPTARG
      if test ! -d "$DESTINATION"; then
        error "'$DESTINATION' is not a valid destination directory!"
        exit 1
      fi
      ;;
The -r
option tells the script to extract detailed real memory (RAM)
consumption data.
    r)
      RSS=1
      ;;
The -s
option allows to specify the path to the directory to search for
benchmark results in. Typically, this represents the path to a gcvb benchmark
session directory (see Section 3.1).
    s)
      SOURCE=$OPTARG
      if test ! -d "$SOURCE"; then
        error "'$SOURCE' is not a valid source directory!"
        exit 1
      fi
      ;;
The -t
option allows to extract the execution timelines of the selected
benchmarks. We also have to check that the external proprietary extraction tool
from Airbus we use is available from the current working directory.
    t)
      TIMELINE=1
      ;;
The -T
option allows to specify the path to search the timeline extraction
tool in.
    T)
      TIMELINE_PATH=$OPTARG
      ;;
Eventually, we have to take care of unknown options or missing arguments, if any, raise an error and terminate the script in that case.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
The -B
and the -s
options are mandatory and at least one of the extraction
options, -r
or -t
, must be specified as well.
if test "$SOURCE" == ""; then
  error "No source directory was specified! Nothing to do."
  exit 1
fi

if test "$BENCHMARKS" == ""; then
  error "No benchmark identifier was specified!"
  exit 1
fi

if test $RSS -eq 0 && test $TIMELINE -eq 0; then
  error "Use at least one of the extraction options! Nothing to do."
  exit 1
fi
If the -t
option was specified, we need to check if the timeline extraction
tool is accessible.
if test $TIMELINE -ne 0 && test ! -f $TIMELINE_PATH/timeline.py; then
  error "Timeline extraction tool was not found in '$TIMELINE_PATH'!"
  exit 1
fi
For each benchmark identifier specified using the -B
option we:
- extract all of the real memory consumption logs if the
-r
option is present,
for b in $BENCHMARKS; do
  if test $RSS -ne 0; then
    RSS_LOGS=$(ls $SOURCE/$b/rss-*.log)
    if test "$RSS_LOGS" == ""; then
      error "There are no real memory (RAM) consumption logs in '$SOURCE/$b'!"
      exit 1
    fi
    COUNTER=0
    for l in $RSS_LOGS; do
      cp $l $DESTINATION/rss-$COUNTER-$b.log
      if test $? -ne 0; then
        error "Failed to extract the real memory (RAM) consumption log '$l'!"
        exit 1
      fi
      COUNTER=$(expr $COUNTER + 1)
    done
  fi
- extract the execution timelines from the standard output log of the selected
  benchmarks if the -t option is present.
  if test $TIMELINE -ne 0; then
    python $TIMELINE_PATH/timeline.py $SOURCE/$b/stdout.log \
      > $DESTINATION/timeline-$b.log
    if test $? -ne 0 || test ! -f $DESTINATION/timeline-$b.log; then
      error "Failed to extract the execution timeline of the benchmark '$b'!"
      exit 1
    fi
  fi
done
See the complete source file extract.sh.
3.11. Wrapper scripts
Before launching a benchmark there are some preparation actions to perform.
Similarly, there are some completion and cleaning actions to perform once the
benchmark task finishes. For the sake of convenience, especially in case of
distributed parallel execution, we use a wrapper Shell script to perform these
actions. This script has the form of a gcvb
template file (see Section
3.1) allowing us to make the script fit the individual parameters of each
benchmark. Although the major part of the template is identical for all kinds
of benchmark we define (see Section 3.5), some parts may differ
to meet the specific needs of a given benchmark or set of benchmarks, e.g.
setting of the environment variables related to out-of-core computation
activation. In this section, we define multiple variations of the wrapper script
template.
3.11.1. In-core benchmarks
The wrapper script template wrapper-incore.sh
is meant for benchmarks run
in-core, i.e. with the out-of-core computation disabled.
At first, we need to set up a dedicated temporary folder for the benchmark,
enabling us to measure the disk space used during its execution
(see Section 3.6). The GCVB_TEST_ID
environment variable is set by
gcvb
and represents a unique benchmark identifier. Note that we remove the
/tmp/vive-pain-au-chocolat
directory first to ensure there is no data left
behind by a previous benchmark run, if any.
rm -rf /tmp/vive-pain-au-chocolat
mkdir -p /tmp/vive-pain-au-chocolat/$GCVB_TEST_ID
export TMPDIR=/tmp/vive-pain-au-chocolat/$GCVB_TEST_ID
As a temporary workaround for Singularity containers to work properly, we need
to set the PYTHONPATH
environment variable too.
if test "$SINGULARITY_CONTAINER" != ""; then
  export PYTHONPATH=$GUIX_PYTHONPATH
fi
Using the OMPI_MCA_pml environment variable, we specify the communication
library OpenMPI should use at runtime.
export OMPI_MCA_pml='^ucx'
Then, we set the environment variables specifying the number of threads to
use, available through the parallel placeholder (see Section 3.5).
export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \ HMAT_NCPU={parallel[nt]}
Also, we need to disable the out-of-core computation feature. For MUMPS, it is disabled by default. For SPIDO and HMAT we need to set a couple of environment variables to disable it explicitly.
export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
In anticipation of later post-processing of the standard output of the execution, we try to determine the rank of the current process
if test "$OMPI_COMM_WORLD_RANK" != ""; then
  MPI_RANK=$OMPI_COMM_WORLD_RANK
elif test "$MPI_RANKID" != ""; then
  MPI_RANK=$MPI_RANKID
else
  echo "Failed to get the MPI rank of the current process! Can not proceed." >&2
  exit 1
fi
and if we are the 0th process, we print out:
selected scheduling information (see the scheduler placeholder in Section
3.5),

  if test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> Underlying platform = {scheduler[platform]}"
    echo "<PERFTESTS> Node family = {scheduler[family]}"
  fi
the MPI configuration (see the parallel placeholder in Section 3.5),

  if test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> MPI process count = {parallel[np]}"
    echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}"
    echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}"
    echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}"
  fi
the path to the Singularity container, if the execution is taking place within one of them,
  if test "$SINGULARITY_CONTAINER" != "" && test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> Singularity image = $SINGULARITY_CONTAINER."
  fi
the Slurm job identifier.
  if test "$SLURM_JOBID" != "" && test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> Slurm job identifier = $SLURM_JOBID"
  fi
We also need to create symbolic links to industrial test case meshes in the current working directory.
if test $MPI_RANK -eq 0; then
  MESHES="../../../meshes"
  for i in $(ls $MESHES); do
    ln -s $MESHES/$i $i
  done
fi
At this point, we can launch the benchmark command passed as argument.
eval "$*"
After the execution, we have to clean all the files produced during the
benchmark execution except for logs, results, FXT traces, configuration files,
scheduling and launching scripts, i.e. files matching *.log, *.yaml,
es_config.json, prof_file*, *.out, *batch, coupled, scalability,
*energy_scope*, *.csv or *.sh.
if test $MPI_RANK -eq 0; then
  rm -f $(find . -type f ! -name "*.log" ! -name "*.yaml" ! -name "prof_file*" \
               ! -name "*.out" ! -name "*batch" ! -name "coupled" \
               ! -name "scalability" ! -name "*.sh" ! -name "*energy_scope*" \
               ! -name "es_config.json" ! -name "*.csv")
fi
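As an aside, the effect of the chained `! -name` filters above can be sketched in Python with the standard fnmatch module. The `removable` helper and its keep-list are purely illustrative, not part of the template.

```python
import fnmatch

# Patterns of files to preserve after a benchmark run; mirrors the
# '! -name' filters of the wrapper script (illustrative helper).
KEEP = ["*.log", "*.yaml", "prof_file*", "*.out", "*batch", "coupled",
        "scalability", "*.sh", "*energy_scope*", "es_config.json", "*.csv"]

def removable(name):
    """Return True when `name` matches none of the KEEP patterns."""
    return not any(fnmatch.fnmatch(name, p) for p in KEEP)

print(removable("stdout.log"))   # → False (kept)
print(removable("core.12345"))   # → True (removed)
```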
3.11.2. Out-of-core benchmarks
Compared to the template for in-core benchmarks (see Section
3.11.1), the template for out-of-core benchmarks wrapper-ooc.sh
makes sure the environment variables disabling the feature for SPIDO and HMAT
are unset if the placeholder {dense[ooc]}
is true. For MUMPS, the activation
of out-of-core is handled using a dedicated switch of test_FEMBEM
. Also, by
default, HMAT starts to dump data on disk only when the solver's consumption
reaches 55% of available system memory. This leaves only 45% of available memory
for all the other data and computations. Therefore, we need to decrease this
threshold.
<<code:incore-preparation>>
<<code:incore-mpi>>

if test {dense[ooc]} -eq 1; then
  unset MPF_SPIDO_INCORE
  unset HMAT_DISABLE_OOC
  TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g')
  TOTAL_MEMORY_MiB=$(echo "$TOTAL_MEMORY_KiB / 1024 / 4;" | bc -l)
  export HMAT_LIMIT_MEM=$TOTAL_MEMORY_MiB
else
  export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
fi

<<code:incore-completion>>
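For clarity, the grep, sed and bc pipeline computing the cap can be mimicked in Python. The `hmat_limit_mib` helper and the sample MemTotal line are made up for illustration; the division by 4 corresponds to capping HMAT at roughly a quarter of the system memory, well below the 55% default.

```python
# Sketch of the out-of-core memory cap, assuming the Linux /proc/meminfo
# layout ("MemTotal:  131899124 kB"); integer MiB instead of bc's floats.
def hmat_limit_mib(meminfo_line):
    # Keep digits only, like sed 's/[^0-9]//g'.
    total_kib = int("".join(ch for ch in meminfo_line if ch.isdigit()))
    # KiB -> MiB, then take a quarter of the total memory.
    return total_kib // 1024 // 4

print(hmat_limit_mib("MemTotal:       4194304 kB"))  # → 1024
```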
3.11.3. FXT execution tracing template
In the template wrapper-fxt.sh
for benchmarks ensuring FXT execution tracing,
we need to set specific StarPU environment variables related to this
functionality and add the path to the StarVZ R library (see Section
4.2) in PATH
. Note that the out-of-core computation
remains disabled.
<<code:incore-preparation>>
<<code:incore-mpi>>
<<code:incore-incore>>

export STARPU_FXT_PREFIX=$(pwd)/ STARPU_PROFILING=1 STARPU_WORKER_STATS=1
export PATH=$GUIX_ENVIRONMENT/site-library/starvz/tools/:$PATH

<<code:incore-completion>>
3.11.4. Multi-node parallel distributed benchmarks template
In the template wrapper-in-core-distributed.sh
for multi-node parallel
distributed benchmarks, the total number of MPI processes is given by the
{scheduler[nodes]}
placeholder instead of {parallel[np]}
. In this case, we
also print out the number of MPI processes per node.
<<code:incore-preparation>>
<<code:incore-incore>>

if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {scheduler[nodes]}"
  echo "<PERFTESTS> MPI processes per node = {scheduler[np]}"
  echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}"
  echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}"
  echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}"
fi

<<code:incore-completion>>
See the complete wrapper script templates wrapper-in-core.sh, wrapper-ooc.sh
and wrapper-fxt.sh.
3.12. Generate benchmark runs
3.13. Job submission
To simplify the scheduling of benchmarks to run in parallel based on the
associated job names (see Section 3.2), we use the shell
script submit.sh
. It determines the list of benchmarks from a given benchmark
session, finds the corresponding slurm batch script configuration
headers and schedules the associated jobs.
We begin by defining a help message function that can be triggered if the script
is run with the -h
option.
function help() {
  echo "Submit benchmarks from SESSION." >&2
  echo "Usage: $(basename $0) [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -h         Print this help message." >&2
  echo "  -E REGEX   Submit all benchmarks except those verifying REGEX" \
       "(mutually exclusive with '-F')." >&2
  echo "  -F REGEX   Submit only the benchmarks verifying REGEX." >&2
  echo "  -g         Generate job scripts without submitting them." >&2
  echo "  -i ID[,ID] Submit only the benchmarks having ID." >&2
  echo -n "  -l         Only list the benchmarks to run instead of " >&2
  echo "actually submitting the corresponding jobs." >&2
  echo "  -s SESSION Session to submit the benchmarks from." >&2
  echo "  -v         Validate benchmarks without computing." >&2
  echo "  -w         Make the script wait until all of the submitted" \
       "jobs complete." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
Follows the parsing of options.
REGEX=""
EXCLUDE=0
INCLUDE=0
IDs=""
SESSION=""
SESSION_ID=""
GENERATE_ONLY=0
LIST_ONLY=0
VALIDATE=0
WAIT=0

while getopts ":hE:F:gi:ls:vw" option; do
  case $option in
Using the -E option, one can exclude from scheduling the benchmarks whose
names match a given regular expression.
    E)
      REGEX=$OPTARG
      EXCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-E' option (empty regular expression)!"
        exit 1
      fi
      ;;
Using the -F option, one can schedule only the benchmarks whose names match a
given regular expression.
    F)
      REGEX=$OPTARG
      INCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-F' option (empty regular expression)!"
        exit 1
      fi
      ;;
The -g
option can be used to generate job scripts for selected benchmarks
without actually submitting them. The generated scripts are then placed into
results/<session>/jobs
.
    g)
      GENERATE_ONLY=1
      ;;
The -i option allows selecting the benchmarks to schedule by providing their
exact identifiers, separated by commas if multiple identifiers are specified.
    i)
      IDs="$IDs $(echo $OPTARG | sed 's/,/ /g')"
      if test "$IDs" == ""; then
        error "Bad usage of the '-i' option (empty list of identifiers)!"
        exit 1
      fi
      ;;
The -l option makes it possible to list the benchmarks selected for scheduling
without actually scheduling them. It may be particularly useful, for example,
in combination with the -F option to verify whether the provided regular
expression matches the intended benchmarks.
    l)
      LIST_ONLY=1
      ;;
The -s
option is used to specify the benchmark session in the results
directory (see Section 3.1) to schedule the benchmarks from.
    s)
      SESSION=$OPTARG
      if test ! -d $SESSION; then
        error "'$SESSION' is not a valid directory!"
        exit 1
      fi
      SESSION_ID=$(basename $SESSION)
      ;;
To re-run the validation phase without performing the computation of benchmarks,
we can use the -v
option.
    v)
      VALIDATE=1
      ;;
To make the script wait until all the jobs that it has scheduled finish, the
-w
option may be used.
    w)
      WAIT=1
      ;;
Eventually, we have to take care of unknown options or missing arguments, if any, raise an error and terminate the script in that case.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
Options -E and -F are mutually exclusive, and neither of them may be combined
with -i. Options -l, -g and -w are pairwise mutually exclusive as well.
if test $EXCLUDE -ne 0 && test $INCLUDE -ne 0; then
  error "Options '-E' and '-F' must not be used together!"
  exit 1
fi

if test "$REGEX" != "" && test "$IDs" != ""; then
  error "Options '-E' and '-F' must not be used together with the '-i' option!"
  exit 1
fi

if test $WAIT -ne 0 && test $LIST_ONLY -ne 0; then
  error "Options '-l' and '-w' must not be used together!"
  exit 1
fi

if test $WAIT -ne 0 && test $GENERATE_ONLY -ne 0; then
  error "Options '-g' and '-w' must not be used together!"
  exit 1
fi

if test $GENERATE_ONLY -ne 0 && test $LIST_ONLY -ne 0; then
  error "Options '-g' and '-l' must not be used together!"
  exit 1
fi
After parsing and checking the arguments, we navigate to the root of the gcvb
filesystem containing the given benchmark session folder.
cd $SESSION/../../
We also check whether we are in a valid gcvb
filesystem containing a
config.yaml
and a results
folder (see Section 3.1).
if test ! -f config.yaml || test ! -d results; then
  error "'$SESSION' is not a correct gcvb session directory!"
  exit 1
fi
Then, we make sure to clean all the temporary folders we may have used before. As
configured in definition files (see Section 3.5), to isolate our
temporary files from other users, we use a dedicated temporary folder at
/tmp/vive-pain-au-chocolat
.
rm -rf /tmp/vive-pain-au-chocolat

if test $? -ne 0; then
  error "Unable to clean any temporary folder(s) previously used!"
  exit 1
fi
Before submitting, we look for all the sbatch
configuration headers. If there
is none, there is no valid benchmark job to submit.
RUNS=$(find $SESSION -maxdepth 2 -type f -name "sbatch")

if test "$RUNS" == ""; then
  error "No sbatch file was found! No valid job for submission."
  exit 1
fi
The script loops over all the headers found and extracts the job name from each
of them. We use an associative array to store for each job name, corresponding
to a set of benchmarks (e.g. multi-solve-1-mumps-spido
), the path to the
associated header file.
declare -A BATCH_JOBS
In a session folder, there is one folder per benchmark, not only one per set of benchmarks, so the same header file may be present in several folders. Using an associative array, we keep the path to only one copy of each header and prevent redundant job submissions.
for run in $RUNS
do
  JOB_NAME=$(grep "\-\-job-name" $run | cut -d '=' -f 2)
If the -i
option is used to specify exact identifiers of the benchmarks to be
submitted, we need to filter the detected benchmarks.
  if test "$IDs" != ""; then
    for id in $IDs; do
      if test $(echo $run | grep "$id" | wc -l) -gt 0; then
        BATCH_JOBS[$JOB_NAME]=$run
      fi
    done
Also, in the case the -F
option followed by a regular expression is specified,
we pick only the jobs the name of which matches the expression.
  elif test "$REGEX" == "" || \
       (test $INCLUDE -ne 0 && [[ "$JOB_NAME" =~ $REGEX ]]) || \
       (test $EXCLUDE -ne 0 && ! [[ "$JOB_NAME" =~ $REGEX ]]); then
    BATCH_JOBS[$JOB_NAME]=$run
  fi
done
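The deduplication performed by the associative array amounts to the following Python sketch, where later assignments to the same job name simply overwrite earlier ones. Job names and paths are invented here.

```python
# Several per-benchmark folders share one sbatch header, so keying on
# the job name keeps a single header path per job (illustrative data).
headers = [
    ("multi-solve-1-mumps-spido", "results/s1/bench-a/sbatch"),
    ("multi-solve-1-mumps-spido", "results/s1/bench-b/sbatch"),
    ("multi-facto-2-hmat",        "results/s1/bench-c/sbatch"),
]

batch_jobs = {}
for job_name, path in headers:
    batch_jobs[job_name] = path  # later entries overwrite earlier ones

print(len(batch_jobs))  # → 2
```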
If the user instructs the script to wait until the submitted jobs complete
using the -w option, we must reinitialize the list of jobs to wait for before
actually submitting the jobs.
if test $WAIT -ne 0; then
  JOB_LIST=""
fi
Finally, we call the gcvb
submission command for each job name with the
--filter-by-test-id
and --header
options allowing us to specify a subset of
jobs to submit and prepend associated shell scripts with corresponding slurm
batch script configuration headers. If the -v
option is used, we pass the
--validate-only
option to gcvb in order to run only the validation phase. If
the -g
option is used, we pass the --dry-run
option to gcvb in order to
exit before actually submitting the generated job script. The latter is then moved
into results/<session>/jobs
. If the -l
option is specified, the jobs
selected for scheduling are not submitted, only their identifier is printed out
on the screen.
SET_COUNT=${#BATCH_JOBS[@]}
i=1

for unique in "${!BATCH_JOBS[@]}"; do
  PREFIX="[ $i / $SET_COUNT ]"
  if test $LIST_ONLY -ne 0; then
    echo "$PREFIX Listing job '$unique'."
  else
    ACTION_ERROR="submit"
    ACTION_SUCCESS="Submitted"
    if test $VALIDATE -ne 0; then
      ACTION_ERROR="validate"
      ACTION_SUCCESS="Validated"
      python3 -m gcvb --filter-by-test-id "$unique" compute \
        --gcvb-base $SESSION_ID --validate-only --wait-after-submitting
    elif test $GENERATE_ONLY -ne 0; then
      ACTION_ERROR="generate job script for"
      ACTION_SUCCESS="Generated job script for"
      python3 -m gcvb --filter-by-test-id "$unique" compute \
        --gcvb-base $SESSION_ID --dry-run --header ${BATCH_JOBS[$unique]}
      mkdir -p $SESSION/jobs
      mv $SESSION/job.sh $SESSION/jobs/$unique.sh
    else
      python3 -m gcvb --filter-by-test-id "$unique" compute \
        --gcvb-base $SESSION_ID --wait-after-submitting \
        --header ${BATCH_JOBS[$unique]}
    fi
    if test $? -ne 0; then
      error "$PREFIX Failed to $ACTION_ERROR batch job '$unique'!"
      exit 1
    else
      echo "$PREFIX $ACTION_SUCCESS batch job '$unique'."
    fi
Also, if the -w
option is set, we use Slurm's squeue
command to collect
job identifiers of the launched jobs to make the script wait for the latter to
complete before exiting itself.
    if test $WAIT -ne 0; then
      LAST_JOB=$(squeue -u $USER -ho "%i" -S -V | head -n 1)
      JOB_LIST="$LAST_JOB $JOB_LIST"
    fi
  fi
  i=$(expr $i + 1)
done
Once the jobs are submitted through the sbatch
command and if the -w
option
is set, we make the script wait until all the submitted jobs complete. We use
squeue
again to retrieve the list of running jobs.
Note that if the compute
command of gcvb
executes correctly but slurm
fails to submit the job anyway, JOB_LIST
will not be an empty string but will
contain spaces. Therefore, before entering the waiting loop, we have to ensure
the variable does not begin with a space in order to prevent an infinite waiting
loop.
if test $WAIT -ne 0; then
  if test "${JOB_LIST:0:1}" == " "; then
    error "No jobs to wait for!"
    exit 1
  fi
  echo -n "Waiting for submitted jobs to complete... "
  while test "$JOB_LIST" != ""; do
    STILL_RUNNING=""
    for id in $JOB_LIST; do
      if test $(squeue | grep "$id" | wc -l) -gt 0; then
        STILL_RUNNING="$STILL_RUNNING $id"
      fi
    done
    JOB_LIST=$STILL_RUNNING
    sleep 1m
  done
  echo "Done"
fi

exit 0
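Stripped of the Slurm specifics, the waiting loop boils down to the following Python sketch: keep only the identifiers a `still_running` predicate reports as alive, then sleep, until the list is empty. Both `wait_for` and the stub predicate are illustrative.

```python
import time

# Generic polling loop: the predicate stands in for `squeue | grep`.
def wait_for(jobs, still_running, interval=0.01):
    while jobs:
        jobs = [j for j in jobs if still_running(j)]
        time.sleep(interval)
    return True

# Stub predicate: pretend every job has already left the queue.
print(wait_for(["1473279", "1473280"], lambda j: False))  # → True
```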
See the complete source file submit.sh.
4. Post-processing results
4.1. Benchmarks
To visualize benchmark results we use R and rely mainly on the plot drawing
library ggplot2. The R script file plot.R
we describe in this
section is meant to be used from within plot drawing R source code blocks in
concerned Org documents, e.g. reports, articles and so on.
4.1.1. Prerequisites
At the beginning we need to import several libraries for:
manipulating data sources, i.e. R data frames, SQLite databases and JSON files,
library(plyr)
library(dplyr)
library(readr)
library(DBI)
library(rjson)
transforming data frames,
library(tidyr)
decompressing gzip files,
library(R.utils)
defining and exporting plots.
library(ggplot2)
library(scales)
library(grid)
library(gridExtra)
require(cowplot)
library(stringr)
library(ggrepel)
We also need to prevent using the cairo
library to produce figures. This does
not work on Linux.
set_null_device(cairo_pdf)
4.1.2. Importing data
4.1.2.1. Benchmark results
The benchmark tool we rely on, gcvb
, stores benchmark results into a local
SQLite database (see Section 3.1). We define a global variable having the
name gcvb_db_path
to store the path to the gcvb
database we are currently
using.
gcvb_db_path <- "../results.db"
The database file containing our latest experimental results is available for download through the link below.
To extract the results from the database, we define the function gcvb_import
which takes up to three arguments:
- session indicating the gcvb benchmark session to extract the results from
  (see Section 3.1),
- like representing a list or a vector of benchmark name patterns,
- path containing the path to the gcvb SQLite database file, set to
  gcvb_db_path by default (see above).
Using the like
argument one can extract only the results of benchmarks the
identifiers of which match given pattern or patterns (see Section
3.5). The latter may contain wildcard characters supported by
the SQL's operator LIKE
sqliteLike.
The function begins by opening a connection to gcvb
SQLite database and
setting the LIKE
keyword to be case-sensitive.
gcvb_import <- function(session, like = c(), path = gcvb_db_path) {
  conn <- dbConnect(RSQLite::SQLite(), path)
  dbExecute(conn, "PRAGMA case_sensitive_like = true;")
A gcvb
database contains 6 tables (see Figure 1).
Figure 1: EER diagram of a gcvb
database.
At first, we need to select all the benchmarks belonging to session
from the
test
table and optionally restrict the selection to benchmarks matching the
patterns of like
.
  request <- ""
  for(p in 1:length(like)) {
    if(p > 1) {
      request <- paste(request, "OR ")
    }
    request <- paste0(request, "name LIKE '", like[p], "'")
  }
  if(length(like) > 0) {
    request <- paste(
      "SELECT id, name, start_date FROM test WHERE (", request, ") AND run_id IN",
      "(SELECT id FROM run WHERE gcvb_id =", session, ")"
    )
  } else {
    request <- paste(
      "SELECT id, name, start_date FROM test WHERE run_id IN",
      "(SELECT id FROM run WHERE gcvb_id =", session, ")"
    )
  }
  benchmarks <- dbGetQuery(conn, request)
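The clause assembly can be paraphrased in Python as follows. The `build_request` helper is illustrative; note that, unlike in this sketch, patterns coming from untrusted input should be bound as SQL parameters rather than interpolated.

```python
# Paraphrase of the request construction: join the LIKE patterns with
# OR and restrict the selection to the runs of the given session.
def build_request(session, like):
    base = "SELECT id, name, start_date FROM test WHERE "
    tail = f"run_id IN (SELECT id FROM run WHERE gcvb_id = {session})"
    if like:
        clause = " OR ".join(f"name LIKE '{p}'" for p in like)
        return base + "(" + clause + ") AND " + tail
    return base + tail

print(build_request(1, ["spido-%", "hmat-%"]))
```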
Then, if a given benchmark was executed more than once, we want to keep only the
data associated with the latest run. To do this, we sort the data by the
execution start timestamp, i.e. the start_date
column and remove duplicates
based on benchmark identifiers in the name
column. Note that start_date
is
imported as string, so we need to convert it to a datetime format at first. At
the end, we drop the start_date
column being useless for further treatment.
  benchmarks$start_date <- as.POSIXct(benchmarks$start_date)
  benchmarks <- benchmarks[order(benchmarks$start_date, decreasing = TRUE), ]
  benchmarks <- benchmarks[!duplicated(benchmarks$name), ]
  benchmarks <- subset(benchmarks, select = -c(start_date))
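The same keep-the-latest-run idea can be sketched in Python: sort the rows chronologically and let newer rows overwrite older ones under the same benchmark name. The rows below are invented.

```python
from datetime import datetime

# Invented rows standing in for the `test` table selection.
rows = [
    {"id": 1, "name": "spido-100000", "start_date": "2022-01-10 09:00:00"},
    {"id": 7, "name": "spido-100000", "start_date": "2022-03-02 14:30:00"},
    {"id": 3, "name": "hmat-200000",  "start_date": "2022-02-01 08:15:00"},
]

latest = {}
for row in sorted(rows, key=lambda r: datetime.fromisoformat(r["start_date"])):
    latest[row["name"]] = row  # newer rows overwrite older ones

print(sorted(r["id"] for r in latest.values()))  # → [3, 7]
```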
Benchmark results are stored in the separate valid
table. Each row references
a benchmark in the test
table and represents one metric and its value (see an
example in Table 1). This representation is referred to as
long format.
id | metric | value | test_id | task_step |
---|---|---|---|---|
39870 | nbrhs | 50 | 1260 | NULL |
39871 | step_mesh | 0.02241996 | 1260 | NULL |
… | … | … | … | … |
In addition to the valid
table, the files
table holds files potentially
associated with a given benchmark, e.g. resource consumption logs (see an
example in Table 2). Note that the files themselves are stored
in a gzip-compressed binary form.
id | filename | file | test_id |
---|---|---|---|
9 | rss-0.log | BLOB | 3666 |
10 | hdd-0.log | BLOB | 3666 |
11 | …/energy_scope_eprofile_1473279.txt | BLOB | 3666 |
… | … | … | … |
So, for each of the selected benchmarks, we extract the associated rows from the
valid
and the files
tables based on the test_id
key. Note that the value
column is of mixed type, i.e. it contains both strings and numerical values. To
prevent the R dbGetQuery
function from doing inappropriate type conversions,
we cast the value
column to text in the corresponding SQL request. We do the
same for the file
column in the files
table. Note also that for the union
operation below to work, we have to rename the selected columns of the files
table to match those in the valid
table.
  for(row in 1:nrow(benchmarks)) {
    test_id <- benchmarks[row, "id"]
    metrics <- dbGetQuery(
      conn,
      paste(
        "SELECT metric, CAST(value AS TEXT) AS 'value'",
        "FROM valid WHERE test_id = :id",
        "UNION",
        "SELECT filename AS 'metric', QUOTE(file) AS 'value'",
        "FROM files WHERE test_id = :id"
      ),
      params = list(id = test_id)
    )
At the same time, we convert the original long format representation (see Table 1) into wide format for easier data manipulation. In the latter, to each metric corresponds a separate column (see an example in Table 3). This way, each row contains all the results of a given benchmark.
id | name | nbrhs | step_mesh | … |
---|---|---|---|---|
1260 | spido-100000 | 50 | 0.02241996 | … |
    if(nrow(metrics) > 0) {
      for(metric in 1:nrow(metrics)) {
        metric_name <- metrics[metric, "metric"]
        if(!(metric_name %in% colnames(benchmarks))) {
          benchmarks[[metric_name]] <- NA
        }
        benchmarks[which(benchmarks$id == test_id), metric_name] <-
          metrics[metric, "value"]
      }
    }
  }
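The long-to-wide conversion itself reduces to the following Python sketch, using the values from the example tables above: each (test_id, metric, value) row becomes one cell of a per-benchmark record.

```python
# Long-format rows, as in Table 1 (values taken from the example).
long_rows = [
    {"test_id": 1260, "metric": "nbrhs",     "value": "50"},
    {"test_id": 1260, "metric": "step_mesh", "value": "0.02241996"},
]

# Wide format: one record per benchmark, one key per metric.
wide = {}
for row in long_rows:
    wide.setdefault(row["test_id"], {})[row["metric"]] = row["value"]

print(wide[1260]["nbrhs"])  # → 50
```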
Finally, we convert columns containing numerical values to the appropriate data
type, close the database connection and return the resulting data frame. Note
that the names of some columns, e.g. rss-0.log
, hdd-1.log
or
tmp_energy_scope_1473279/energy_scope_eprofile_1473279.txt
may vary from one
benchmark to another so we have to determine their names dynamically using a
regular expression.
  cols_to_numeric <- colnames(benchmarks)
  dynamic_columns <- cols_to_numeric[
    grepl(
      paste(
        "^rss.*\\.log$",
        "^hdd.*\\.log$",
        "^likwid.*\\.csv$",
        ".*energy_scope_eprofile.*\\.txt$",
        sep = "|"
      ),
      cols_to_numeric
    )
  ]
  cols_to_numeric <- cols_to_numeric[! cols_to_numeric %in% c(
    c("name", "mapping", "ranking", "binding", "coupled_method", "solver",
      "variation", "platform", "node", "node_family", "symmetry",
      "singularity_container", "system_kind", "energy_scope", "config_kind",
      "solver_config"),
    dynamic_columns
  )]
  benchmarks <- benchmarks %>% mutate_at(cols_to_numeric, as.numeric)
  dbDisconnect(conn)
  return(benchmarks)
}
4.1.2.2. Trace files
We use multiple benchmarking tools to monitor the usage of storage resources (memory and hard drive), the power and energy consumption as well as the floating-point operation rate and memory bandwidth of our solver suite.
To monitor the memory and hard drive usage, we use a custom Python script named
rss.py
(see Section 3.6). A measure is taken every second during the
execution. There is one consumption log file per MPI process. The names of the
log files match rss-N.log
for memory usage and hdd-N.log
for hard drive
usage where N
stands for the rank of the corresponding MPI process. In a log
file, each line contains the current timestamp and usage in mebibytes (MiB). The
very last line corresponds to the peak value.
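Under these assumptions about the log format, reading such a file could look as follows in Python; the sample content is invented and the split between timeline and peak mirrors the description above.

```python
import io

# A stand-in rss-N.log: one "<timestamp> <MiB>" pair per line, the last
# line being the peak value (format assumed from the description above).
sample = io.StringIO("0 120.5\n1 300.2\n2 512.0\n")

samples = [tuple(map(float, line.split())) for line in sample]
timeline, peak = samples[:-1], samples[-1][1]

print(peak)  # → 512.0
```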
To measure the power and the energy consumption of processing units and RAM, we
use the energy_scope
tool (see Section 3.7). In this case, the
acquisition interval can be defined by the user. As a result, energy_scope
produces an archive containing all the measures in a raw form. It can then
produce a JSON trace file based on this archive. The output file is stored under
./tmp_energy_scope_X/energy_scope_eprofile_X.txt
where X
represents the
identifier of the corresponding Slurm job (see Section 3.2).
Finally, to measure the floating-point operation rate and the memory bandwidth,
we rely on the likwid
tool. likwid
benchmarks
each MPI process of the application separately and stores all the measures into
a comma-separated value file the name of which matches likwid-N.csv where N
where N
stands for the rank of the corresponding MPI process.
At the end of a benchmark execution, the above log and trace files are compressed and stored into the SQLite database holding all the benchmark results (see Section 4.1.2.1). Therefore, in the first place, we need to define a function allowing us to extract the files from the database and decompress them.
4.1.2.3. Retrieving files
The function gcvb_retrieve_files
takes a data frame containing the results of
one single benchmark, through the benchmark
argument, retrieved by the
gcvb_import
function (see Section 4.1.2.1) and
formatted using the gcvb_format
function (see Section
4.1.3.1), extracts the corresponding trace files and
decompresses them into the current working directory. Note that the
energy_scope
trace file is renamed to eprofile.txt
to ensure a static file
name and simplify its later processing.
In the first place, we need to dynamically determine the names of the columns containing the log and trace files based on the patterns discussed above.
gcvb_retrieve_files <- function(benchmark) {
  trace_file_columns <- colnames(benchmark)
  trace_file_columns <- trace_file_columns[
    grepl(
      paste(
        "^rss.*\\.log$",
        "^hdd.*\\.log$",
        "^likwid.*\\.csv$",
        ".*energy_scope_eprofile.*\\.txt$",
        sep = "|"
      ),
      trace_file_columns
    )
  ]
Then, we iterate over the corresponding columns in benchmark
and:
for(trace_file_column in trace_file_columns) {
retrieve the compressed version of the file as string,
trace_file_name <- benchmark[[trace_file_column]]
if the file is not empty, strip the X' prefix and the ' suffix from this
string,

  if(!is.na(trace_file_name)) {
    trace_file_name <- substring(
      trace_file_name,
      3,
      nchar(trace_file_name) - 1
    )
split the string into a vector of two-character strings representing the hexadecimal binary content of the file,
    trace_file_name <- strsplit(trace_file_name, "")[[1]]
    trace_file_name <- paste0(
      trace_file_name[c(TRUE, FALSE)],
      trace_file_name[c(FALSE, TRUE)]
    )
change the output file name to eprofile.txt, if the file corresponds to an
energy_scope trace file,

  if(grepl("energy_scope", trace_file_column, fixed = TRUE)) {
    filename <- "eprofile.txt.gz"
  } else {
    filename <- paste0(basename(trace_file_column), ".gz")
  }
convert the vector of two-character strings into a raw binary form and restore the corresponding gzip-compressed file,
writeBin(as.raw(as.hexmode(trace_file_name)), filename)
decompress the file into the current working directory and remove its compressed counterpart.
gunzip(filename, overwrite = TRUE) } } }
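The hexadecimal decoding at the heart of this function can be illustrated with a short sketch using base R only (the input string below is hypothetical, not taken from an actual database):

```r
# Sketch of the hex-to-binary decoding used above: the database stores each
# file as a hexadecimal string where every pair of characters encodes one byte.
hex <- "48656c6c6f"  # hypothetical content; decodes to "Hello"
chars <- strsplit(hex, "")[[1]]
# Pair up characters: odd positions with the following even positions.
pairs <- paste0(chars[c(TRUE, FALSE)], chars[c(FALSE, TRUE)])
bytes <- as.raw(as.hexmode(pairs))
rawToChar(bytes)  # "Hello"
```

Writing bytes with writeBin() then produces the original gzip-compressed file, as done in the function body above.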
- Reading rss.py trace files

The function read_rss constructs an R data frame based on memory usage and, optionally, hard drive usage trace files (if the include_hdd switch is set to TRUE) located in logs_path. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns to the data frame, named after the names of the vector and initialized with its values. Adding such columns may be useful later for the aesthetics functions (see Section 4.1.4) when drawing plots. If the output of the function is intended to be combined with an energy_scope trace (see Section 4.1.2.3.3), the optional override_beginning argument can be used to synchronize the output data frame with the energy_scope trace by considering the datetime given by this argument as the alternative beginning of the acquisition. Note that the energy_scope trace always starts first, as energy_scope is launched before the other benchmarking tools.

The output data frame shall contain four columns: the timestamp column giving the time of the measure in milliseconds since the beginning of the acquisition, the corresponding consumption value, the kind of the measure (sram for memory usage or shdd for hard drive usage) and the process rank column.

read_rss <- function(logs_path, include_hdd = FALSE, aesthetics = NA,
                     override_beginning = NA) {
  columns <- c("timestamp", "consumption", "kind", "process")
  output <- data.frame(matrix(nrow = 0, ncol = length(columns)))
  colnames(output) <- columns
We begin by listing all the memory and, optionally, hard drive usage trace files in logs_path.

pattern <- ifelse(include_hdd, "rss|hdd.*\\.log", "rss.*\\.log")
logs <- list.files(
  logs_path,
  pattern = pattern,
  full.names = TRUE,
  include.dirs = FALSE
)
Then, we read each log file, strip its last line containing the peak value, which we do not use here, and append the contents to the output data frame through the intermediate data frame data.

Note that in a previous version of rss.py, the trace file format was different: there was only one column, giving the amount of memory or hard drive space used. For backward compatibility with old trace files, we continue to support the former format too.

for(log in logs) {
  count <- length(readLines(log, warn = FALSE)) - 1
  data <- read.table(log, nrow = count)
  if(ncol(data) < 2) { # Legacy format
    colnames(data) <- c("consumption")
    data$timestamp <- 0:(count - 1)
  } else {
    colnames(data) <- c("timestamp", "consumption")
    data$timestamp <- as.POSIXct(data$timestamp, format = "%Y/%m/%dT%H:%M:%S")
    start <- min(data$timestamp)
If the override_beginning argument was provided, we need to synchronize the timestamp column in such a way as to consider override_beginning to be the beginning of the trace.

if(!is.na(override_beginning)) {
  beginning <- as.integer(min(data$timestamp) - override_beginning)
  data$timestamp <- data$timestamp + beginning
}
Then, we convert the timestamp to milliseconds and the measured value to gibibytes (GiB).
    data$timestamp <- (as.integer(data$timestamp) - as.integer(start)) * 1000
  }
  data$consumption <- as.numeric(data$consumption / 1024.)
Based on the name of the currently processed trace file, we determine the kind of the measure as well as the MPI process rank. We merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
  base <- basename(log)
  kind <- ifelse(!include_hdd || startsWith(base, "rss"), "sram", "shdd")
  data$kind <- rep(kind, nrow(data))
  process <- gsub("([a-z]+)-(\\d+)\\.log", "\\2", base)
  data$process <- rep(as.integer(process), nrow(data))
  output <- rbind(output, data)
}
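The POSIXct-to-milliseconds conversion used in this loop can be sketched on two hypothetical timestamps written in the trace file's datetime format:

```r
# Two hypothetical rss.py timestamps, two seconds apart.
t0 <- as.POSIXct("2022/01/01T10:00:00", format = "%Y/%m/%dT%H:%M:%S")
t1 <- as.POSIXct("2022/01/01T10:00:02", format = "%Y/%m/%dT%H:%M:%S")
# Difference in integer seconds, scaled to milliseconds.
(as.integer(t1) - as.integer(t0)) * 1000  # 2000
```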
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.
processes <- max(output$process)
if(processes > 0) {
  output <- output %>%
    group_by(timestamp, kind) %>%
    summarise(
      consumption = sum(consumption)
    )
}
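A base-R equivalent of this dplyr summation (a sketch, not the thesis code) illustrates how per-process measures sharing the same timestamp and kind are summed:

```r
# Hypothetical measures from two MPI processes at two timestamps.
df <- data.frame(
  timestamp = c(0, 0, 1000, 1000),
  kind = "sram",
  process = c(0, 1, 0, 1),
  consumption = c(1.0, 2.0, 3.0, 4.0)
)
# Sum consumption over processes, grouping by timestamp and kind.
aggregate(consumption ~ timestamp + kind, data = df, FUN = sum)
```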
Finally, we process the aesthetics parameter, if provided, and return the output data frame.

if(length(aesthetics) > 0) {
  for(aesthetic in aesthetics) {
    name <- names(aesthetics)[aesthetics == aesthetic]
    output[[name]] <- rep(as.character(aesthetic), nrow(output))
  }
}
return(output)
}
- Reading likwid trace files

The function read_likwid constructs a data frame based on likwid trace files located in logs_path. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns to the data frame, named after the names of the vector and initialized with its values (see Section 4.1.2.3.1). If the output of the function is intended to be combined with an energy_scope trace (see Section 4.1.2.3.3), the optional shift argument can be used to synchronize the output data frame with the energy_scope trace by shifting the output timestamps by the value (in milliseconds) of this argument. As the measuring tool provides only timestamps relative to the beginning of the execution, we are unable to provide a more precise synchronization procedure. Therefore, we fall back to considering the beginning of the rss.py trace (see Section 4.1.2.3.1) corresponding to the same test case as t0 for the likwid trace file.

The output data frame shall contain four columns: the timestamp column giving the time of the measure in milliseconds since the beginning of the acquisition, the corresponding consumption value, the kind of the measure (flops for floating-point operation rate or bw for memory bandwidth) and the process rank column.

read_likwid <- function(logs_path, aesthetics = NA, shift = 0) {
  columns <- c("timestamp", "consumption", "kind", "process")
  output <- data.frame(matrix(nrow = 0, ncol = length(columns)))
  colnames(output) <- columns
We begin by listing all the trace files in logs_path.

pattern <- "likwid.*\\.csv"
logs <- list.files(
  logs_path,
  pattern = pattern,
  full.names = TRUE,
  include.dirs = FALSE
)
A likwid trace file has two or more heading lines at its beginning. The count of heading lines depends on the count of monitored hardware performance counters or groups of the latter. In our case, we use two groups of performance counters, one for the floating-point operation rate and one for the memory bandwidth. We thus have three heading lines in our likwid-N.csv trace files. Then, the first four columns (# GID, MetricsCount, CpuCount and Total runtime [s]) of the trace file are common to all the performance counters. Each group of performance counters may have a different number of associated metrics and thus a different number of associated columns in the trace file. The MetricsCount column gives the number of metrics and allows us to compute the number of columns in the trace file associated with a given performance group. In our case, there are 6 metrics for the floating-point operation rate group and 10 for the memory bandwidth group. Moreover, there is a separate column for each processing unit and for each metric. Note that the number of processing units used during the execution is given by the CpuCount column. In summary, the trace file contains 4 common columns, followed by 6 × CpuCount columns containing the values for the metrics of the floating-point operation rate group, followed by 10 × CpuCount columns containing the values for the metrics of the memory bandwidth group.

We read each log file and:
for(log in logs) { input <- read.csv(log, header = FALSE, skip = 3)
determine the count of processing units used for the corresponding test case,
cpus <- as.integer(input[1, 3])
create two intermediate data frames holding the timestamps together with the floating-point operation rate, corresponding to the metric Packed DP [MFLOP/s], and the memory bandwidth, corresponding to the metric Memory bandwidth [MBytes/s],

data_flops <- data.frame(
  timestamp = input[which(input[, 2] == 6), c(4)],
  consumption = rowSums(
    input[which(input[, 2] == 6), c((4 + 5 * cpus + 1):(4 + 6 * cpus - 1))]
  )
)
data_bw <- data.frame(
  timestamp = input[which(input[, 2] == 10), c(4)],
  consumption = rowSums(
    input[which(input[, 2] == 10), c((4 + 8 * cpus + 1):(4 + 9 * cpus - 1))]
  )
)
initialize the kind columns in these separate data frames,

data_flops$kind <- rep("flops", nrow(data_flops))
data_bw$kind <- rep("bw", nrow(data_bw))
merge the separate data frames again into a unique data frame,
data <- rbind(data_flops, data_bw)
convert the timestamp to milliseconds and synchronize it if the shift argument was provided,

data$timestamp <- as.integer(round(data$timestamp, 0) * 1000)
if(shift > 0) {
  data$timestamp <- data$timestamp + shift
}
determine the MPI process rank corresponding to the current trace file based on the name of the latter and initialize the process column,

process <- gsub("([a-z]+)-(\\d+)\\.csv", "\\2", basename(log))
data$process <- rep(as.integer(process), nrow(data))
merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
output <- rbind(output, data) }
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.

processes <- max(output$process)
if(processes > 0) {
  output <- output %>%
    group_by(timestamp, kind) %>%
    summarise(
      consumption = sum(consumption)
    )
}
Finally, we process the aesthetics parameter, if provided, and return the output data frame.

if(length(aesthetics) > 0) {
  for(aesthetic in aesthetics) {
    name <- names(aesthetics)[aesthetics == aesthetic]
    output[[name]] <- rep(as.character(aesthetic), nrow(output))
  }
}
return(output)
}
- Reading energy_scope trace files

The function read_es constructs an R data frame based on a given energy_scope JSON data file es_data. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns to the data frame, named after the names of the vector and initialized with its values. Adding such columns may be useful later for the aesthetics functions (see Section 4.1.4) when drawing plots. The data frame also contains tags. A benchmark executable may print special energy_scope tags to delimit the beginning and the end of a particular portion of computation in the energy_scope trace. If the only optional argument is provided, only the tags specified by the latter are treated. The interval argument allows one to set the interval (in milliseconds) the measures were taken in. By default, the function considers an interval of 1000 milliseconds.

The resulting data frame is composed of the following columns:
- tag representing the tag label,
- type indicating whether a line corresponds to a start or a stop tag,
- type_label holding a prettified version of type for plot drawing,
- timestamp representing the time of the measure in milliseconds since the beginning of the acquisition,
- consumption representing the current power consumption of CPU or RAM (in Watts [W]),
- kind indicating whether consumption represents the power consumption of CPU or RAM,
- aesthetics representing an optional named vector describing constant columns to be added to the output data frame in order to simplify plot drawing based on the latter.
Note that the format of the output data frame is compatible, in terms of the columns it contains, with the format of the data frames produced by the functions read_rss (see Section 4.1.2.3.1) and read_likwid (see Section 4.1.2.3.2) for reading trace files produced by the other benchmarking tools we use. This allows for combining different kinds of benchmark data. Indeed, one can provide this function with a data frame obtained with read_rss through the rss_data argument and extend the output data frame with the measures done by the rss.py monitoring script (see Section 3.6). Similarly, we can extend the output data frame with the measures done by likwid and extracted using the read_likwid function through the likwid_data argument. The timestamp column, present in rss_data as well as in likwid_data, shall allow us to synchronize the measures.

The first step is to read the data from the JSON data file at es_data

read_es <- function(es_data, aesthetics = c(), only = NA, interval = 1000,
                    rss_data = NA, include_hdd = FALSE, likwid_data = NA) {
  es_raw <- fromJSON(file = es_data)
and extract the timestamp corresponding to the beginning of the acquisition as es_start.

es_start <- as.POSIXct(
  es_raw$data$data$tags$es_total$start,
  format = "%Y/%m/%d %H:%M:%S"
)
Then, we collect CPU and RAM power consumption data into the respective data frames es_cpu and es_ram. The timestamp column is generated based on es_start. Note that the acquisition interval is one second by default.

es_cpu <- data.frame(
  consumption = as.numeric(es_raw$data$data$`ecpu(W)`),
  timestamp = as.integer(0)
)
for(i in 2:nrow(es_cpu)) {
  es_cpu$timestamp[i] <- es_cpu$timestamp[i - 1] + interval
}
es_cpu$kind <- rep("ecpu", nrow(es_cpu))
es_ram <- data.frame(
  power_total = as.numeric(es_raw$data$data$`etotal(W)`),
  power_cpu = as.numeric(es_raw$data$data$`ecpu(W)`),
  timestamp = as.integer(0)
)
es_ram$consumption <- es_ram$power_total - es_ram$power_cpu
es_ram <- subset(es_ram, select = c("consumption", "timestamp"))
for(i in 2:nrow(es_ram)) {
  es_ram$timestamp[i] <- es_ram$timestamp[i - 1] + interval
}
es_ram$kind <- rep("eram", nrow(es_ram))
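The loops above fill the timestamp columns incrementally; the same sequence can be produced in vectorised form, sketched here under the default 1000 ms interval and a hypothetical number of measures:

```r
# Vectorised equivalent of the incremental timestamp generation above.
interval <- 1000
n <- 5  # hypothetical number of measures
seq(from = 0, by = interval, length.out = n)  # 0 1000 2000 3000 4000
```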
We also load the rss.py and likwid traces, if provided, and restrict the energy_scope data frames to the actual acquisition window delimited by the rss.py trace.

rss_ram <- NA
rss_hdd <- NA
likwid <- NA
if(!is.na(rss_data)) {
  rss <- read_rss(rss_data, include_hdd, aesthetics, es_start)
  rss_ram <- subset(rss, kind == "sram")
  if(include_hdd) {
    rss_hdd <- subset(rss, kind == "shdd")
  }
  actual_start <- min(rss_ram$timestamp)
  actual_end <- max(rss_ram$timestamp)
  es_cpu <- subset(
    es_cpu, timestamp >= actual_start & timestamp <= actual_end
  )
  es_ram <- subset(
    es_ram, timestamp >= actual_start & timestamp <= actual_end
  )
}
if(!is.na(likwid_data)) {
  shift <- 0
  if(!is.na(rss_data)) {
    shift <- min(rss_ram$timestamp)
  }
  likwid <- read_likwid(likwid_data, aesthetics, shift)
}
Now, we initialize additional columns to store tag information.
es_cpu$tag <- NA
es_cpu$type <- NA
es_cpu$type_label <- NA
es_ram$tag <- NA
es_ram$type <- NA
es_ram$type_label <- NA
if(!is.na(rss_data)) {
  rss_ram$tag <- NA
  rss_ram$type <- NA
  rss_ram$type_label <- NA
  if(include_hdd) {
    rss_hdd$tag <- NA
    rss_hdd$type <- NA
    rss_hdd$type_label <- NA
  }
}
if(!is.na(likwid_data)) {
  likwid$tag <- NA
  likwid$type <- NA
  likwid$type_label <- NA
}
For each tag considered, we add four new lines to the es data frame: one with the start time and CPU power consumption, one with the stop time and CPU power consumption, one with the start time and RAM power consumption and one with the stop time and RAM power consumption.

If, during benchmark program execution, a loop generates more than one occurrence of a given energy_scope tag, a unique identifier is appended to the latter, e.g. tag_0, tag_1, etc. Otherwise, energy_scope would discard all the data related to the tag. However, the presence of such identifiers complicates the post-treatment. Therefore, we need to strip them off the tag name tag but keep them for later usage in instance.

for(entire_tag in names(es_raw$data$data$tags)) {
  tag <- sub("_[^_]+$", "", entire_tag)
  instance <- tail(strsplit(entire_tag, "_")[[1]], n = 1)
  instance <- suppressWarnings(as.integer(instance))
  instance <- ifelse(is.na(instance), "i", instance)
Here we use the imported JSON data file es_raw to get the power consumption values as well as the start and stop timestamps of tags.

if(is.na(only) || tag %in% only) {
  es_start <- as.integer(es_start)
  start_time <- as.integer(
    as.POSIXct(
      es_raw$data$data$tags[[entire_tag]]$start,
      format = "%Y/%m/%d %H:%M:%S"
    )
  )
  stop_time <- as.integer(
    as.POSIXct(
      es_raw$data$data$tags[[entire_tag]]$stop,
      format = "%Y/%m/%d %H:%M:%S"
    )
  )
  begin <- (start_time - es_start) * 1000
  end <- (stop_time - es_start) * 1000
  row <- data.frame(
    tag = as.character(tag),
    type = "start",
    type_label = paste0(
      "bold(alpha)~", label_tag_short(c(tag), instance)
    ),
    timestamp = begin,
    consumption = es_cpu[
      es_cpu$timestamp == begin & is.na(as.character(es_cpu$tag)),
      "consumption"
    ],
    kind = "ecpu"
  )
  es_cpu <- rbind(es_cpu, row)
  row <- data.frame(
    tag = as.character(tag),
    type = "stop",
    type_label = paste0(
      "bold(omega)~", label_tag_short(c(tag), instance)
    ),
    timestamp = end,
    consumption = es_cpu[
      es_cpu$timestamp == end & is.na(as.character(es_cpu$tag)),
      "consumption"
    ],
    kind = "ecpu"
  )
  es_cpu <- rbind(es_cpu, row)
  row <- data.frame(
    tag = as.character(tag),
    type = "start",
    type_label = paste0(
      "bold(alpha)~", label_tag_short(c(tag), instance)
    ),
    timestamp = begin,
    consumption = es_ram[
      es_ram$timestamp == begin & is.na(as.character(es_ram$tag)),
      "consumption"
    ],
    kind = "eram"
  )
  es_ram <- rbind(es_ram, row)
  row <- data.frame(
    tag = as.character(tag),
    type = "stop",
    type_label = paste0(
      "bold(omega)~", label_tag_short(c(tag), instance)
    ),
    timestamp = end,
    consumption = es_ram[
      es_ram$timestamp == end & is.na(as.character(es_ram$tag)),
      "consumption"
    ],
    kind = "eram"
  )
  es_ram <- rbind(es_ram, row)
  if(!is.na(rss_data)) {
    consumption <- rss_ram[
      rss_ram$timestamp == begin & is.na(as.character(rss_ram$tag)),
      "consumption"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "start",
        type_label = paste0(
          "bold(alpha)~", label_tag_short(c(tag), instance)
        ),
        timestamp = begin,
        consumption = as.numeric(consumption),
        kind = "sram"
      )
      rss_ram <- rbind.fill(rss_ram, row)
    }
    consumption <- rss_ram[
      rss_ram$timestamp == end & is.na(as.character(rss_ram$tag)),
      "consumption"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "stop",
        type_label = paste0(
          "bold(omega)~", label_tag_short(c(tag), instance)
        ),
        timestamp = end,
        consumption = as.numeric(consumption),
        kind = "sram"
      )
      rss_ram <- rbind.fill(rss_ram, row)
    }
    if(include_hdd) {
      consumption <- rss_hdd[
        rss_hdd$timestamp == begin & is.na(as.character(rss_hdd$tag)),
        "consumption"
      ]
      consumption <- ifelse(length(consumption) > 0, consumption, NA)
      if(!is.na(consumption)) {
        row <- data.frame(
          tag = as.character(tag),
          type = "start",
          type_label = paste0(
            "bold(alpha)~", label_tag_short(c(tag), instance)
          ),
          timestamp = begin,
          consumption = as.numeric(consumption),
          kind = "shdd"
        )
        rss_hdd <- rbind.fill(rss_hdd, row)
      }
      consumption <- rss_hdd[
        rss_hdd$timestamp == end & is.na(as.character(rss_hdd$tag)),
        "consumption"
      ]
      consumption <- ifelse(length(consumption) > 0, consumption, NA)
      if(!is.na(consumption)) {
        row <- data.frame(
          tag = as.character(tag),
          type = "stop",
          type_label = paste0(
            "bold(omega)~", label_tag_short(c(tag), instance)
          ),
          timestamp = end,
          consumption = as.numeric(consumption),
          kind = "shdd"
        )
        rss_hdd <- rbind.fill(rss_hdd, row)
      }
    }
  }
  if(!is.na(likwid_data)) {
    consumption <- likwid[
      likwid$timestamp == begin & is.na(as.character(likwid$tag)),
      "consumption"
    ]
    kind <- likwid[
      likwid$timestamp == begin & is.na(as.character(likwid$tag)),
      "kind"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    kind <- ifelse(length(kind) > 0, kind, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "start",
        type_label = paste0(
          "bold(alpha)~", label_tag_short(c(tag), instance)
        ),
        timestamp = begin,
        consumption = as.numeric(consumption),
        kind = kind
      )
      likwid <- rbind.fill(likwid, row)
    }
    consumption <- likwid[
      likwid$timestamp == end & is.na(as.character(likwid$tag)),
      "consumption"
    ]
    kind <- likwid[
      likwid$timestamp == end & is.na(as.character(likwid$tag)),
      "kind"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    kind <- ifelse(length(kind) > 0, kind, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "stop",
        type_label = paste0(
          "bold(omega)~", label_tag_short(c(tag), instance)
        ),
        timestamp = end,
        consumption = as.numeric(consumption),
        kind = kind
      )
      likwid <- rbind.fill(likwid, row)
    }
  }
}
}
If the aesthetics argument is provided, we include the constant columns described by the latter in the resulting data frame. Finally, we return a single data frame combining es_cpu, es_ram and the rss.py and likwid data frames if the rss_data and likwid_data arguments were provided. In this case, we also include an additional column in the data frames to distinguish between power and storage resource consumption. This is useful for plotting figures (see Section 4.1.5.13).

if(length(aesthetics) > 0) {
  for(aesthetic in aesthetics) {
    name <- names(aesthetics)[aesthetics == aesthetic]
    es_cpu[[name]] <- rep(as.character(aesthetic), nrow(es_cpu))
    es_ram[[name]] <- rep(as.character(aesthetic), nrow(es_ram))
    if(!is.na(rss_data)) {
      rss_ram[[name]] <- rep(as.character(aesthetic), nrow(rss_ram))
      if(include_hdd) {
        rss_hdd[[name]] <- rep(as.character(aesthetic), nrow(rss_hdd))
      }
    }
    if(!is.na(likwid_data)) {
      likwid[[name]] <- rep(as.character(aesthetic), nrow(likwid))
    }
  }
}
es <- rbind(es_cpu, es_ram)
es$kind_general <- "e"
if(!is.na(likwid_data)) {
  likwid$kind_general <- likwid$kind
  es <- rbind.fill(es, likwid)
}
if(!is.na(rss_data)) {
  rss_ram$kind_general <- "s"
  if(include_hdd) {
    rss_hdd$kind_general <- "s"
    es <- rbind.fill(es, rbind(rss_ram, rss_hdd))
  } else {
    es <- rbind.fill(es, rss_ram)
  }
  es$timestamp <- es$timestamp - min(rss_ram$timestamp)
}
return(es)
}
- Reading timeline trace files

The function read_timeline allows us to read a test_FEMBEM timeline trace trace, extract data about selected computation phases from the latter and return it in the form of a data frame suitable for drawing axis-intercepting lines with the associated labels using the rss_by_time plot function (see Section 4.1.5.12). phases represents a data frame of selected computation phases to extract information about. Note that one computation phase label can appear more than once if the corresponding function was called multiple times. This is why the data frame contains the occurrence column (see Table 4) indicating which occurrence of a given phase should be considered. The type column indicates whether a given row in the data frame designates the beginning (ENTER) or the end (EXIT) of the corresponding computation phase. In this data frame, one can provide additional information, i.e. prettified computation phase labels for better data readability (see Table 4). The phases data frame is used to construct the output data frame of this function.

Table 4: Example of a data frame that can be passed to read_timeline through the phases argument.

type | name | label | occurrence
EXIT | mpf_solvegemm() | Block-wise Schur complement computation | 1
… | … | … | …

We begin by reading the timeline trace trace.

read_timeline <- function(trace, phases) {
  complete_timeline <- read.delim(trace, header = FALSE)
For an easier column access, we add column names to the newly created data frame.
colnames(complete_timeline) <- c("rank", "time", "type", "duration", "path")
Then, we construct the resulting timeline data frame containing only the data we are interested in based on the input phases data frame. The new data frame contains three additional columns:

- time providing the corresponding computation phase delimiter timestamp,
- zero, a constant zero column useful for plotting vertical or horizontal axis intercepts,
- middle providing the values representing the middles of the intervals defined by the values of the time column, useful to center phase labels in the rss_by_time plot function (see Section 4.1.5.12).

timeline <- phases
timeline$time <- NA
timeline$middle <- NA
timeline$zero <- rep(0, nrow(timeline))
for(i in 1:length(phases)) {
  time <- complete_timeline[
    complete_timeline$type == phases[[i]]["type"] &
      str_detect(complete_timeline$path, paste0(phases[[i]]["label"], "$")),
    "time"
  ]
  time <- time[[as.integer(phases[[i]]["occurrence"])]]
  middle <- ifelse(
    i > 1,
    time - ((time - timeline[i - 1, 1]) / 2),
    time / 2
  )
  timeline[i, "time"] <- as.integer(time)
  timeline[i, "middle"] <- as.double(middle)
}
At the end, we return the timeline data frame.

return(timeline)
}
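The interval-middle computation inside read_timeline can be illustrated on two hypothetical phase delimiter timestamps:

```r
# Hypothetical delimiter timestamps of two consecutive phases.
times <- c(4, 10)
# The first middle is halfway from zero; subsequent middles are halfway
# between the previous and the current delimiter.
middle_first <- times[1] / 2                           # 2
middle_next <- times[2] - ((times[2] - times[1]) / 2)  # 7
c(middle_first, middle_next)
```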
4.1.3. Formatting data
4.1.3.1. Benchmark results
Before actually visualizing data, we have to format it. This is the role of the
function gcvb_format
which takes a single argument - data
, an R data frame
containing benchmark results imported by the gcvb_import
function (see Section
4.1.2).
gcvb_format <- function(data) {
Due to the way certain benchmarks are defined in gcvb
, the solver
column may
contain non-uniform values, i.e., in some cases, coupled solver names use dashes
instead of slashes, which are the expected separators. For example, we need to
replace the dash in mumps-spido
by a slash so it becomes mumps/spido
.
data$solver <- ifelse( data$solver == "mumps-spido", "mumps/spido", ifelse(data$solver == "mumps-hmat", "mumps/hmat", data$solver) )
Names of columns holding the same information, such as factorization and solve times, expected result accuracy and so on, vary depending on the solver used during a given benchmark. To ease further data treatment, we merge these into common columns.

data$tps_facto <- ifelse(
  is.na(data$tps_cpu_facto_mpf),
  data$tps_cpu_facto,
  data$tps_cpu_facto_mpf
)
data$tps_solve <- ifelse(
  is.na(data$tps_cpu_solve_mpf),
  data$tps_cpu_solve,
  data$tps_cpu_solve_mpf
)
data$desired_accuracy <- ifelse(
  !is.na(data$mumps_blr_accuracy),
  data$mumps_blr_accuracy,
  NA
)
data$size_schur <- ifelse(
  is.na(data$size_schur) & data$coupled_method == "multi-facto",
  data$disk_block_size,
  data$size_schur
)
if("cols_schur" %in% colnames(data)) {
  data$cols_schur <- ifelse(
    data$coupled_method == "multi-solve" & data$solver == "mumps/spido",
    data$coupled_nbrhs,
    data$cols_schur
  )
}
data$desired_accuracy <- ifelse(
  (data$solver == "hmat-bem" | data$solver == "hmat-fem" |
     data$solver == "hmat-fem-nd") & !is.na(data$h_assembly_accuracy),
  data$h_assembly_accuracy,
  data$desired_accuracy
)
Then, we must remove from the data frame the lines corresponding to unfinished jobs, i.e. lines without computation time information or where the computation time is zero.

data <- subset(data, !is.na(tps_facto))
data <- subset(data, !is.na(tps_solve))
data <- subset(data, tps_facto > 0.0)
data <- subset(data, tps_solve > 0.0)
If the dense_ooc column is present, we have to take care of NA values so that we can make the column a factor. NA appears in the case of purely in-core benchmarks that do not set this variable at all. Finally, we replace NA by -1.

if("dense_ooc" %in% colnames(data)) {
  data$dense_ooc <- ifelse(
    is.na(data$dense_ooc),
    -1,
    data$dense_ooc
  )
  data$dense_ooc <- as.character(data$dense_ooc)
  data$dense_ooc <- factor(data$dense_ooc, levels = c("-1", "0", "1"))
}
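The NA normalization above can be sketched on a toy vector of hypothetical values:

```r
# NA becomes -1 so that the column can be turned into a factor
# with fixed levels.
x <- c(NA, 0, 1)
x <- ifelse(is.na(x), -1, x)
f <- factor(as.character(x), levels = c("-1", "0", "1"))
f  # levels: "-1" "0" "1"
```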
The same goes for the energy_scope
column. In this case, NA
means that the
tool was not used.
if("energy_scope" %in% colnames(data)) {
  data$energy_scope <- ifelse(
    is.na(data$energy_scope),
    "off",
    data$energy_scope
  )
  data$energy_scope <- as.character(data$energy_scope)
  data$energy_scope <- factor(data$energy_scope, levels = c("off", "on"))
}
Some required columns are not present in the data frame by default, so we have to compute them based on existing data.
The total number of cores used for computation is the product of the MPI process count and the thread count.

data$p_units <- data$omp_thread_num * data$processes
The value of n_blocks represents the number of blocks the dense submatrix \(A_{ss}\) is split into during the Schur complement computation in the two-stage multi-factorization implementation. When this value is not explicitly set by the user, n_blocks equals auto. However, for the plot drawing functions to operate correctly, we need to recompute the corresponding numerical values.

if("n_blocks" %in% colnames(data)) {
  data$n_blocks <- ifelse(
    data$n_blocks == "auto",
    ceiling(data$nbem / data$size_schur),
    as.numeric(data$n_blocks)
  )
}
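As a quick sketch, with a hypothetical BEM submatrix of 40000 rows and a Schur block size of 15000, auto resolves to:

```r
# ceiling() rounds up so that a partial trailing block still counts.
ceiling(40000 / 15000)  # 3
```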
Eventually, factorization and solve efficiency computation is split into several steps:
- gathering best sequential times for each solver,
sequentials <- subset(
  data,
  select = c("solver", "tps_facto", "tps_solve", "nbpts"),
  data$p_units == 1
)
if(nrow(sequentials) > 0) {
  sequentials <- merge(
    aggregate(tps_facto ~ solver + nbpts, min, data = sequentials),
    sequentials
  )
  sequentials <- sequentials %>% dplyr::rename(
    tps_facto_seq = tps_facto,
    tps_solve_seq = tps_solve
  )
- adding a column containing sequential time for every run in the data frame,
data <- dplyr::left_join(data, sequentials, by = c("solver", "nbpts"))
- computing the efficiency itself.
  data$efficiency_facto <- data$tps_facto_seq / (data$p_units * data$tps_facto)
  data$efficiency_solve <- data$tps_solve_seq / (data$p_units * data$tps_solve)
}
return(data)
}
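Applied to hypothetical timings, the efficiency formula reads:

```r
# A factorization taking 120 s sequentially and 40 s on 4 cores
# (hypothetical values) yields a parallel efficiency of 0.75.
tps_facto_seq <- 120
tps_facto <- 40
p_units <- 4
tps_facto_seq / (p_units * tps_facto)  # 0.75
```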
4.1.4. Style presets
We also define a common set of data visualization elements to ensure a unified graphic output when drawing plots.
4.1.4.1. Label functions
Label functions allow us to format and prettify column names or values when used in plot legends.
label_percentage

This function multiplies value by 100 and returns the result as a string with an appended percentage sign.

label_percentage <- function(value) {
  return(paste0(value * 100, "%"))
}
label_time
This function converts time values in seconds into more human-readable equivalents.
label_time <- function(labels) {
  out <- c()
  for (i in 1:length(labels)) {
    if(!is.na(labels[i])) {
      days <- as.integer(labels[i] / 86400)
      hours <- as.integer((labels[i] - (86400 * days)) / 3600)
      minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60)
      seconds <- as.integer(
        labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes)
      )
      out[i] <- ""
      if(days > 0) {
        out[i] <- paste0(out[i], days, "d")
      }
      if(hours > 0) {
        out[i] <- paste0(out[i], hours, "h")
      }
      if(minutes > 0 && days < 1) {
        out[i] <- paste0(out[i], minutes, "m")
      }
      if(seconds > 0 && days < 1 && hours < 1) {
        out[i] <- paste0(out[i], seconds, "s")
      }
      if(seconds < 1 && minutes < 1 && hours < 1 && days < 1) {
        out[i] <- "0s"
      }
    } else {
      out[i] <- "?"
    }
  }
  return(out)
}
label_time_short

This function converts time values in seconds into short, more human-readable equivalents.

label_time_short <- function(labels) {
  out <- c()
  for (i in 1:length(labels)) {
    if(!is.na(labels[i])) {
      days <- as.integer(labels[i] / 86400)
      hours <- as.integer((labels[i] - (86400 * days)) / 3600)
      minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60)
      seconds <- as.integer(
        labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes)
      )
      if(days > 0) {
        out[i] <- paste0(days, "d")
      } else if(hours > 0) {
        out[i] <- paste0(hours, "h")
      } else if(minutes > 0) {
        out[i] <- paste0(minutes, "m")
      } else if(seconds > 0) {
        out[i] <- paste0(seconds, "s")
      } else {
        out[i] <- "0s"
      }
    } else {
      out[i] <- "?"
    }
  }
  return(out)
}
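The underlying day/hour/minute/second decomposition used by these label functions can be sketched with integer division on a hypothetical duration:

```r
# Decompose 90061 seconds (1d 1h 1m 1s) into calendar units.
s <- 90061
days <- s %/% 86400
r <- s %% 86400
hours <- r %/% 3600
r <- r %% 3600
minutes <- r %/% 60
seconds <- r %% 60
c(days, hours, minutes, seconds)  # 1 1 1 1
```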
label_epsilon

This function converts solution accuracy threshold values from numeric values to strings of the form \(1 \times 10^{-n}\), or to ‘unset (no compression)’ if the value is not set, as in the case of solvers not using data compression.

label_epsilon <- function(labels) {
  out <- c()
  for (i in 1:length(labels)) {
    if(!is.na(labels[i])) {
      out[i] <- formatC(
        as.numeric(labels[i]),
        format = "e",
        digits = 1
      )
    } else {
      out[i] <- "unset (no compression)"
    }
  }
  return(out)
}
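The formatC() call above can be illustrated on a hypothetical accuracy threshold:

```r
# Scientific notation with one digit after the decimal point.
formatC(1e-3, format = "e", digits = 1)  # "1.0e-03"
```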
label_storage
This function prettifies storage support names.
label_storage <- function(labels) {
  out <- c()
  for(i in 1:length(labels)) {
    out[i] <- switch(
      as.character(labels[i]),
      "rm_peak" = "Memory (RAM)",
      "swap_peak" = "Swap (pagefile)",
      "hdd_peak" = "Disk drive"
    )
  }
  return(out)
}
label_ooc

This function prettifies the names of facet grids distinguishing between benchmarks with and without out-of-core computation (column dense_ooc in the data frame).

label_ooc <- function(labels) {
  out <- c()
  for(i in 1:length(labels)) {
    out[i] <- switch(
      labels[i],
      "-1" = "All in-core",
      "0" = "Out-of-core except Schur complement",
      "1" = "All out-of-core"
    )
  }
  return(out)
}
label_solver
This function prettifies solver names.
label_solver <- function(labels) {
  out <- c()
  for(i in 1:length(labels)) {
    out[i] <- switch(
      as.character(labels[i]),
      "spido" = "SPIDO",
      "hmat-bem" = "HMAT",
      "hmat-fem" = "HMAT",
      "hmat-fem-nd" = "HMAT (proto-ND)",
      "hmat-oss" = "HMAT-OSS",
      "chameleon" = "Chameleon",
      "mumps" = "MUMPS",
      "mumps-blr" = "MUMPS (compressed)",
      "mumps-no-blr" = "MUMPS (uncompressed)",
      "qr_mumps" = "QRM",
      "mumps/spido" = "MUMPS/SPIDO",
      "qr_mumps/spido" = "QRM/SPIDO",
      "mumps/hmat" = "MUMPS/HMAT",
      "hmat/hmat" = "HMAT"
    )
  }
  return(out)
}
label_solver_config
This function prettifies solver configuration labels.
label_solver_config <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "vanilla" = "without advanced features", "sparse-rhs" = "sparse right-hand side", "blr-dense-rhs" = "compression", "blr-sparse-rhs" = "compression and sparse right-hand side" ) } return(out) }
label_tag
This function prettifies
energy_scope
tag names.label_tag <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = expression( italic(S[i] == A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = expression( italic(A[vv]^{-1}*A[sv[i]]^{T}) ~ "(sparse solve)" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = expression( italic(A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = expression( italic(compress * group("(", A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}, ")")) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(AXPY)" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(compressed AXPY)" ), "MPF_SolveGEMM_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Multi-factorization "CreateSchurComplement" = expression( italic(S[i] == A[ss[ij]] - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_factorization" = expression( italic(X[ij] == - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "MPF_multifact_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), "MPF_multifact_MUMPS_HMAT" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Common "testMPF_factorisation_Schur" = expression( italic(S^{-1}) ~ "(Schur factorization)" ), "testMPF_factorisation" = expression( italic(A^{-1}) ~ "(factorization of" ~ italic(A) * ")" ), "testMPF_solve" = expression( italic(A^{-1}*x) ~ "(system solution)" ) ) } return(out) }
label_tag_short
This function does the same thing as
label_tag
but returns shorter labels than the latter.label_tag_short <- function(labels, instance = "i") { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = paste0( "italic(S[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = paste0( "italic(solve[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = paste0( "italic(SpMM[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = paste0( "italic(compress[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = paste0( "italic(CAXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = paste0( "italic(AXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS" = "italic(S)", "CreateSchurComplement" = paste0( "italic(S[", instance, "])" ), # Multi-factorization "CreateSchurComplement_factorization" = paste0( "italic(X[", instance, "])" ), "MPF_multifact_MUMPS" = "italic(S)", "MPF_multifact_MUMPS_HMAT" = "italic(S)", # Common "testMPF_factorisation_Schur" = "italic(S^{-1})", "testMPF_factorisation" = "italic(A^{-1})", "testMPF_solve" = "italic(A^{-1}*x)" ) } return(out) }
label_kind
This function prettifies power and storage resources consumption kinds (see Section 4.1.2.3.3).
label_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "ecpu" = "CPU", "eram" = "RAM", "sram" = "RAM", "shdd" = "Disk", "flops" = "Flop rate", "bw" = "RAM bandwidth" ) } return(out) }
label_mapping
This function transforms MPI process mapping values into strings giving more detailed information about the underlying parallel configuration.
label_mapping <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "node" = "1 unbound MPI process \U2A09 1 to 24 threads", "socket" = "2 MPI processes bound to sockets \U2A09 1 to 12 threads", "numa" = "4 MPI processes bound to NUMAs \U2A09 1 to 6 threads", "core" = "1 to 24 MPI processes bound to cores \U2A09 1 thread" ) } return(out) }
label_nbpts
This function converts the problem's unknown count into the form
N = <count> M
where <count>
is the count expressed in millions and rounded to two decimals.
label_nbpts <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- paste( "\U1D441 =", as.character( round(as.numeric(labels[i]) / 1e+06, digits = 2) ), "M" ) } return(out) }
label_nbpts_ticks
This function converts the problem's unknown count into the form
<count>M
where <count>
is the count converted to millions and rounded to two decimals.
label_nbpts_ticks <- function(value) { return( paste0( as.character( round(value / 1e+06, digits = 2) ), "M" ) ) }
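A quick sanity check of the millions-rounding pattern shared by `label_nbpts` and `label_nbpts_ticks` (the example counts are made up):

```r
# Unknown counts rendered as millions with at most two decimals
big <- paste0(round(2500000 / 1e+06, digits = 2), "M")
odd <- paste0(round(1234567 / 1e+06, digits = 2), "M")
big  # "2.5M"
odd  # "1.23M"
```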
label_scientific
This function converts numeric label values to scientific notation.
label_scientific <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } return(out) }
label_coupling
and label_coupling_short
These functions prettify names of the implementation schemes for the solution of coupled linear systems.
label_coupling <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve (two-stage)", "multi-facto" = "Multi-factorization (two-stage)", "full-hmat" = "Partially sparse-aware (single-stage)" ) } return(out) } label_coupling_short <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve", "multi-facto" = "Multi-factorization", "full-hmat" = "Partially sparse-aware" ) } return(out) }
label_peak_kind
This function prettifies peak memory usage value kinds.
label_peak_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "assembly_estimation" = expression("assembled system"), "schur_estimation" = expression("Schur complement matrix " * italic(S)), "peak_symmetric_factorization" = expression( "symmetric factorization of " * italic(A[vv])), "peak_non_symmetric_factorization" = expression( "non-symmetric factorization of " * italic(A[vv])) ) } return(out) }
4.1.4.2. Label maps
A label map is a variation of a label function. It provides alternative names for given columns of a data frame.
phase.labs
This label map provides prettified names for the factorization and solve time columns tps_facto and tps_solve, the corresponding efficiency columns efficiency_facto and efficiency_solve, as well as the column holding the execution time of the associated Schur block creation phase, CreateSchurComplement_exec_time.
phase.labs <- c( "Factorization", "Solve", "Factorization", "Solve", "Schur complement computation" ) names(phase.labs) <- c( "tps_facto", "tps_solve", "efficiency_facto", "efficiency_solve", "CreateSchurComplement_exec_time" )
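Since a label map is simply a named character vector, the lookup performed by ggplot2's labeller amounts to indexing by name. A minimal sketch of the mechanism, independent of ggplot2 and using a hypothetical demo.labs vector:

```r
# Named-vector lookup as performed by labeller()
demo.labs <- c("Factorization", "Solve")
names(demo.labs) <- c("tps_facto", "tps_solve")
demo.labs[["tps_solve"]]  # "Solve"
```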
4.1.4.3. Plot theme
To keep a single color palette, as well as consistent sets of point shapes and line types, throughout all the plots, we provide each legend entry with information on its:
- color,
colors <- c( "0.001" = "#F07E26", "1e-06" = "#9B004F", "node" = "#1488CA", "socket" = "#95C11F", "numa" = "#FFCD1C", "core" = "#6561A9", "1 (24 threads)" = "#E63312", "2 (48 threads)" = "#F07E26", "4 (96 threads)" = "#9B004F", "8 (192 threads)" = "#1488CA", "(32, 32)" = "#F07E26", "(48, 48)" = "#9B004F", "(64, 64)" = "#1488CA", "(128, 128)" = "#95C11F", "(256, 256)" = "#E63312", "(256, 512)" = "#FFCD1C", "(512, 512)" = "#800080", "(256, 1024)" = "#6561A9", "(256, 2048)" = "#89CCCA", "(256, 4096)" = "#384257", "1" = "#1488CA", "2" = "#95C11F", "3" = "#FFCD1C", "4" = "#6561A9", "5" = "#384257", "7" = "#89CCCA", "11" = "#9B004F", "250000" = "#384257", "5e+05" = "#1488CA", "1e+06" = "#E63312", "1200000" = "#95C11F", "1400000" = "#FFCD1C", "1500000" = "#6561A9", "2e+06" = "#F07E26", "2500000" = "#9B004F", "multi-facto" = "#6561A9", "multi-solve" = "#89CCCA", "full-hmat" = "#C7D64F", "spido" = "#384257", "hmat" = "#9B004F", "mumps" = "#F07E26", "mumps-blr" = "#1488CA", "qr_mumps" = "#95C11F", "mumps/spido" = "#95C11F", "mumps/hmat" = "#FFCD1C", "ecpu" = "#384257", "eram" = "#E63312", "sram" = "#1488CA", "shdd" = "#95C11F", "flops" = "#9B004F", "bw" = "#1488CA", "vanilla" = "#1488CA", "sparse-rhs" = "#95C11F", "blr-dense-rhs" = "#FFCD1C", "blr-sparse-rhs" = "#6561A9" )
- point shape style,
shapes <- c( "spido" = 16, "hmat-bem" = 15, "hmat-fem" = 15, "hmat-oss" = 15, "hmat-fem-nd" = 23, "chameleon" = 18, "mumps" = 16, "qr_mumps" = 17, "mumps/spido" = 16, "qr_mumps/spido" = 17, "mumps/hmat" = 15, "hmat/hmat" = 18, "rm_peak" = 16, "swap_peak" = 17, "hdd_peak" = 15, "CreateSchurComplement_MultiSolve_MUMPS" = 22, "CreateSchurComplement_MultiSolve_MUMPS_solve" = 22, "CreateSchurComplement_MultiSolve_MUMPS_product" = 21, "MPF_SolveGEMM_MUMPS_HMAT_assembly" = 23, "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = 24, "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = 25, "MPF_SolveGEMM_MUMPS" = 23, "CreateSchurComplement" = 22, "CreateSchurComplement_factorization" = 22, "MPF_multifact_MUMPS" = 23, "MPF_multifact_MUMPS_HMAT" = 23, "testMPF_factorisation_Schur" = 24, "testMPF_factorisation" = 23, "testMPF_solve" = 24 )
- line type.
linetypes <- c( "spido" = "solid", "hmat-bem" = "dotted", "hmat-fem" = "dotted", "hmat-oss" = "dotted", "hmat-fem-nd" = "longdash", "chameleon" = "longdash", "mumps" = "solid", "qr_mumps" = "dashed", "mumps/spido" = "solid", "qr_mumps/spido" = "dashed", "mumps/hmat" = "dotted", "hmat/hmat" = "longdash", "rm_peak" = "solid", "swap_peak" = "dotted", "hdd_peak" = "longdash", "assembly_estimation" = "solid", "peak_symmetric_factorization" = "dotted", "peak_non_symmetric_factorization" = "longdash", "schur_estimation" = "dashed" )
Furthermore, each plot object uses a set of common theme layers provided by the
generate_theme function. If needed, the latter allows us to provide custom
breaks for the color, point shape and line type aesthetics, a custom legend
title, the number of rows allowed in the legend, as well as the legend box
placement.
generate_theme <- function( color_breaks = waiver(), color_labels = waiver(), color_override_aes = list(), shape_breaks = waiver(), shape_labels = waiver(), shape_override_aes = list(), linetype_breaks = waiver(), linetype_labels = waiver(), linetype_override_aes = list(), legend_title = element_text(family = "CMU Serif", size = 14, face = "bold"), legend_text = element_text(family = "CMU Serif", size = 14), theme_text = element_text(family = "CMU Serif", size = 16), legend_rows = 1, legend_rows_color = NA, legend_rows_fill = NA, legend_rows_shape = NA, legend_rows_linetype = NA, legend_box = "vertical", legend_position = "bottom" ) { return(list( scale_color_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_fill_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_shape_manual( values = shapes, labels = shape_labels, breaks = shape_breaks, na.translate = F ), scale_linetype_manual( values = linetypes, labels = linetype_labels, breaks = linetype_breaks ), guides( shape = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_shape), legend_rows, legend_rows_shape ), byrow = TRUE, override.aes = shape_override_aes ), linetype = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_linetype), legend_rows, legend_rows_linetype ), byrow = TRUE, override.aes = linetype_override_aes ), color = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_color), legend_rows, legend_rows_color ), byrow = TRUE, override.aes = color_override_aes ), fill = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_fill), legend_rows, legend_rows_fill ), byrow = TRUE ) ),
We set the font family to Computer Modern Serif in order to match the font family used in LaTeX typeset PDF documents. Then, we set the font size to 16 points and the legend text size to 14 points.
theme_bw(), theme( text = theme_text, legend.text = legend_text, legend.title = legend_title,
We place the legend at the desired position and surround it with a rectangle.
legend.position = legend_position, legend.background = element_rect(color = "gray40", size = 0.5),
We place legend items either side by side or one per line.
legend.box = legend_box,
For better visibility, we make legend symbols longer.
legend.key.width = unit(3, "line"),
Finally, we rotate the X-axis text to prevent long labels from overlapping.
axis.text.x = element_text(angle = 60, hjust = 1) ) )) }
4.1.5. Plot functions
In performance analysis, the metrics we study are generally the same. So, we prepared a series of plot functions, one for each type of data visualization. This way, we can, for example, reuse the same function to plot disk usage peaks by the problem's unknown count for any combination of solvers.
4.1.5.1. times_by_nbpts
This function returns a plot of factorization and solve times relative to the linear system's unknown count for a given set of either dense or sparse solvers.
We combine both metrics in one plot in which they are distinguished thanks to a facet grid. However, to be able to use the latter, we need to convert the data frame from wide format into long format (see Section 4.1.2).
The solver_config
option allows one to distinguish between different solver
configurations on the figure instead of different precision parameter
\(\epsilon\) values.
By default, the trans_x
and trans_y
arguments set the axis scales to
logarithmic using log10
. To use a standard scale, the value identity
should be
used. The time_break_width
allows one to set custom Y-axis break widths (in
seconds).
times_by_nbpts <- function(dataframe, solver_config = FALSE, trans_x = "log10", trans_y = "log10", time_break_width = 3600) { dataframe_long <- gather( dataframe, key = "computation_phase", value = "time", c("tps_facto", "tps_solve") )
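The effect of the gather call above can be sketched in base R on a toy data frame (all values are made up for illustration):

```r
# Wide format: one timing column per computation phase
demo_wide <- data.frame(nbpts = c(1e+06, 2e+06),
                        tps_facto = c(120, 480),
                        tps_solve = c(10, 35))
# Long format, as produced by gather: one row per (nbpts, phase) pair,
# with the phase name in computation_phase and the value in time
demo_long <- data.frame(
  nbpts             = rep(demo_wide$nbpts, times = 2),
  computation_phase = rep(c("tps_facto", "tps_solve"), each = nrow(demo_wide)),
  time              = c(demo_wide$tps_facto, demo_wide$tps_solve)
)
nrow(demo_long)  # 4
```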
Now, we can begin to define the resulting ggplot2
plot object. The column
names featured in this plotting function are:
nbpts
linear system's unknown count,tps_facto
represents factorization time in seconds,tps_solve
represents solve time in seconds,solver_config
gives the solver configuration used,desired_accuracy
represents solution accuracy threshold \(\epsilon\),computation_phase
tells whether the row contains factorization or solve time (used in the long formatted data framedataframe_long
),time
contains factorization or solve time (used in the long formatted data framedataframe_long
).
plot <- ggplot(dataframe_long, aes(x = nbpts, y = time))
If solver_config
is TRUE
, we want to distinguish between different solver
configurations.
if(solver_config) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(solver_config), shape = solver, linetype = solver ) ) } else if(!all(is.na(dataframe_long$desired_accuracy))) {
Otherwise, we will try to distinguish between various data compression levels.
plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) }
Then, we scale the axes.
plot <- plot + geom_line() + geom_point(size = 2.5) + scale_x_continuous( "# Unknowns (\U1D441)", trans = trans_x, breaks = dataframe_long$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Computation time", trans = trans_y, breaks = breaks_width(time_break_width), labels = label_time ) +
Finally, using a facet grid, we distinguish between factorization and solve times in separate facet columns. The plot theme is provided by the generate_theme function (see Section 4.1.4.3). The color label function depends on the value of solver_config.
facet_grid( . ~ computation_phase, labeller = labeller(computation_phase = phase.labs) ) if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
4.1.5.2. multisolve_times_by_nbrhs_and_nbpts
This function returns a plot of factorization time relative to the linear system's
unknown count for different counts of right-hand sides used during the Schur
complement computation when relying on the two-stage multi-solve implementation
scheme. Using the ooc
switch, we can include a facet grid to distinguish
between runs with and without out-of-core Schur complement computation. The
distributed
switch adapts the output figure to multi-node benchmarks if
set to TRUE
.
Now, we can begin to define the plot object. The column names featured in this plotting function are:
nbpts
linear system's unknown count,tps_facto
represents factorization time in seconds,tps_solve
represents solve time in seconds,processes
gives the number of MPI processes used for the computation,p_units
gives the total number of threads used for the computation,cols_schur
gives the number of columns in a Schur complement block,solver
contains the names of solvers featured in the coupling.
Note that we create an additional column p_labels
. It contains the legend
labels for the color aesthetics combining the values of processes
and
p_units
, e.g. 2 (48 threads)
for 2 nodes and 48 threads in total.
multisolve_times_by_nbpts_and_nbrhs <- function(dataframe, ooc = FALSE, distributed = FALSE) { dataframe$cols_schur_labels <- paste0( "(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")" ) cbreaks <- unique(na.omit(dataframe$cols_schur_labels)) cbreaks <- cbreaks[order( as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)), as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] if(distributed) { dataframe$p_labels <- paste0( dataframe$processes, " (", dataframe$p_units, " threads)" ) plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
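The numeric ordering of the "(n_c, n_S)" labels computed above relies on extracting both integers with gsub before sorting; in isolation, on hypothetical label values:

```r
# Sort "(a, b)" labels numerically by a, then by b
demo_breaks <- c("(256, 1024)", "(32, 32)", "(256, 512)", "(48, 48)")
first  <- as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", demo_breaks))
second <- as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", demo_breaks))
demo_breaks[order(first, second)]
# "(32, 32)" "(48, 48)" "(256, 512)" "(256, 1024)"
```

A plain lexicographic sort would instead place "(256, …)" before "(32, …)", hence the extraction step.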
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Factorization time [h]", breaks = breaks_log(n = 8, base = 10), trans = "log10", labels = label_time )
If the ooc
switch is set to TRUE
, include a facet grid based on the
dense_ooc
column values.
if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot
object. The color breaks are automatically determined based on the values of the
cols_schur_labels
column.
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(dataframe$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
4.1.5.3. multisolve_rss_by_nbrhs_and_nbpts
This function returns a plot of real memory (RAM) usage peaks relative to
the linear system's unknown count for different counts of right-hand sides used
during the Schur complement computation when relying on the two-stage
multi-solve implementation scheme. The function can take two extra arguments,
limit_max
and tick_values
, allowing one to redefine the default Y-axis maximum
and tick values respectively. Using the ooc
switch, we can include a facet
grid to distinguish between runs with and without out-of-core Schur complement
computation as well as between RAM and disk space consumption.
The column names featured in this plotting function are:
nbpts
linear system's unknown count,rm_peak
real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),hdd_peak
hard drive usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),processes
gives the number of MPI processes used for the computation,p_units
gives the total number of threads used for the computation,cols_schur
gives the number of columns in a Schur complement block,solver
contains the names of solvers featured in the coupling.
Note that we create an additional column p_labels
. It contains the legend
labels for the color aesthetics combining the values of processes
and
p_units
, e.g. 2 (48 threads)
for 2 nodes and 48 threads in total.
We begin by defining the plot object directly.
multisolve_rss_by_nbpts_and_nbrhs <- function(dataframe, limit_max = 126, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE ) { df <- dataframe df$cols_schur_labels <- paste0( "(", df$coupled_nbrhs, ", ", df$cols_schur, ")" ) cbreaks <- unique(na.omit(df$cols_schur_labels)) cbreaks <- cbreaks[order(as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)),as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] y_axis_label <- "Real memory (RAM) usage peak [GiB]" if(distributed) { df$p_labels <- paste0( df$processes, " (", df$p_units, " threads)" ) plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = p_labels, shape = solver, linetype = solver ) ) } else if(ooc) { rss_kinds <- c("rm_peak", "hdd_peak") df <- gather( df, key = "rss_kind", value = "rss_peak", all_of(rss_kinds) ) df$rss_kind <- factor(df$rss_kind, levels = rss_kinds) y_axis_label <- "Storage usage peak [GiB]" plot <- ggplot( df, aes( x = nbpts, y = rss_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } else { plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = df$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( y_axis_label, labels = function(label) sprintf("%.0f", label), limits = c(NA, limit_max), breaks = tick_values )
If the ooc
switch is set to TRUE
, include a facet grid based on the
rss_kind
and the dense_ooc
column values.
if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot
object. The color breaks are automatically determined based on the values of the
cols_schur_labels
column.
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(df$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
4.1.5.4. multisolve_memory_aware
This function returns a plot of factorization time relative to real memory
(RAM) usage peaks during the Schur complement computation when relying on the
two-stage multi-solve implementation scheme. In this case, we show the results
for a fixed problem size, given by my_nbpts
. narrow
optimizes the output for
multi-column document layout. multi_node
optimizes the output for multi-node
benchmark results. hreadable
makes the Y-axis values more human readable.
The column names featured in this plotting function are:
tps_facto
represents factorization time in seconds,rm_peak
real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),solver
contains the names of solvers featured in the coupling.
We begin by restricting the input data frame according to my_nbpts
.
multisolve_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = "CMU Serif", fontsize = 16, multi_node = FALSE, hreadable = FALSE) { dataframe_largest <- NA y_title <- "Factorization time [s]" y_labeller <- scientific if(hreadable) { y_title <- "Factorization time" y_labeller <- label_time_short } if(multi_node) { dataframe_largest <- subset( dataframe, coupled_method == "multi-solve" ) } else { dataframe_largest <- subset( dataframe, coupled_method == "multi-solve" & nbpts == my_nbpts ) }
Then, we create an additional column in the data frame to hold prettified labels providing the Schur complement computation phase parameter values.
if(multi_node) { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$cols_schur, "), \")\")" ), paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$coupled_nbrhs, "), \")\")" ) ) } else { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "atop(n[c]==", dataframe_largest$coupled_nbrhs, ", ", "n[S]==", dataframe_largest$cols_schur, ")" ), paste0("n[c]==", dataframe_largest$coupled_nbrhs) ) }
Now, we define the plot object.
plot <- NA if(multi_node) { dataframe_largest$p_labels <- paste0( dataframe_largest$processes, " (", dataframe_largest$p_units, " threads)" ) plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = p_labels, label = label_content, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) } plot <- plot + geom_line(size = 0.8) + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the geom_label_repel
function from the ggrepel
R package to
obtain better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf ) +
The X-axis shows the RAM usage peaks and the Y-axis shows the factorization time. We use custom break and limit values for better readability.
scale_x_continuous( "Random access memory (RAM) usage peak [GiB]", breaks = breaks_extended(10), limits = c(NA, NA), trans = "log10" ) + scale_y_continuous( y_title, trans = "log10", labels = y_labeller, breaks = breaks_log(6) )
Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-solve implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
if(narrow) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-solve"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-solve"]), legend_rows = 2, legend_box = "horizontal", legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } else if(multi_node) { plot <- plot + facet_wrap( . ~ nbpts, labeller = labeller(nbpts = label_nbpts), scales = "free" ) + labs ( shape = "Solver coupling", linetype = "Solver coupling", color = "Parallel configuration" ) + generate_theme( color_breaks = as.character( sort(unique(na.omit(dataframe_largest$p_labels))) ), shape_labels = label_solver, linetype_labels = label_solver, legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( fill = FALSE ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-solve"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-solve"]), legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } return(plot) }
4.1.5.5. multifacto_memory_aware
This function returns a plot of factorization time relative to real memory
(RAM) usage peaks during the Schur complement computation when relying on the
two-stage multi-factorization implementation scheme. In this case, we show the
results for a fixed problem size, given by my_nbpts
. narrow
optimizes the
output for multi-column document layout. hreadable
makes the Y-axis values
more human readable.
The column names featured in this plotting function are:
tps_facto
represents factorization time in seconds,rm_peak
real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),solver
contains the names of solvers featured in the coupling.
We begin by restricting the input data frame according to my_nbpts
.
multifacto_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = "CMU Serif", fontsize = 16, hreadable = FALSE) { y_title <- "Factorization time [s]" y_labeller <- scientific if(hreadable) { y_title <- "Factorization time" y_labeller <- label_time_short } dataframe_largest = subset( dataframe, coupled_method == "multi-facto" & nbpts == my_nbpts )
Before plotting, we also need to:
- convert the
n_blocks
column from string to numeric type and use it to create a corresponding prettified label column,
dataframe_largest$n_blocks = as.numeric(dataframe_largest$n_blocks) dataframe_largest$label_content = paste0( "n[b]==", dataframe_largest$n_blocks )
- convert the values of
rm_peak
to gibibytes (GiB),
dataframe_largest$rm_peak = as.numeric(dataframe_largest$rm_peak / 1024.)
- order the rows of the data frame based on the
n_blocks
column for the data points to be correctly connected by line segments on the final plot.
dataframe_largest <- dataframe_largest[order(dataframe_largest$n_blocks),]
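The sort matters because geom_path connects points in row order rather than in sorted X order; a toy illustration on a hypothetical demo_df with made-up values:

```r
# Unsorted rows would be drawn as a zigzag path; sort by n_blocks first
demo_df <- data.frame(n_blocks = c(4, 1, 2), tps_facto = c(30, 10, 20))
demo_df <- demo_df[order(demo_df$n_blocks), ]
demo_df$n_blocks  # 1 2 4
```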
Then, we define the plot object. We also need to find the minimum and
maximum values for the axes (columns rm_peak
and tps_facto
).
rm_min = as.integer(min(dataframe_largest$rm_peak)) rm_max = as.integer(max(dataframe_largest$rm_peak)) tps_min = as.integer(min(dataframe_largest$tps_facto)) tps_max = as.integer(max(dataframe_largest$tps_facto)) plot = ggplot( dataframe_largest, aes( x = rm_peak, y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) + geom_path() + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the geom_label_repel
function from the ggrepel
R package to
obtain better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf ) +
The X-axis shows the RAM usage peaks and the Y-axis shows the factorization time. We use custom break and limit values for better readability.
    scale_x_continuous(
      "Random access memory (RAM) usage peak [GiB]",
      breaks = seq(rm_min - 5, rm_max + 5, by = 5),
      limits = c(NA, NA), trans = "log10"
    ) +
    scale_y_continuous(
      y_title, trans = "log10", labels = y_labeller, breaks = breaks_log(6)
    ) +
Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-factorization implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
    labs(shape = "Solver coupling", linetype = "Solver coupling")
  if(narrow) {
    plot <- plot + generate_theme(
      shape_labels = label_solver,
      shape_override_aes = list(color = colors["multi-facto"]),
      linetype_labels = label_solver,
      linetype_override_aes = list(color = colors["multi-facto"]),
      legend_rows = 2, legend_box = "horizontal",
      legend_title = element_text(
        family = font, size = fontsize - 2, face = "bold"
      ),
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  } else {
    plot <- plot + generate_theme(
      shape_labels = label_solver,
      shape_override_aes = list(color = colors["multi-facto"]),
      linetype_labels = label_solver,
      linetype_override_aes = list(color = colors["multi-facto"]),
      legend_title = element_text(
        family = font, size = fontsize - 2, face = "bold"
      ),
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  }
  plot <- plot + guides(color = FALSE, fill = FALSE)
  return(plot)
}
4.1.5.6. multifacto_times_by_nbpts_and_schur_size
This function returns a plot of factorization time relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data into a common column, size_schur.
multifacto_times_by_nbpts_and_schur_size <- function(dataframe, ooc = FALSE,
                                                     distributed = FALSE) {
  local <- dataframe
  local$size_schur[local$solver == "mumps/spido"] =
    local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- n_blocks: the count of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
  if(distributed) {
    local$p_labels <- paste0(local$processes, " (", local$p_units, " threads)")
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = tps_facto, color = p_labels,
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = tps_facto, color = as.character(n_blocks),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      "Factorization time [h]", breaks = breaks_log(n = 8, base = 10),
      trans = "log10", labels = label_time
    )
If the ooc switch is set to TRUE, we include a facet grid based on the dense_ooc column values.
  if(ooc) {
    plot <- plot + facet_grid(
      . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc)
    )
  }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the n_blocks column, or of the p_labels column in the distributed case.
  if(distributed) {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = "# Nodes"
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$p_labels)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  } else {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = expression(n[italic(b)])
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.7. multifacto_rss_by_nbpts_and_schur_size
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values, respectively.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data into a common column, size_schur.
multifacto_rss_by_nbpts_and_schur_size <- function(
  dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120),
  ooc = FALSE, distributed = FALSE
) {
  local <- dataframe
  local$size_schur[local$solver == "mumps/spido"] =
    local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- n_blocks: the count of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
  if(distributed) {
    local$p_labels <- paste0(local$processes, " (", local$p_units, " threads)")
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = rm_peak / 1024., color = as.character(p_labels),
        shape = solver, linetype = solver
      )
    )
  } else if(ooc) {
    local <- gather(
      local, key = "rss_kind", value = "rss_peak", c("rm_peak", "hdd_peak")
    )
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = rss_peak / 1024., color = as.character(n_blocks),
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = rm_peak / 1024., color = as.character(n_blocks),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      "Real memory (RAM) usage peak [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, limit_max), breaks = tick_values
    )
If the ooc switch is set to TRUE, we include a facet grid based on the rss_kind and dense_ooc column values.
  if(ooc) {
    plot <- plot + facet_grid(
      rss_kind ~ dense_ooc,
      labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc)
    )
  }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the n_blocks column, or of the p_labels column in the distributed case.
  if(distributed) {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = "# Nodes"
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$p_labels)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  } else {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = expression(n[italic(b)])
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.8. compare_coupled
This function returns a plot of the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
For example, if we want to plot only the results regarding two-stage schemes, we can use the legend_rows argument to adjust the number of rows in the legend. Then, the shorten argument allows removing legend titles and using shorter legend key labels in order to reduce the final canvas size. check_epsilon can be used to show the relative error instead of factorization times on the Y-axis. hreadable makes the Y-axis values more human-readable.
We begin by restricting the input data frame to the best results of the two-stage and single-stage schemes.
Note that as the latter imply the use of only one solver and not a coupling of solvers, we need to specify the coupling method name manually to enable the data frame treatment.
compare_coupled <- function(dataframe, legend_rows = 3, shorten = FALSE,
                            check_epsilon = FALSE, font = "CMU Serif",
                            fontsize = 16, hreadable = FALSE) {
  y_title <- "Factorization time [s]"
  y_labeller <- scientific
  if(hreadable) {
    y_title <- "Factorization time"
    y_labeller <- label_time_short
  }
  dataframe_best <- dataframe
  dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method)
  dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <-
    "full-hmat"
  dataframe_best <- merge(
    aggregate(
      tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best
    ),
    dataframe_best
  )
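The aggregate-then-merge idiom used here can be illustrated on a tiny, hypothetical data frame (the values below are illustrative, not thesis results): aggregate computes the minimal factorization time per group and the subsequent merge recovers the full rows achieving those minima.

```r
# Minimal sketch of the aggregate-then-merge idiom (hypothetical toy data).
df <- data.frame(
  coupled_method = c("multi-facto", "multi-facto", "multi-solve", "multi-solve"),
  nbpts          = c(1000, 1000, 1000, 1000),
  n_blocks       = c(2, 4, 2, 4),
  tps_facto      = c(50, 30, 80, 20)
)
# Keep, for every (coupled_method, nbpts) group, the minimal time...
best_times <- aggregate(tps_facto ~ coupled_method + nbpts, min, data = df)
# ...then recover the complete rows achieving those minima: merge joins on
# the common columns, including tps_facto.
best <- merge(best_times, df)
```

Here, best contains one row per implementation scheme: the multi-facto run with tps_facto 30 and the multi-solve run with tps_facto 20.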
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
  plot <- NA
  if(check_epsilon) {
    plot <- ggplot(
      dataframe_best,
      aes(
        x = nbpts, y = error, color = coupled_method,
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      dataframe_best,
      aes(
        x = nbpts, y = tps_facto, color = coupled_method,
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5)
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
  if(check_epsilon) {
    vjust <- -0.6
    size <- 5
    if(shorten) {
      vjust <- -0.3
      size <- 4
    }
    plot <- plot +
      geom_hline(yintercept = 1.0e-3) +
      geom_text(
        aes(
          1.0e+6, 1.0e-3,
          label = as.character(expression(epsilon == 10^{-3})),
          vjust = vjust, hjust = 0
        ),
        parse = TRUE, color = "black", family = font, size = size
      )
  }
  plot <- plot + scale_x_continuous(
    "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific
  )
  if(check_epsilon) {
    plot <- plot + scale_y_continuous(
      expression(paste("Relative error (", E[rel], ")")),
      limits = c(min(dataframe_best$error), 2.0e-3),
      labels = scientific, breaks = breaks_log(10), trans = "log10"
    )
  } else {
    plot <- plot + scale_y_continuous(
      y_title, labels = y_labeller, trans = "log10", limits = c(NA, NA)
    )
  }
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are automatically determined based on the values of the coupled_method and solver columns.
  plot <- plot + labs(
    shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme"
  )
  if(shorten) {
    plot <- plot + generate_theme(
      color_breaks = unique(na.omit(dataframe_best$coupled_method)),
      color_labels = label_coupling_short,
      color_override_aes = list(linetype = 0, shape = 15, size = 6),
      shape_breaks = unique(na.omit(dataframe_best$solver)),
      shape_labels = label_solver,
      linetype_breaks = unique(na.omit(dataframe_best$solver)),
      linetype_labels = label_solver,
      legend_rows = legend_rows, legend_box = "horizontal",
      legend_title = element_blank(),
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  } else {
    plot <- plot + generate_theme(
      color_breaks = unique(na.omit(dataframe_best$coupled_method)),
      color_labels = label_coupling,
      color_override_aes = list(linetype = 0, shape = 15, size = 8),
      shape_breaks = unique(na.omit(dataframe_best$solver)),
      shape_labels = label_solver,
      linetype_breaks = unique(na.omit(dataframe_best$solver)),
      linetype_labels = label_solver,
      legend_rows = legend_rows, legend_box = "horizontal",
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  }
  return(plot)
}
4.1.5.9. compare_rss_coupled
This function returns a plot of the real memory (RAM) usage peaks for the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
We begin by restricting the input data frame to the best results of the two-stage schemes and the results of the single-stage scheme implemented using HMAT.
Note that as the latter implies the use of one single solver, we need to specify the coupling method name manually to enable the data frame treatment.
compare_rss_coupled <- function(dataframe) {
  dataframe_best <- dataframe
  dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method)
  dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <-
    "full-hmat"
  dataframe_best <- merge(
    aggregate(
      tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best
    ),
    dataframe_best
  )
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
  plot <- ggplot(
    dataframe_best,
    aes(
      x = nbpts, y = rm_peak / 1024., color = coupled_method,
      shape = solver, linetype = solver
    )
  ) +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks.
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific
    ) +
    scale_y_continuous(
      "Real memory (RAM) usage peak [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120)
    ) +
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are automatically determined based on the values of the coupled_method and solver columns.
    labs(
      shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme"
    ) +
    generate_theme(
      color_breaks = unique(na.omit(dataframe_best$coupled_method)),
      color_labels = label_coupling,
      shape_breaks = unique(na.omit(dataframe_best$solver)),
      shape_labels = label_solver,
      linetype_breaks = unique(na.omit(dataframe_best$solver)),
      linetype_labels = label_solver,
      legend_rows = 3, legend_box = "horizontal"
    )
  return(plot)
}
4.1.5.10. accuracy_by_nbpts
This function returns a plot of the relative solution error with respect to the linear system's unknown count.
The solver_config option allows one to distinguish between different solver configurations on the figure instead of different precision parameter \(\epsilon\) values.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- error: the relative forward error of the solution approximation.
accuracy_by_nbpts <- function(dataframe, solver_config = FALSE) {
  if(solver_config) {
    plot <- ggplot(
      dataframe,
      aes(
        x = nbpts, y = error, color = as.character(solver_config),
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      dataframe,
      aes(
        x = nbpts, y = error, color = as.character(desired_accuracy),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
We draw a horizontal line representing the maximal admitted value of \(E_{rel}\) which is \(10^{-3}\).
    geom_hline(yintercept = 1.0e-3) +
    geom_text(
      aes(
        min(dataframe$nbpts), 1.0e-3,
        label = as.character(expression(epsilon == 10^{-3})),
        vjust = -0.6, hjust = 0
      ),
      parse = TRUE, family = "CMU Serif", color = "black", size = 5
    ) +
For the Y-axis, we define the tick values manually for better comparison to the associated relative error values.
    scale_x_continuous(
      "# Unknowns (\U1D441)", trans = "log10",
      breaks = dataframe$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      expression(paste("Relative error (", E[rel], ")")),
      limits = c(min(dataframe$error), 1e-1),
      breaks = c(1e-16, 1e-13, 1e-10, 1e-6, 1e-3),
      labels = scientific, trans = "log10"
    )
Finally, we set the legend labels, apply the common theme parameters and return the plot object.
  if(solver_config) {
    plot <- plot +
      labs(color = "Configuration", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_solver_config,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver,
        legend_rows = length(unique(na.omit(dataframe$solver_config))) / 2
      )
  } else {
    plot <- plot +
      labs(color = "\U1D700", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_epsilon,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.11. rss_peaks_by_nbpts
This function returns a plot of real memory (RAM) and disk space usage peaks relative to the linear system's unknown count. The function can take four extra arguments, limit_max, tick_values, solver_config and trans_x. The first two of them can be used to redefine the default Y-axis maximum and tick values, respectively. The solver_config option allows one to distinguish between different solver configurations on the figure. By default, the trans_x argument sets the X-axis scale to logarithmic. To use a standard scale, the value identity should be used.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: disk space usage peaks, stored in MiB but converted to GiB.
At first, we convert the data frame from wide to long format.
rss_peaks_by_nbpts <- function(dataframe, limit_max = 126,
                               tick_values = c(0, 30, 60, 90, 120),
                               solver_config = FALSE, trans_x = "log10") {
  dataframe_long <- gather(
    dataframe, key = "memory_type", value = "memory_usage",
    c("rm_peak", "hdd_peak")
  )
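The effect of this wide-to-long conversion can be sketched in base R as well, on a hypothetical two-row data frame; stats::reshape produces the same pairing of a key column (memory_type) with a value column (memory_usage) as the gather call:

```r
# Base-R sketch of the wide-to-long conversion (hypothetical toy data).
wide <- data.frame(
  nbpts = c(1000, 2000), rm_peak = c(4096, 8192), hdd_peak = c(512, 1024)
)
long <- reshape(
  wide,
  direction = "long",
  varying   = c("rm_peak", "hdd_peak"),  # columns to stack
  v.names   = "memory_usage",            # value column
  timevar   = "memory_type",             # key column
  times     = c("rm_peak", "hdd_peak")   # key values
)
# Each input row yields one row per stacked column: 2 rows become 4.
```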
Also, we fix the order of the values in memory_type so that real memory (RAM) usage peaks come before disk usage peaks in the facet grid.
  dataframe_long$memory_type <- factor(
    dataframe_long$memory_type, c('rm_peak', 'hdd_peak')
  )
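The factor's explicit level order is what drives the facet order here: ggplot2 follows the levels, not the order in which values appear in the data. A minimal sketch:

```r
# Without explicit levels, factor() orders the levels alphabetically,
# which would put hdd_peak before rm_peak:
x <- factor(c("rm_peak", "hdd_peak", "rm_peak"))
# Passing the levels explicitly fixes the desired order instead:
y <- factor(c("rm_peak", "hdd_peak", "rm_peak"), c("rm_peak", "hdd_peak"))
```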
Then, we configure the plot object itself as described above and handle different levels of expected relative solution error, in the case of solvers using data compression, by adding the shape aesthetics. If solver_config is TRUE, we want to distinguish between different solver configurations instead.
  if(solver_config) {
    plot <- ggplot(
      dataframe_long,
      aes(
        x = nbpts, y = memory_usage / 1024.,
        color = as.character(solver_config),
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      dataframe_long,
      aes(
        x = nbpts, y = memory_usage / 1024.,
        color = as.character(desired_accuracy),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the storage usage peaks. On the Y-axis, we round the values to 0 decimal places.
    scale_x_continuous(
      "# Unknowns (\U1D441)", trans = trans_x,
      breaks = dataframe_long$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      "Storage usage peak [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, limit_max), breaks = tick_values
    ) +
Storage types are distinguished using a facet grid.
    facet_grid(
      . ~ memory_type, labeller = labeller(memory_type = label_storage)
    )
We end by setting legend titles, applying the common theme parameters and returning the plot object.
  if(solver_config) {
    plot <- plot +
      labs(color = "Configuration", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_solver_config,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver,
        legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2
      )
  } else {
    plot <- plot +
      labs(color = "\U1D700", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_epsilon,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.12. rss_by_time
This function returns a plot of real memory (RAM) usage relative to the execution time. The dataframe argument represents the data frame containing RAM consumption data read using the read_rss function (see Section 4.1.2.3.1). Moreover, using limit_max and tick_values one can redefine the default Y-axis maximum and tick values, respectively, to improve the readability of the plot. The timeline argument is useful when dataframe contains coupled FEM/BEM benchmark results. timeline represents an additional data frame that can be used to highlight selected computation phases and estimated reference RAM usage peaks of a sparse FEM factorization and Schur matrix size in the plot (see Section 4.1.2.3.4).
The column names featured in this plot function are:
- time: the timestamp in seconds giving the time elapsed since the beginning of the execution,
- rss: the global real memory usage in mebibytes (MiB) but converted to gibibytes (GiB).
We begin directly by defining the plot object.
rss_by_time <- function(dataframe, limit_max = 126,
                        tick_values = c(0, 30, 60, 90, 120),
                        estimated_peak_factorization = NA,
                        estimated_schur = NA, timeline = NA) {
  is_coupled <- FALSE
  fill_legend <- "Solver"
  color_labeller <- label_solver
  if("coupled_method" %in% colnames(dataframe)) {
    is_coupled <- TRUE
    fill_legend <- "Implementation scheme"
    color_labeller <- label_coupling
  }
  if(is_coupled) {
    plot <- ggplot(
      dataframe,
      aes(x = timestamp / 1000, y = consumption, fill = coupled_method)
    )
  } else {
    plot <- ggplot(
      dataframe,
      aes(x = timestamp / 1000, y = consumption, fill = solver)
    )
  }
We draw the RAM usage as an area chart.
plot <- plot + geom_area()
In the case the extra timeline argument is present, we draw additional labelled vertical lines corresponding to selected computation phases using the geom_vline and geom_text layers. Note that for the geom_text layer we also need to adjust the text position and rotation. To enable the processing of the labels as math expressions, we use the parse parameter.
  if(is.data.frame(timeline)) {
    plot <- plot +
      geom_vline(
        data = timeline, aes(xintercept = time),
        linetype = "dotted", size = 0.4
      ) +
      geom_text(
        data = timeline,
        aes(x = middle, y = 0.75 * limit_max, label = label),
        parse = T, angle = 90, vjust = "center"
      )
Then, we also draw the horizontal segments corresponding to the estimated RAM usage peaks of a sparse FEM factorization and the estimated Schur matrix size.
However, at first, we have to convert the timeline data frame to the long format (see Section 4.1.2). This allows us to use different line types to distinguish the segments based on the kind of value they represent.
    columns <- c(
      "assembly_estimation", "peak_symmetric_factorization",
      "peak_non_symmetric_factorization", "schur_estimation"
    )
    timeline_long <- gather(
      timeline, key = "peak_kind", value = "peak_value", columns
    )
    plot <- plot + geom_spoke(
      data = timeline_long,
      aes(
        x = middle - 0.5 * (time - middle), y = peak_value,
        linetype = peak_kind, radius = time - middle
      ),
      angle = 2 * pi
    )
  }
Next, we set up the axes and their scaling. We round the Y-axis values to zero decimal places.
  plot = plot +
    scale_x_continuous("Execution time", labels = label_time) +
    scale_y_continuous(
      "Random access memory (RAM) usage [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, limit_max), breaks = tick_values
    )
We end by configuring the visual aspect of the plot.
  if(is.data.frame(timeline)) {
    plot <- plot +
      labs(fill = fill_legend, linetype = "Estimated peak usage") +
      generate_theme(
        color_labels = color_labeller, linetype_labels = label_peak_kind,
        legend_rows_linetype = length(columns)
      )
  } else {
    plot <- plot +
      labs(fill = fill_legend) +
      generate_theme(color_labels = color_labeller)
  }
  return(plot)
}
4.1.5.13. eprofile
This function returns a plot of power consumption relative to the execution time. The first argument es_data of the function corresponds to the data frame containing power consumption data as well as tag information, if any (see Section 4.1.2.3.3).
eprofile <- function(es_data, zoom_on = c(NA, NA), TDP = NA, rss = TRUE,
                     flops = TRUE, bw = TRUE) {
  zoomed <- FALSE
  solvers <- unique(na.omit(es_data$solver))
  es <- es_data
  if(!is.na(zoom_on[1]) && !is.na(zoom_on[2])) {
    zoomed <- TRUE
    es <- subset(
      es, timestamp >= zoom_on[1] * 1000 & timestamp <= zoom_on[2] * 1000
    )
  }
  edata <- subset(es, kind_general == "e")
  power <- ggplot(data = subset(edata, is.na(type)))
  sdata <- NA
  storage <- NA
  if(rss) {
    sdata <- subset(es, kind_general == "s")
    sdata$kind <- factor(sdata$kind, levels = c("sram", "shdd"))
    storage <- ggplot(data = subset(sdata, is.na(type)))
  }
  fdata <- NA
  if(flops) {
    fdata <- subset(es, kind_general == "flops")
    fdata$kind <- factor(fdata$kind, levels = c("flops"))
  }
  bwdata <- NA
  if(bw) {
    bwdata <- subset(es, kind_general == "bw")
    bwdata$kind <- factor(bwdata$kind, levels = c("bw"))
  }
At first, we draw the power consumption curve. The tags are represented by colored points surrounded with a black stroke and annotated with labels. Note that the latter are stored as math expressions and need to be parsed by the geom_label_repel function.
  if(flops) {
    power <- power + geom_line(
      data = fdata, aes(x = timestamp, y = consumption / 1000, color = kind)
    )
  }
  if(bw) {
    power <- power + geom_line(
      data = bwdata, aes(x = timestamp, y = consumption / 1024, color = kind)
    )
  }
  power <- power +
    geom_line(aes(x = timestamp, y = consumption, color = kind)) +
    geom_point(
      data = subset(edata, !is.na(type) & kind == "ecpu"),
      aes(x = timestamp, y = consumption, shape = tag),
      color = "black", fill = "white", size = 3
    ) +
    geom_label_repel(
      data = subset(edata, !is.na(type) & kind == "ecpu"),
      aes(x = timestamp, y = consumption, label = type_label),
      force_pull = 0, segment.color = "black", min.segment.length = 0,
      max.overlaps = 20, family = "CMU Serif", parse = TRUE,
      show.legend = FALSE
    )
  if(rss) {
    storage <- storage +
      geom_line(aes(x = timestamp, y = consumption, color = kind)) +
      geom_point(
        data = subset(sdata, !is.na(type)),
        aes(x = timestamp, y = consumption, shape = tag),
        color = "black", fill = "white", size = 3
      ) +
      geom_label_repel(
        data = subset(sdata, !is.na(type)),
        aes(x = timestamp, y = consumption, label = type_label),
        force_pull = 0, segment.color = "black", min.segment.length = 0,
        max.overlaps = 20, family = "CMU Serif", parse = TRUE,
        show.legend = FALSE
      )
  }
If the TDP parameter is provided, we add a horizontal line matching its value.
  if(!is.na(TDP)) {
    power <- power +
      geom_hline(yintercept = TDP) +
      geom_text(
        aes(
          0, TDP, label = paste("TDP:", as.character(TDP), "W"),
          vjust = 2, hjust = 0
        ),
        color = "black", family = "CMU Serif", size = 5
      )
  }
Next, we set up the axes and their scaling. We round the Y-axis values to zero decimal places.
  power <- power + scale_x_continuous(
    "Execution time [s]",
    labels = function(label) sprintf("%d", as.integer(label / 1000)),
    limits = c(NA, NA), breaks = breaks_extended(n = 10)
  )
  if(flops) {
    coeff <- 0.2
    if(zoomed) {
      coeff <- 1.1
    }
    power <- power + scale_y_continuous(
      "Power consumption [W]",
      labels = function(label) sprintf("%.0f", label),
      breaks = breaks_width(as.integer(0.25 * max(edata$consumption))),
      sec.axis = sec_axis(
        ~ . * 1, name = "Flop rate [Gflop/s]",
        labels = function(label) sprintf("%.0f", label),
        breaks = breaks_width(
          as.integer(coeff * (max(fdata$consumption) / 1000))
        )
      )
    )
  } else {
    power <- power + scale_y_continuous(
      "Power consumption [W]",
      labels = function(label) sprintf("%.0f", label),
      breaks = breaks_width(as.integer(0.25 * max(edata$consumption)))
    )
  }
  if(rss) {
    y_axis_title <- ifelse(
      length(unique(na.omit(sdata$kind))) < 2,
      "RAM usage [GiB]", "Storage usage [GiB]"
    )
    storage <- storage +
      scale_x_continuous(
        "Execution time [s]",
        labels = function(label) sprintf("%d", as.integer(label / 1000)),
        limits = c(NA, NA), breaks = breaks_extended(n = 10)
      ) +
      scale_y_continuous(
        y_axis_title,
        labels = function(label) sprintf("%.0f", label),
        breaks = breaks_width(as.integer(0.2 * max(sdata$consumption)))
      )
  }
We end by configuring the visual aspect of the plot. We use a facet grid to draw a separate plot for each solver coupling.
  if("coupled_method" %in% colnames(es_data)) {
    power <- power + facet_grid(
      . ~ coupled_method + solver,
      labeller = labeller(
        coupled_method = label_coupling, solver = label_solver
      ),
      scales = "free_x", switch = "y"
    )
  } else {
    power <- power + facet_grid(
      . ~ solver, labeller = labeller(solver = label_solver),
      scales = "free_x", switch = "y"
    )
  }
  if(rss) {
    storage <- storage +
      facet_grid(
        . ~ solver, labeller = labeller(solver = label_solver),
        scales = "free_x", switch = "y"
      ) +
      labs(shape = "Computation phase")
  }
  power <- power + generate_theme(
    color_labels = label_kind, color_override_aes = list(size = 8),
    shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal"
  )
  if(rss) {
    power <- power +
      labs(color = "Device") +
      guides(fill = "none", shape = "none", linetype = "none") +
      theme(axis.title.x = element_blank())
    storage <- storage +
      generate_theme(
        shape_labels = label_tag, legend_rows_shape = 1,
        legend_box = "horizontal"
      ) +
      guides(color = "none", fill = "none") +
      theme(
        strip.text.x = element_blank(), strip.background.x = element_blank()
      )
    return(
      plot_grid(
        power, storage, nrow = 2, ncol = 1, align = "v", axis = "b",
        rel_heights = c(1.1, 1)
      )
    )
  }
  power <- power +
    labs(color = "Device", shape = "Computation phase") +
    guides(fill = "none")
  return(power)
}
4.1.5.14. es_comparison
This function returns a combined bar chart comparing the total energy consumption, the computation time and the real memory (RAM) usage peaks of the solvers relative to the linear system's unknown count. The total energy consumption of each benchmark run is read from the associated eprofile.txt file.

es_comparison <- function(dataframe) {
  extended <- dataframe
  extended$etotal <- as.numeric(0)
  for(i in 1:nrow(extended)) {
    gcvb_retrieve_files(extended[i, ])
    eprofile <- fromJSON(file = "./eprofile.txt")
    extended[i, "etotal"] <- as.numeric(
      eprofile$data$data$tags$es_total$`joule(J)`
    )
  }
  extended$solver <- ifelse(
    extended$solver == "mumps" & extended$error < 10e-12,
    "mumps-blr", extended$solver
  )
  postamble <- generate_theme(
    color_labels = label_solver, legend_rows = 2, legend_title = element_blank()
  )
  energy <- ggplot(data = extended, aes(x = nbpts, y = etotal, fill = solver)) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific
    ) +
    scale_y_continuous("Energy consumption [J]", labels = scientific) +
    postamble +
    guides(fill = "none") +
    theme(axis.title.x = element_blank())
  time <- ggplot(
    data = extended, aes(x = nbpts, y = tps_facto + tps_solve, fill = solver)
  ) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific
    ) +
    scale_y_continuous("Computation time [s]") +
    postamble
  rss <- ggplot(
    data = extended, aes(x = nbpts, y = rm_peak / 1024., fill = solver)
  ) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific
    ) +
    scale_y_continuous("RAM usage peak [GiB]") +
    postamble +
    guides(fill = "none") +
    theme(axis.title.x = element_blank())
  return(
    plot_grid(
      energy, time, rss, nrow = 1, ncol = 3, align = "h", axis = "b",
      rel_widths = c(1, 1.15, 1)
    )
  )
}
4.1.5.15. scalability
This function returns a plot of factorization and solve computation times relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- tps_facto: the factorization time in seconds,
- tps_solve: the solve time in seconds,
- mapping: the mapping configuration of MPI processes.
Then, the data frame is converted to long format (see Section 4.1.2) so that the times of both phases can be combined using a facet grid. We also force the order of the facet labels for the solver column by converting the latter into a factor.
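The wide-to-long conversion can be illustrated on a toy data frame (the values below are hypothetical and for illustration only):

```r
library(tidyr)

# Toy data frame in wide format: one time column per computation phase.
wide <- data.frame(
  p_units   = c(4, 8),
  tps_facto = c(120.0, 70.0),
  tps_solve = c(10.0, 6.0)
)

# Gather the two time columns into a key/value pair, yielding one row
# per (core count, computation phase) combination. The key column then
# drives the facet grid.
long <- gather(
  wide,
  key = "computation_phase",
  value = "time",
  c("tps_facto", "tps_solve")
)
```

After the call, `long` holds four rows with the columns `p_units`, `computation_phase` and `time`.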
scalability <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "time", c("tps_facto", "tps_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = time, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between factorization and solve time values, we use a log10 scale on the Y-axis to be able to combine both metrics on the same axis without losing readability.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) + scale_y_continuous( "Computation time [s]", labels = scientific, trans = "log10" ) +
Finally, using a facet grid, we distinguish between factorization and solve times horizontally and between the solvers considered vertically.
For the sake of readability, we allow the Y-axis scales to differ between facets. This is useful when multiple solvers with considerably different time values are considered.
Also, before returning the plot, we apply the common theme parameters. For the scalability and parallel efficiency plots, we split the legend labels over two lines, as they are too long.
facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ), scales = "free" ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
4.1.5.16. scalability_ram
This function is a variation of scalability
(see Section 4.1.5.15) and
returns a plot of real memory (RAM) usage peaks relative to the available
processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- rm_peak: the real memory usage peak, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- mapping: the mapping configuration of MPI processes.
scalability_ram <- function(dataframe) { plot <- ggplot( dataframe, aes( x = p_units, y = rm_peak / 1024., group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
Note that we round the values on the Y-axis to the nearest integer and define custom axis limits and breaks adapted to this type of plot.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe$p_units) ) + scale_y_continuous( "Real memory (RAM) usage peaks [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +
In this case, we use the facet grid exclusively as a means to better annotate the resulting plot.
facet_grid( . ~ solver + nbpts, labeller = labeller( solver = label_solver, nbpts = label_nbpts ) ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
4.1.5.17. efficiency
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage relative to the available processor core count.
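The efficiency columns are computed upstream of this plotting function. As a sketch only, assuming the usual definition of parallel efficiency as speedup divided by the number of cores (the times below are hypothetical):

```r
# Hypothetical single-core and parallel factorization times [s].
t_serial   <- 400.0
t_parallel <- c(400.0, 210.0, 115.0, 65.0)
p_units    <- c(1, 2, 4, 8)

# Parallel efficiency: speedup relative to the single-core run divided
# by the core count. A value of 1 means ideal scaling; real runs
# typically fall below it, which is why the plot draws a horizontal
# line at y = 1 as the theoretical bound.
efficiency <- (t_serial / t_parallel) / p_units
```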
The column names featured in this plotting function are:
- p_units: the count of cores,
- efficiency_facto: the factorization phase efficiency,
- efficiency_solve: the solve phase efficiency,
- mapping: the mapping configuration of MPI processes.
The function itself follows the exact same logic as the scalability
function
(see Section 4.1.5.15).
efficiency <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "efficiency", c("efficiency_facto", "efficiency_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = efficiency, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) + geom_hline(yintercept = 1) + scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) +
There is only one additional step, i.e. converting the efficiency values \(x\), with \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 4.1.4.1). Also, we fix our own Y-axis tick values (breaks).
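As an illustration only (the actual label_percentage function is defined in Section 4.1.4.1), an equivalent label function could look like the following; since the axis title already carries the unit, the labels are bare numbers:

```r
# Hypothetical stand-in for label_percentage: maps a fraction in
# [0, 1] to a percentage string without the "%" sign, which is
# already part of the axis title.
label_percentage <- function(breaks) sprintf("%.0f", 100 * breaks)

label_percentage(c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))
```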
scale_y_continuous( "Parallel efficiency [%]", breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), labels = label_percentage ) + facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ) ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
See the complete R script file plot.R
18.
4.2. StarPU execution traces
To visualize StarPU execution traces in the FXT format, we also make use of R, the ggplot2 package and the dedicated StarVZ library starvz16,starvz18.
To configure the output plot produced by StarVZ, we use a YAML configuration file. We let the cosmetic options take their default values, except for the base font size (see the base_size value), and we override those specifying what kind of information shall be included in the resulting plots. In our case, we want to plot only the StarPU worker statistics (items st and starpu).
default:
  base_size: 14
  starpu:
    active: TRUE
    legend: TRUE
  ready:
    active: FALSE
  submitted:
    active: FALSE
  st:
    active: TRUE
Showing critical path time bound is not necessary.
cpb: FALSE
On the other hand, we want to display all the labels, the legend and the application time span.
labels: "ALL"
legend: TRUE
makespan: TRUE
To reduce the size of the resulting plot, we enable the aggregation of visualized tasks in a time step.
aggregation:
  active: TRUE
  method: "static"
  step: 500
See the complete configuration file starvz.yaml
19.
The StarVZ plotting code begins by determining the path to the latest gcvb benchmark session directory.
output = try(
  system("ls -t ~/benchmarks/results | head -n 1", intern = TRUE)
)
latest_session_path = paste(
  "~/benchmarks/results", output,
  sep = "/"
)
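Note that this relies on a Unix shell being available. A shell-free sketch with the same intent (assuming the same directory layout; latest_session is a hypothetical helper, not part of the original scripts) could rely on file.info() instead:

```r
# Hypothetical portable alternative to `ls -t | head -n 1`: pick the
# most recently modified entry of the results directory by comparing
# modification times reported by file.info().
latest_session <- function(results_dir) {
  entries <- list.files(results_dir, full.names = TRUE)
  entries[which.max(file.info(entries)$mtime)]
}
```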
Then, we import the required R libraries and set the starvz_config
variable
holding the path to the Yaml configuration file of the StarVZ package.
library(ggplot2)
library(starvz)
library(grid)

starvz_config = "starvz.yaml"
4.2.1. Generating plots
Finally, we generate the plots. To do this, we reuse the above code through the :noweb reference <<starvz-header>>:
execution traces of the HMAT solver on a BEM system,
<<starvz-header>> hmat_bem_node = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-node-1x4-25000", sep = "/" ), starvz_config ) ) hmat_bem_socket = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-socket-2x2-25000", sep = "/" ), starvz_config ) ) hmat_bem_core = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-core-4x1-25000", sep = "/" ), starvz_config ) ) hmat_bem_node = hmat_bem_node + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_bem_node$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_node$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_node$theme$legend.position = "none" hmat_bem_node$theme$axis.title.x = element_blank() hmat_bem_node$theme$axis.text.x = element_blank() hmat_bem_node$theme$axis.ticks.x = element_blank() hmat_bem_socket = hmat_bem_socket + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_bem_socket$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_socket$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_socket$theme$legend.position = "none" hmat_bem_socket$theme$axis.title.x = element_blank() hmat_bem_socket$theme$axis.text.x = element_blank() hmat_bem_socket$theme$axis.ticks.x = element_blank() hmat_bem_core = hmat_bem_core + theme_bw() + theme( text = element_text(family = "Arial", size = 16), legend.text = element_text(family = "Arial", size = 14) ) hmat_bem_core$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_core$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_core$theme$legend.title = element_blank() hmat_bem_core$theme$legend.position = "bottom" grid.newpage() grid.draw( rbind( ggplotGrob(hmat_bem_node), ggplotGrob(hmat_bem_socket), ggplotGrob(hmat_bem_core), size = "max" ) )
execution traces of the HMAT solver on a FEM system.
<<starvz-header>> hmat_fem_node = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-node-1x4-50000", sep = "/" ), starvz_config ) ) hmat_fem_socket = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-socket-2x2-50000", sep = "/" ), starvz_config ) ) hmat_fem_core = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-core-4x1-50000", sep = "/" ), starvz_config ) ) hmat_fem_node = hmat_fem_node + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_fem_node$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_node$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_node$theme$legend.position = "none" hmat_fem_node$theme$axis.title.x = element_blank() hmat_fem_node$theme$axis.text.x = element_blank() hmat_fem_node$theme$axis.ticks.x = element_blank() hmat_fem_socket = hmat_fem_socket + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_fem_socket$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_socket$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_socket$theme$legend.position = "none" hmat_fem_socket$theme$axis.title.x = element_blank() hmat_fem_socket$theme$axis.text.x = element_blank() hmat_fem_socket$theme$axis.ticks.x = element_blank() hmat_fem_core = hmat_fem_core + theme_bw() + theme( text = element_text(family = "Arial", size = 16), legend.text = element_text(family = "Arial", size = 14) ) hmat_fem_core$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_core$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_core$theme$legend.title = element_blank() hmat_fem_core$theme$legend.position = "bottom" grid.newpage() grid.draw( rbind( ggplotGrob(hmat_fem_node), ggplotGrob(hmat_fem_socket), ggplotGrob(hmat_fem_core), size = "max" ) )
Note that we do not tangle the source code in this section. Each figure can be generated by executing the associated code block from within the Emacs editor.
5. Appendix
5.1. Example standard output of a test_FEMBEM
execution
Testing : with MPF matrices. *** *** mpf version (commit ID ) *** Built on Jan 1 1970 at 00:00:01 *** [mpf] MPI thread level provided = MPI_THREAD_MULTIPLE [mpf] OpenMP thread number = 24 [mpf] MKL thread number = 24 [mpf] MPF_MAX_MEMORY = 64324 Mb 0 : [ test_FEMBEM@miriel060.plafrim.cluster is online... ] tmpdir=/tmp/felsoci/p-spido-25000 pid=4614 [mpf] The direct solver has been selected [mumps] Version : 5.2.1 *** 10 Command Line Parameters not read by MPFinit : *** -nbrhs 50 -z -radius 2 -height 4 --bem -nbpts 25000 *** *** scab version (commit ID ) *** Built on Jan 1 1970 at 00:00:01 *** Submit bug reports at https://imacs.polytechnique.fr/bugzilla/enter_bug.cgi *** [Scab] CPUID detection: SSE2=1 AVX=1 AVX2=1 [Scab] D01 structure stored out-of-core (forced by globalTmp=0, nodeTmp=1 or D01SingleReader=1) *** 10 Command Line Parameters not read by SCAB_Init : *** -nbrhs 50 -z -radius 2 -height 4 --bem -nbpts 25000 *** *** test_FEMBEM for scab version (commit ID ) *** Built on Jan 1 1970 at 00:00:01 *** Submit bug reports at https://imacs.polytechnique.fr/bugzilla/enter_bug.cgi *** Testing: BEM (dense matrix). Reading nbPts = 25000 Reading radius = 2.000000 Reading height = 4.000000 Reading nbRHS = 50 Setting lambda = 0.448399 (with 10.000000 points per wavelength) Double Complex Setting sparseRHS = 71 <PERFTESTS> MPF_Version = <PERFTESTS> TEST_FEMBEM_Version = <PERFTESTS> HMAT_Version = 2019.1.0 <PERFTESTS> ScalarType = DOUBLE COMPLEX Number of points for the test = 25000 Number of right hand sides for the test = 50 <PERFTESTS> StepMesh = 4.483993e-02 <PERFTESTS> NbPts = 25000 <PERFTESTS> NbPtsBEM = 25000 <PERFTESTS> NbRhs = 50 <PERFTESTS> nbPtLambda = 1.000000e+01 <PERFTESTS> Lambda = 4.483993e-01 Computing RHS... **** Computing Classical product... BEM: 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. 
<PERFTESTS> TpsCpuClassic = 1.653491 Size of thread block : [358 x 358] Size of proc block : [3572 x 3572] Size of disk block : [3572 x 3572] MatAssembly_SPIDO : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. Assembly done. Making the matrix symmetric... <PERFTESTS> TpsCpuMPFCreation = 6.426428 VecAssembly_EMCP2 : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. VecAssembly_EMCP2 : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. MatFactorLDLt_SPIDO : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. MIN = 2.4028e+00 MAX = 3.5494e+00 COND = 1.4772e+00 <PERFTESTS> TpsCpuFactoMPF = 85.717232 <PERFTESTS> TpsCpuSolveMPF = 5.684338 **** Comparing results... <PERFTESTS> Error = 6.8930e-15 test_FEMBEM : end of computation Writing tracing report in traceCall.log Writing tracing report in traceCall.json