Ph.D. thesis of Marek Felšöci: Experimental environment
Table of Contents
- 1. Literate programming
- 2. Building reproducible software environments
- 3. Performing benchmarks
- 3.1. gcvb
- 3.2. sbatch templates
- 3.3. Ensuring filesystem
- 3.4. Configuration file
- 3.5. Definition file
- 3.6. Resource monitoring
- 3.7. Energy consumption monitoring
- 3.8. Result parsing
- 3.9. Database injecting
- 3.10. Extracting additional results
- 3.11. Wrapper scripts
- 3.12. Generate benchmark runs
- 3.13. Job submission
- 4. Post-processing results
- 4.1. Benchmarks
- 4.1.1. Prerequisites
- 4.1.2. Importing data
- 4.1.3. Formatting data
- 4.1.4. Style presets
- 4.1.5. Plot functions
- 4.1.5.1. times_by_nbpts
- 4.1.5.2. multisolve_times_by_nbrhs_and_nbpts
- 4.1.5.3. multisolve_rss_by_nbrhs_and_nbpts
- 4.1.5.4. multisolve_memory_aware
- 4.1.5.5. multifacto_memory_aware
- 4.1.5.6. multifacto_times_by_nbpts_and_schur_size
- 4.1.5.7. multifacto_rss_by_nbpts_and_schur_size
- 4.1.5.8. compare_coupled
- 4.1.5.9. compare_rss_coupled
- 4.1.5.10. accuracy_by_nbpts
- 4.1.5.11. rss_peaks_by_nbpts
- 4.1.5.12. rss_by_time
- 4.1.5.13. eprofile
- 4.1.5.14. es_comparison
- 4.1.5.15. scalability
- 4.1.5.16. scalability_ram
- 4.1.5.17. efficiency
- 4.2. StarPU execution traces
- 5. Appendix
Reproducibility of numerical experiments has always been a complex matter. Building exactly the same software environment on multiple computing platforms can be long, tedious and sometimes virtually impossible to do manually.
Within our research work, we put strong emphasis on ensuring the reproducibility of the experimental environment. On one hand, we make use of the GNU Guix transactional package manager allowing for a complete reproducibility of software environments across different machines. On the other hand, we rely on the principles of literate programming in an effort to maintain an exhaustive, clear and accessible documentation of our experiments and the associated environment.
1. Literate programming
We write the source code of the scripts and various configuration files allowing us to design and automate numerical experiments following the paradigm known as literate programming Knuth84. The idea of this approach is to associate source code with an explanation of its purpose written in a natural language.
There are numerous software tools designed for literate programming. We rely on
Org mode for Emacs Dominik18,emacs which defines the Org markup language for
combining formatted text, images and figures with traditional source code.
Files containing documents written in Org mode end with the .org extension.
Extracting a compilable or interpretable source code from an Org document is called tangling OrgTangle. It is also possible to evaluate a particular source code block directly from the Emacs editor OrgEval while editing. For example, this is particularly useful for the visualization of experimental results.
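As a hypothetical illustration (the file names hello.sh and document.org are ours, not part of the thesis sources), a source block in an Org document carries a :tangle header argument naming the file it is extracted to:

```org
#+BEGIN_SRC shell :tangle hello.sh
echo "This line ends up in hello.sh after tangling."
#+END_SRC
```

In Emacs, C-c C-v t (org-babel-tangle) extracts all such blocks from the current document; the same can be done non-interactively with emacs --batch -l org --eval '(org-babel-tangle-file "document.org")'.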
Eventually, an Org document can be exported to various output formats OrgExport such as LaTeX or Beamer, HTML and so on.
The .org
source this document is based on 1 features the source
code blocks of multiple scripts and configuration files that need to be tangled
into separate source files before constructing our software environment and
realizing the experiments.
2. Building reproducible software environments
We are likely to work on various computing platforms and need to manage multiple software packages and their dependencies across different machines. To keep our software environment reproducible, we rely on the GNU Guix guix package manager.
2.1. GNU Guix
Guix is a transactional package manager and a stand-alone GNU Linux distribution where each user can install their own packages without any impact on the others, with the possibility to switch between multiple system or package generations. An environment created using Guix is fully reproducible across different computing platforms. In other words, a package built on a computer from a given Guix commit remains the same when rebuilt on another machine.
2.2. Channels
Software packages in Guix are available through dedicated Git repositories, called channels, containing package definitions.
The first and default channel is the system channel providing Guix itself as
well as the definitions of some commonly used packages such as system libraries,
compilers, text editors and so on. Afterwards, we need to include additional
channels using a custom channel file channels.scm.
For each channel, we specify the commit of the associated repository to acquire. This way, we make sure to always build the environment using the exact same versions of every single package in the system and guarantee a better reproducibility.
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "9235dd136e70dfa97684aff4e9af4c0ce366ad68"))
Following the system channel, we include guix-hpc
and guix-hpc-non-free
providing various open-source and non-free scientific software and libraries.
      (channel
        (name 'guix-hpc)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
        (commit "7e7fdfb641f5009dcf21cd8f8edc0d41d1faa7f5"))
      (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
        (commit "1305692c778810747c903cfa7e1d729d74ff2fb4"))
We also need the private channel guix-hpc-airbus
defining the proprietary Airbus
packages providing the benchmark software we use. As the channel is private, one
must have access to the corresponding repository and a duly configured SSH key
in $HOME/.ssh
on the target machine.
      (channel
        (name 'guix-hpc-airbus)
        (url "git@gitlab.inria.fr:mfelsoci/guix-hpc-airbus.git")
        (commit "014bccdfebc4aa4ec4cc3f523301e55759e7a9ac"))
Finally, we import the guix-extra
channel providing some utility packages
we use mainly during the post-processing of benchmark results.
      (channel
        (name 'guix-extra)
        (url "https://gitlab.inria.fr/mfelsoci/guix-extra.git")
        (commit "35cdd19f80516ad7b89a68be5184910719324aca")))
The channel file must be saved to $HOME/.config/guix/channels.scm. Then, to make
the system aware of the changes, it is necessary to construct a new system
generation using guix pull.
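A typical sequence then looks as follows (a usage sketch based on standard Guix commands, not tangled from this document):

```shell
# Install the pinned channel list and build a new Guix generation from it.
cp channels.scm $HOME/.config/guix/channels.scm
guix pull
# Check which channel commits the new generation was built from.
guix describe
```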
See the complete source file 2.
2.3. Environments
To enter a particular software environment using Guix, we use the command
guix shell. Its options allow us to specify the packages to include together
with the desired version, commit or branch.
To avoid typing a lot of command line options every time we want to enter the
environment, we use the setenv.sh
shell script to automate the
process. It also allows us to adjust the environment using specific options.
We rely on the benchmark suite test_FEMBEM
from Airbus provided in one of
their proprietary software packages, namely scab
. test_FEMBEM
allows us to
evaluate numerous solver configurations. There is also an open-source version of
test_FEMBEM
connected exclusively to open-source solvers. It is available in
Guix as a separate package.
scab
as well as its direct dependencies, mpf
and hmat
, are proprietary
packages with closed source code residing in private Git repositories. Before
setting up the environment, one has to acquire their local copies so that Guix
can build the environment.
The setenv.sh
script begins with a help message function accessible through
the -h
option.
function help() {
  echo "Set up the experimental environment for the Ph.D. thesis of Marek" \
    "Felšöci." >&2
  echo "Usage: $0 [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -A             Instead of the mainstream version of 'hmat', use" \
    "its derived version containing the developments made by Aurélien" \
    "Falco during his Ph.D. thesis." >&2
  echo "  -e ENVIRONMENT Switch the software environment to ENVIRONMENT." \
    "Available environments are: 'benchmark' (includes packages for" \
    "performing benchmarks, default choice), 'gcvb' (includes packages" \
    "only for generating and launching benchmark job scripts, does not" \
    "allow for running benchmark executables), 'ppp' (includes packages" \
    "for pre-processing and post-processing actions)." >&2
  echo "  -E             Instead of the mainstream versions of 'mpf' and" \
    "'scab', use their derived versions supporting the energy_scope" \
    "tool." >&2
  echo "  -g             Get the final 'guix shell' command line" \
    "that can be followed by an arbitrary command to be executed inside" \
    "the environment. The trailing '--' should be added manually!" >&2
  echo "  -h             Show this help message." >&2
  echo "  -M             Instead of the mainstream versions of 'hmat'," \
    "'mpf' and 'scab', use their derived versions containing our work," \
    "i.e. the 'mf/devel' branches of the associated Git repositories." >&2
  echo "  -o             Use exclusively open-source software" \
    "(open-source 'test_FEMBEM' with the sequential version of 'hmat')." >&2
  echo "  -O             Use OpenBLAS instead of Intel(R) MKL." >&2
  echo "  -p             Prepare the source tarballs of the Airbus" \
    "packages. If combined with '-r', the tarballs shall be placed into" \
    "the directory specified by the latter. Also, the option can be used" \
    "together with the '-A' or the '-M' option." >&2
  echo "  -r ROOT        Search for Airbus sources at ROOT in lieu of the" \
    "default location at '$HOME/src'." >&2
  echo "  -s PATH        Instead of entering the specified environment," \
    "create the corresponding Singularity image at PATH." >&2
  echo "  -S             If the '-s' option is set, build the Singularity" \
    "image using the same version of Open MPI as the one available on the" \
    "Curta super-computer, i.e. replace the 'openmpi' and the" \
    "'openmpi-mpi1-compat' packages by 'openmpi-curta'." >&2
  echo "  -x             Use StarPU with the FXT support." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
The variable WITH_INPUT_STARPU_FXT
contains the --with-input
option for the
guix shell
command necessary to replace the ordinary StarPU
Starpu2010 package by the one compiled with FXT trace generation support
whenever the -x
option is passed on the command line.
WITH_INPUT_STARPU_FXT=""
Similarly, the WITH_INPUT_MKL
variable contains the --with-input
options
necessary to replace OpenBLAS OpenBLAS by the Intel(R) Math Kernel
Library MKL in the concerned packages.
WITH_INPUT_MKL="--with-input=pastix-5=pastix-5-mkl \
--with-input=mumps-scotch-openmpi=mumps-mkl-scotch-openmpi \
--with-input=openblas=mkl"
OPEN_SOURCE
handles the -o
option allowing us to use exclusively open-source
solvers and the open-source version of the test_FEMBEM
suite. By default, the
option is disabled.
OPEN_SOURCE=0
AIRBUS_ROOT
specifies the location where to search for the source tarballs of
the Airbus packages. This can be modified using the -r
option. The default
value is $HOME/src
.
AIRBUS_ROOT="$HOME/src"
By default, the script assumes the tarballs already exist and tries to set up
the environment directly. PREPARE_TARBALLS
is a boolean switch indicating
whether the source tarballs of the Airbus packages should be generated instead
of setting up the environment. For machines without access to the concerned
private Airbus repositories, the tarballs can be generated on another machine
using the -p
option.
PREPARE_TARBALLS=0
Normally, we want to set up the environment using the mainstream release of
hmat
. The -A
option and the associated boolean AIRBUS_AF
allow switching
to the version containing the developments made by Aurélien Falco Falco2019
that have not been merged into the mainstream version of the package yet.
AIRBUS_AF=0
The -M
option and the associated boolean AIRBUS_MF
allow switching to our
development versions of the Airbus packages.
AIRBUS_MF=0
The -E
option and the associated boolean AIRBUS_ES
allow switching to the
versions of mpf
and scab
supporting the energy_scope
tool (see
Section 3.7).
AIRBUS_ES=0
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
GET_COMMAND=0
CURTA=0
ENVIRONMENT="benchmark"

while getopts ":Ae:EgMoOhpr:s:Sx" option; do
  case $option in
    A)
      AIRBUS_AF=1 ;;
The -e
option allows to choose among multiple software environments available.
    e)
      ENVIRONMENT=$OPTARG ;;
    E)
      AIRBUS_ES=1 ;;
The -g
option allows to print out the final guix shell
command instead of
directly entering the environment. This is useful for writing one-line commands,
for example, in the GitLab continuous integration configuration.
    g)
      GET_COMMAND=1 ;;
    M)
      AIRBUS_MF=1 ;;
    o)
      OPEN_SOURCE=1 ;;
    O)
      WITH_INPUT_MKL="" ;;
    p)
      PREPARE_TARBALLS=1 ;;
    r)
      AIRBUS_ROOT=$OPTARG
      if test ! -d "$AIRBUS_ROOT"; then
        error "'$AIRBUS_ROOT' is not a valid directory!"
        exit 1
      fi ;;
The -s
option allows to create a Singularity container corresponding to the
selected environment instead of directly entering it. If the container is
intended to be used on the Curta supercomputer curta, the -S
option
allows to ensure the same version of Open MPI during the build as the one
available on Curta.
    s)
      SINGULARITY=$OPTARG
      SINGULARITY_DIR=$(dirname "$SINGULARITY")
      if test ! -d $SINGULARITY_DIR || test ! -w $SINGULARITY_DIR; then
        error "'$SINGULARITY_DIR' is not a valid and writable directory!"
        exit 1
      fi ;;
    S)
      CURTA=1 ;;
    x)
      WITH_INPUT_STARPU_FXT="--with-input=starpu=starpu-fxt" ;;
We must also take into account unknown options, missing option arguments, syntax
mismatches as well as the case when the -h
option is specified.
    \?)
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1 ;;
    :)
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1 ;;
    h | *)
      help
      exit 0 ;;
  esac
done
The following variables indicate the commit numbers, branch names and archive locations to use by default for the generation of the Airbus source tarballs.
HMAT_BASENAME="hmat-2021.1.0-21.52c4f6c"
HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz"
HMAT_COMMIT="52c4f6c9dfa084ad67bb9876c30a049edbd82b07"
HMAT_BRANCH="master"

MPF_BASENAME="mpf-2021.1.0-26.a158c8c"
MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
MPF_COMMIT="a158c8c65c223b732479ff3f321c9b0cb5dd3549"
MPF_BRANCH="master"

SCAB_BASENAME="scab-2021.1.0-30.6f669fe"
SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
SCAB_COMMIT="6f669fe98773013265240cdca4ce274f4dd85237"
SCAB_BRANCH="master"

ACTIPOLE_BASENAME="actipole-2020.0.1-105.c84cd1a"
ACTIPOLE_TARBALL="$AIRBUS_ROOT/$ACTIPOLE_BASENAME.tar.gz"
ACTIPOLE_COMMIT="c84cd1a84224ea6eb34c1a4f4ed59c818d66c3fe"
ACTIPOLE_BRANCH="master"
However, when the -A
, the -M
or the -E
option is used, we need to update
these specifications accordingly.
if test $AIRBUS_AF -ne 0; then
  HMAT_BASENAME="hmat-af-1.6.0-382.60743b1"
  HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz"
  HMAT_COMMIT="60743b119afb966b6987c4421b949482dfbbf04f"
  HMAT_BRANCH="mf/af/bcsf"
  MPF_BASENAME="mpf-af-1.25.0-134.6fbcc2e"
  MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
  MPF_COMMIT="6fbcc2e67713205b1c3bbebdda1f6377d9698070"
  MPF_BRANCH="af/devel"
  SCAB_BASENAME="scab-af-1.9.0-175.2971244"
  SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
  SCAB_COMMIT="29712447d805d957d715a1f5ab7d8e6903997652"
  SCAB_BRANCH="af/ND"
elif test $AIRBUS_MF -ne 0; then
  HMAT_BASENAME="hmat-mf-2021.1.0-23.db095d4"
  HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz"
  HMAT_COMMIT="db095d49d6282582701e84ab64a9a03b65780271"
  HMAT_BRANCH="mf/devel"
  MPF_BASENAME="mpf-mf-2021.1.0-27.54cd92b"
  MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
  MPF_COMMIT="54cd92b1976f634053771a98f4af8afe0bc71c90"
  MPF_BRANCH="mf/devel"
  SCAB_BASENAME="scab-mf-2021.1.0-30.6f669fe"
  SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
  SCAB_COMMIT="6f669fe98773013265240cdca4ce274f4dd85237"
  SCAB_BRANCH="mf/devel"
elif test $AIRBUS_ES -ne 0; then
  MPF_BASENAME="mpf-energy_scope-2021.1.0-27.9aba318"
  MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz"
  MPF_COMMIT="9aba3184bb806dc0a05e2a933bfe858472bb207a"
  MPF_BRANCH="mf/energy_scope"
  SCAB_BASENAME="scab-energy_scope-2021.1.0-32.3056f3c"
  SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz"
  SCAB_COMMIT="3056f3c7a986cc2fabb6663707272ccf16fbb614"
  SCAB_BRANCH="mf/energy_scope"
  ACTIPOLE_BASENAME="actipole-energy_scope-2020.0.1-105.c84cd1a"
  ACTIPOLE_TARBALL="$AIRBUS_ROOT/$ACTIPOLE_BASENAME.tar.gz"
  ACTIPOLE_COMMIT="c84cd1a84224ea6eb34c1a4f4ed59c818d66c3fe"
  ACTIPOLE_BRANCH="master"
fi
If the -p
option is specified, we clone the Airbus repositories and
create the source tarballs using the specified commit and branch
instead of trying to set up the environment.
if test $PREPARE_TARBALLS -ne 0; then
We begin by removing any previous clones of the Airbus repositories in
AIRBUS_ROOT
.
rm -rf $AIRBUS_ROOT/$HMAT_BASENAME $AIRBUS_ROOT/$MPF_BASENAME \
  $AIRBUS_ROOT/$SCAB_BASENAME $AIRBUS_ROOT/$ACTIPOLE_BASENAME $HMAT_TARBALL \
  $MPF_TARBALL $SCAB_TARBALL $ACTIPOLE_TARBALL
Then, we make fresh clones and check out the required revisions
git clone --recurse-submodules --single-branch --branch $HMAT_BRANCH \
  https://private-server.com/airbus/hmat $AIRBUS_ROOT/$HMAT_BASENAME
cd $AIRBUS_ROOT/$HMAT_BASENAME
git checkout $HMAT_COMMIT
git submodule update

git clone --single-branch --branch $MPF_BRANCH \
  https://private-server.com/airbus/mpf $AIRBUS_ROOT/$MPF_BASENAME
cd $AIRBUS_ROOT/$MPF_BASENAME
git checkout $MPF_COMMIT

git clone --single-branch --branch $SCAB_BRANCH \
  https://private-server.com/airbus/scab $AIRBUS_ROOT/$SCAB_BASENAME
cd $AIRBUS_ROOT/$SCAB_BASENAME
git checkout $SCAB_COMMIT

git clone --single-branch --branch $ACTIPOLE_BRANCH \
  https://private-server.com/airbus/actipole $AIRBUS_ROOT/$ACTIPOLE_BASENAME
cd $AIRBUS_ROOT/$ACTIPOLE_BASENAME
git checkout $ACTIPOLE_COMMIT
and verify that the cloned repositories are valid directories.
if test ! -d $AIRBUS_ROOT/$HMAT_BASENAME || \
   test ! -d $AIRBUS_ROOT/$MPF_BASENAME || \
   test ! -d $AIRBUS_ROOT/$SCAB_BASENAME || \
   test ! -d $AIRBUS_ROOT/$ACTIPOLE_BASENAME; then
  error "Failed to clone the Airbus repositories!"
  exit 1
fi
We remove the .git
folders from inside the clones to shrink the size of the
final tarball created using the tar
utility.
rm -rf $AIRBUS_ROOT/$HMAT_BASENAME/.git \
  $AIRBUS_ROOT/$HMAT_BASENAME/hmat-oss/.git $AIRBUS_ROOT/$MPF_BASENAME/.git \
  $AIRBUS_ROOT/$SCAB_BASENAME/.git $AIRBUS_ROOT/$ACTIPOLE_BASENAME/.git

tar -czf $HMAT_TARBALL -C $AIRBUS_ROOT $HMAT_BASENAME
tar -czf $MPF_TARBALL -C $AIRBUS_ROOT $MPF_BASENAME
tar -czf $SCAB_TARBALL -C $AIRBUS_ROOT $SCAB_BASENAME
tar -czf $ACTIPOLE_TARBALL -C $AIRBUS_ROOT $ACTIPOLE_BASENAME
At the end of the procedure, we check if the tarballs were created and remove the clones.
if test ! -f $HMAT_TARBALL || test ! -f $MPF_TARBALL || \
   test ! -f $SCAB_TARBALL || test ! -f $ACTIPOLE_TARBALL; then
  error "Failed to create tarballs!"
  exit 1
fi

rm -rf $AIRBUS_ROOT/$HMAT_BASENAME $AIRBUS_ROOT/$MPF_BASENAME \
  $AIRBUS_ROOT/$SCAB_BASENAME $AIRBUS_ROOT/$ACTIPOLE_BASENAME

exit 0
fi
Eventually comes the guix shell
command itself. We use a variable to hold the
name of the package containing the solver test suite test_FEMBEM
. Indeed, by
default we use its mainstream proprietary version from the scab
package. When
the -A
option is used, we rely on scab-af
. When the -M
option is used, we
rely on scab-mf
. When the -E
option is used, we rely on scab-energy_scope
and actipole-energy_scope
. Finally, if the -o
option is used, we rely on the
open-source version of test_FEMBEM
available directly as a standalone Guix
package of the same name.
SCAB="scab"
ACTIPOLE="actipole"

if test $AIRBUS_AF -ne 0; then
  SCAB="scab-af"
elif test $AIRBUS_MF -ne 0; then
  SCAB="scab-mf"
elif test $AIRBUS_ES -ne 0; then
  SCAB="scab-energy_scope"
  ACTIPOLE="actipole-energy_scope"
elif test $OPEN_SOURCE -ne 0; then
  SCAB="test_FEMBEM"
fi
In order to access the additional features we implemented in gcvb
(see
Section 3), we switch to our fork of the package's
repository, namely gcvb-felsocim
. Sometimes, a local clone of the latter is
necessary: being hosted on GitHub, it cannot be acquired online by Guix on
computing platforms with overly restrictive proxy settings.
if test ! -d $AIRBUS_ROOT/gcvb; then
  git clone https://github.com/felsocim/gcvb.git $AIRBUS_ROOT/gcvb
fi
The list of packages to include into the resulting environment as well as the
package modifiers and options to pass to the guix shell
command or to the
guix pack
command, if the -s
option is set, are based on the environment
switch -e
and the -x
option. Also, if the -S
option is used together with
-s
, the version of Open MPI should match the one available on the Curta
cluster.
WITH_INPUT_OPENMPI=""

if test $CURTA -ne 0 && test "$SINGULARITY" != ""; then
  WITH_INPUT_OPENMPI="--with-input=hwloc=hwloc@1 \
--with-input=openmpi=openmpi-curta \
--with-input=openmpi-mpi1-compat=openmpi-curta"
fi
The available environments are listed below. Note that the --preserve
option
allows us to inherit selected environment variables from the parent environment.
benchmark
: environment for performing benchmarks,
MODIFIERS_BENCHMARK="$WITH_INPUT_OPENMPI $WITH_INPUT_MKL $WITH_INPUT_STARPU_FXT \
--with-git-url=gcvb-minimal-felsocim=$AIRBUS_ROOT/gcvb"
OPTIONS_BENCHMARK="--preserve=^SLURM"
PACKAGES_BENCHMARK="bash coreutils inetutils findutils grep sed bc jq openssh \
tar gzip likwid python python-psutil python-numpy python-matplotlib \
gcvb-minimal-felsocim $SCAB $ACTIPOLE"

if test "$SINGULARITY" == ""; then
  OPTIONS_BENCHMARK="$OPTIONS_BENCHMARK --with-input=slurm=slurm@19"
  PACKAGES_BENCHMARK="$PACKAGES_BENCHMARK slurm openmpi"
fi

if test "$WITH_INPUT_STARPU_FXT" != ""; then
  PACKAGES_BENCHMARK="$PACKAGES_BENCHMARK r r-starvz"
fi
gcvb
: environment for generating and launching benchmark job scripts, but not for running the benchmark jobs themselves, which are expected to run in a Singularity container,
MODIFIERS_GCVB="--with-input=slurm=slurm@19 \
--with-git-url=gcvb-minimal-felsocim=$AIRBUS_ROOT/gcvb"
OPTIONS_GCVB="--preserve=SINGULARITY_EXEC --preserve=SINGULARITY_IMAGE"
PACKAGES_GCVB="bash coreutils inetutils findutils grep sed bc jq openssh tar \
gzip likwid openmpi slurm python python-psutil gcvb-minimal-felsocim"
ppp
: environment for pre-processing and post-processing actions, e.g. tangling source code from Org documents, HTML and LaTeX export, …
MODIFIERS_PPP="--with-input=r-ggplot2=r-ggplot2@git.bd50a551"
OPTIONS_PPP="--preserve=TZDIR"
PACKAGES_PPP="bash coreutils sed which emacs emacs-org2web emacs-org \
emacs-htmlize emacs-biblio emacs-org-ref emacs-ess python-pygments texlive r \
r-dbi r-rsqlite r-plyr r-dplyr r-readr r-tidyr r-ggplot2@git.bd50a551 r-scales \
r-cowplot r-stringr r-gridextra r-ggrepel r-rjson r-starvz r-ascii r-r-utils \
inkscape svgfix graphviz"
Based on the value of $ENVIRONMENT
, we select the environment to set up.
MODIFIERS=""
OPTIONS=""
PACKAGES=""

case $ENVIRONMENT in
  benchmark)
    MODIFIERS="$MODIFIERS_BENCHMARK"
    OPTIONS="$OPTIONS_BENCHMARK"
    PACKAGES="$PACKAGES_BENCHMARK" ;;
  gcvb)
    MODIFIERS="$MODIFIERS_GCVB"
    OPTIONS="$OPTIONS_GCVB"
    PACKAGES="$PACKAGES_GCVB" ;;
  ppp)
    MODIFIERS="$MODIFIERS_PPP"
    OPTIONS="$OPTIONS_PPP"
    PACKAGES="$PACKAGES_PPP" ;;
  *)
    error "'$ENVIRONMENT' is not a valid software environment switch!"
    exit 1 ;;
esac
Now it is possible to assemble the guix shell
command, its options and the
list of packages to include in the resulting environment. To unset any existing
environment variables of the current environment, we use the --pure
option.
ENVIRONMENT_COMMAND="guix shell --pure $MODIFIERS $OPTIONS $PACKAGES"
If the -g
option is set, we only print the command on the standard output.
If the -s
option is set, we generate the corresponding Singularity image using
the guix pack
command and exit. Otherwise, we directly enter the new
environment and launch a shell interpreter. The --norc
option of bash prevents
the sourcing of the current user's .bashrc
file which could compromise the
final environment with unwanted environment variables.
if test $GET_COMMAND -ne 0 && test "$SINGULARITY" == ""; then
  echo $ENVIRONMENT_COMMAND
  exit 0
fi
Note that the gcvb
environment is intended for running benchmarks in a
Singularity container. Therefore generating a container for the environment
itself makes no sense!
if test "$SINGULARITY" != "" && test "$ENVIRONMENT" == "gcvb"; then
  error "Generating a container for the 'gcvb' environment makes no sense!"
  exit 1
fi

if test "$SINGULARITY" != ""; then
  PACK_COMMAND="guix pack -f squashfs -S /usr/bin/env=/bin/env $MODIFIERS $PACKAGES"
  PACK_COMMAND=$(echo $PACK_COMMAND | sed 's/\n/ /g')

  if test $GET_COMMAND -ne 0; then
    echo $PACK_COMMAND
    exit 0
  fi

  echo "Building Singularity container using '$PACK_COMMAND'..."
  $PACK_COMMAND > .packing

  if test $? -ne 0; then
    error "Failed to build the Singularity container!"
    exit 1
  fi

  echo "Done"
  echo "Copying the container to '$SINGULARITY'..."
  cp $(tail -n 1 .packing) $SINGULARITY

  if test $? -ne 0; then
    error "Failed to copy the Singularity container to '$SINGULARITY'!"
    exit 1
  fi

  echo "Done"
  echo "Making the container readable..."
  chmod 644 $SINGULARITY

  if test $? -ne 0; then
    error "Failed to make the Singularity container '$SINGULARITY' readable!"
    exit 1
  fi

  echo "Done"
  exit 0
fi

$ENVIRONMENT_COMMAND -- bash --norc
See the complete source file 3.
3. Performing benchmarks
To automate the generation and execution of benchmarks, we use gcvb gcvb, an open-source tool developed at Airbus. gcvb allows us to define benchmarks, generate corresponding shell job scripts for every benchmark or a selected group of benchmarks, submit these job scripts for execution, then gather and optionally validate or post-process the results. To generate multiple variants of the same benchmark, gcvb supports templates.
3.1. gcvb
gcvb uses a specific file and directory structure. A minimal setup requires a
configuration file and a benchmark definition YAML file. These files must be
placed in the same folder. Furthermore, the name of the configuration file must
be config.yaml
. On the other hand, the benchmark definition file may have an
arbitrary name. The folder we place these files in represents the root of our
gcvb filesystem:
- benchmarks/ represents the root of the benchmark filesystem.
  - data/ contains data necessary to generate and perform benchmarks.
    - all/ represents one of possibly more folders containing benchmark data. For the sake of simplicity, we use one single folder for all benchmarks.
      - input holds any input file necessary to generate benchmarks.
      - references holds any reference file needed for benchmark results validation.
      - templates provides file templates for template-based benchmarks. There is one subfolder for each file template. Note that we use templates to produce specific batch job script header directives for the workload manager on the target computing platform (see Section 3.2) as well as to generate wrapper scripts for launching benchmarks (see Section 3.11).
  - results/ contains benchmark results. Here, one sub-folder is produced every time a new session of benchmarks is generated based on the definition file. It contains job scripts and one folder per generated benchmark. These folders may hold any template-based input file as well as the result of the corresponding benchmark execution.
    - 1/
    - …
  - config.yaml represents the configuration file.
  - gcvb.db represents an auto-generated NoSQL database which can be used to store benchmark results.
  - benchmarks.yaml represents the benchmark definition file.
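The skeleton of this layout can be created by hand; the following is a minimal sketch in shell (the directory and file names follow the listing above, the relative root benchmarks is an arbitrary choice of ours):

```shell
#!/usr/bin/env bash
# Create a minimal gcvb filesystem skeleton as described above.
root="benchmarks"
mkdir -p "$root/data/all/input" "$root/data/all/references" \
  "$root/data/all/templates" "$root/results"
# The configuration file must be named config.yaml; the definition file
# name is arbitrary (benchmarks.yaml here).
touch "$root/config.yaml" "$root/benchmarks.yaml"
```

gcvb then fills results/ with one sub-folder per generated session and creates gcvb.db itself.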
3.2. sbatch templates
We use slurm slurm to schedule and execute our experiments on the target
high-performance computing platforms. gcvb
produces a job script for each
benchmark described in the definition file. This script is then passed to
slurm for scheduling on a computation node or nodes.
Each job script produced by gcvb
is prepended with a header containing the
configuration statements for the sbatch
command of slurm slurmGuide
used to submit jobs for computation. We take advantage of the template feature
in order to be able to dynamically generate #SBATCH
headers specific to a
given set of benchmarks.
An sbatch
template begins as a standard shell script.
#! /usr/bin/env bash
#
# /slurm/ batch job script
#
We use multiple template files but most of the #SBATCH
directives are common
to all of them:
count of computational nodes to reserve,
#SBATCH -N {scheduler[nodes]}
count of task slots per node to reserve,
#SBATCH -n {scheduler[tasks]}
exclusion of the other users from the usage of the reserved resources,
#SBATCH --exclusive
reservation time,
#SBATCH --time={scheduler[time]}
location to place the slurm log files in, where %x is the corresponding job name and %j the identifier of the job,
#SBATCH -o slurm/%x-%j.log
we exclude selected nodes because of inconsistent configuration (less RAM than expected).
#SBATCH --exclude=miriel019,miriel023,miriel030,miriel056
Note that {scheduler[nodes]}
and the like are placeholders replaced by actual
values from the YAML definition file during template expansion (see
Section 3.5).
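The substitution itself is performed by gcvb; as a purely hypothetical illustration of the mechanism (the value 2 and the use of sed are ours), filling a placeholder amounts to:

```shell
#!/usr/bin/env bash
# Illustration only: substitute a {scheduler[nodes]} placeholder with a
# concrete value, the way template expansion fills in the YAML-defined values.
template='#SBATCH -N {scheduler[nodes]}'
nodes=2
expanded=$(echo "$template" | sed "s/{scheduler\[nodes\]}/$nodes/")
echo "$expanded"   # prints: #SBATCH -N 2
```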
One of the #SBATCH
directives specific to each template file is the job name.
Based on the job name, we distinguish different sets of benchmarks. Grouping
individual benchmarks into a single job script allows us to submit fewer jobs.
This way, they are more balanced in terms of the computation time we need to
reserve for them on the target computing platform. For example, instead of
submitting 12 jobs with a time limit of 10 minutes each, we submit two jobs
with a time limit of 1 hour each. Benchmarks to be placed into a given common
job script are identified by matching their job name against a regular
expression. Finally, the --constraint
switch allows us to specify the node
family to rely on.
In the sbatch
header template monobatch
, used for benchmarks whose jobs all run on a
single computational node, the job name is simply composed of a
prefix which typically corresponds to the constant part of a benchmark name (see
Section 3.5):
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
When a template-based benchmark definition yields a large amount of benchmarks,
we prefer to group them into multiple job scripts and launch the latter in
parallel. The value of {job[batch]}
in the polybatch
header determines which
benchmark belongs to which job script:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
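The effect of such a batch index can be sketched with a tiny shell loop (the counts below are hypothetical, not taken from our benchmark definitions): assigning each of 12 benchmarks a batch index in round-robin fashion yields two equally sized job scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: distribute 12 benchmarks over 2 job scripts by
# computing a batch index for each, as the {job[batch]} value does.
benchmarks=12
batches=2
for (( i = 0; i < benchmarks; i++ )); do
  echo "benchmark-$i -> batch-$(( i % batches ))"
done
```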
polybatch-distributed
is a variation of polybatch
for distributed parallel
jobs which require specifying additional scheduling constraints.
<<sbatch-beginning>>
#SBATCH -N {scheduler[nodes]}
#SBATCH --ntasks-per-node {scheduler[nt]}
#SBATCH --exclusive
#SBATCH --time={scheduler[time]}
#SBATCH -o slurm/%x-%j.log
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}
#SBATCH --constraint={scheduler[family]}{scheduler[constraint]}

ulimit -c 0
For coupled solver benchmarks, we need a finer-grained distribution of the
latter among slurm jobs. So, in the coupled
sbatch
header, we also add the names of the sparse and dense solvers involved in the
benchmark to the job name:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{sparse[name]}-{dense[name]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
However, in some cases, the sparse and dense solver names are provided through a single placeholder:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{solver[name]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
For coupled solver benchmarks relying on out-of-core computation, we need to be able to specify extra constraints in addition to the node family name, as well as to exclude specific nodes from the list of available computational nodes, because not all nodes of a given family have the desired hard disk type.
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{sparse[name]}-{dense[name]}-{dense[ooc]}
#SBATCH --constraint={scheduler[family]}{scheduler[constraint]}
<<sbatch-end>>
In case of multi-node parallel distributed benchmarks, we specify the task count per node instead of the total number of tasks. Furthermore, just like for out-of-core benchmarks, we need to be able to specify additional node constraints.
<<sbatch-beginning>>
#SBATCH -N {scheduler[nodes]}
#SBATCH --ntasks-per-node {scheduler[nt]}
#SBATCH --exclusive
#SBATCH --time={scheduler[time]}
#SBATCH -o slurm/%x-%j.log
#SBATCH --job-name={scheduler[prefix]}-{sparse[name]}-{dense[name]}-{scheduler[nodes]}-{job[nbpts]}
#SBATCH --constraint={scheduler[family]}{scheduler[constraint]}

ulimit -c 0
The same applies to scalability benchmarks. In scalability
, we add the name of the solver used to the job name:
<<sbatch-beginning>>
#SBATCH --job-name={scheduler[prefix]}-{solver[name]}-{parallel[map]}
#SBATCH --constraint={scheduler[family]}
<<sbatch-end>>
At the end of the header, we record the date and time when the job was scheduled, as well as the node it was scheduled on.
echo "Job scheduled on $(hostname), on $(date)"
echo
Then, we disable the creation of core dump files in case of a memory error. Even though they can be particularly useful, in some cases they consume too much disk space and prevent other jobs from running.
ulimit -c 0
See the complete source files monobatch
4 and polybatch.
3.3. Ensuring filesystem
3.3.1. Initialization script
We wrote the shell script mkgcvbfs.sh
to automate the initialization of a
gcvb
filesystem or to check if a specific gcvb
filesystem is valid.
Traditionally, the script begins with a help message function that can be
triggered using the -h
option.
function help() {
    echo "Initialize a gcvb file system described in FSTAB at FSPATH." >&2
    echo "Usage: ./$(basename $0) [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h        Show this help message." >&2
    echo "  -c        Check if a valid gcvb filesystem is present in PATH." >&2
    echo "  -f FSTAB  Initialize the gcvb filesystem specified in FSTAB." >&2
    echo "  -o FSPATH Set the output path for the filesystem to create." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
    if test $# -lt 1; then
        echo "An unknown error occurred!" >&2
    else
        echo "Error: $1" >&2
    fi
}
The script requires an .fstab
file describing the filesystem to create (see
Section 3.3.2), i.e. the entries to initialize the filesystem
with and the destination path of the latter.
FSTAB
holds the path to an .fstab
description file provided using the -f
option.
FSTAB=""
FSPATH
holds the destination path to create the filesystem in specified using
the -o
option.
FSPATH=""
The -c
option, corresponding to the CHECK_ONLY
boolean variable, makes it possible to
check an existing gcvb
filesystem against an .fstab
description instead of
creating it.
CHECK_ONLY=0
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
while getopts ":hcf:o:" option; do
    case $option in
        c)
            CHECK_ONLY=1
            ;;
        f)
            FSTAB=$OPTARG
            if test ! -f $FSTAB; then
                error "'$FSTAB' is not a valid file!"
                exit 1
            fi
            ;;
        o)
            FSPATH=$OPTARG
            ;;
We must also take into account unknown options, missing option arguments, syntax
mismatches as well as the case when the -h
option is specified.
        \?) # Unknown option
            error "Arguments mismatch! Invalid option '-$OPTARG'."
            echo
            help
            exit 1
            ;;
        :) # Missing option argument
            error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
            echo
            help
            exit 1
            ;;
        h | *)
            help
            exit 0
            ;;
    esac
done
Next, we have to check if the user has provided the path to the .fstab
file
if test "$FSTAB" == ""; then
    error "No filesystem description file was specified!"
    exit 1
fi
as well as the destination path of the gcvb
filesystem to create.
if test "$FSPATH" == ""; then
    error "No output location for the filesystem was specified!"
    exit 1
fi
Eventually, we process all of the entries in the .fstab
description file. Each
line represents the specification of an entry in the gcvb
filesystem to initialize
(see Section 3.3.2). Notice that the fields of an entry
specification are separated by colons.
for entry in $(cat $FSTAB); do
The first field tells us whether a file or a directory should be initialized.
    ACTION=$(echo $entry | cut -d':' -f 1)
    case $ACTION in
If it is a file, its source path and its destination in the target filesystem follow.
        F|f)
            SOURCE=$(echo $entry | cut -d':' -f 2)
            DESTINATION=$(echo $entry | cut -d':' -f 3)
If the -c
option is passed (see variable CHECK_ONLY
), we only check that the
target filesystem contains the file.
            if test $CHECK_ONLY -ne 0; then
                if test ! -f $FSPATH/$DESTINATION; then
                    error "Filesystem is incomplete! Missing '$FSPATH/$DESTINATION'."
                    exit 1
                fi
                continue
            fi
Otherwise, we need to check if the source file exists
            if test ! -f $SOURCE; then
                error "Failed to initialize file '$SOURCE'!"
                exit 1
            fi
before creating it at the desired path in the destination filesystem.
            mkdir -p $FSPATH/$(dirname $DESTINATION) && \
                cp $SOURCE $FSPATH/$DESTINATION
            if test $? -ne 0; then
                error "Failed to initialize file '$FSPATH/$DESTINATION'!"
                exit 1
            fi
            ;;
If the entry specifies a directory, its destination path in the filesystem being initialized follows.
        D|d)
            DESTINATION=$(echo $entry | cut -d':' -f 2)
If the -c
option is passed (see variable CHECK_ONLY
), we only check that the
target filesystem contains the directory.
            if test $CHECK_ONLY -ne 0; then
                if test ! -d $FSPATH/$DESTINATION; then
                    error "Filesystem is incomplete! Missing '$FSPATH/$DESTINATION'."
                    exit 1
                fi
                continue
            fi
Otherwise, we create the directory at the specified path.
            mkdir -p $FSPATH/$DESTINATION
            if test $? -ne 0; then
                error "Failed to initialize directory '$FSPATH/$DESTINATION'!"
                exit 1
            fi
            ;;
We also need to take care of the case where the action specified in the description file is not known.
        *)
            error "Failed to initialize filesystem! '$ACTION' is not a valid action."
            exit 1
            ;;
    esac
done
We finish by printing a message about the successful filesystem initialization,
or about the successful check in case the -c
option is passed.
if test $CHECK_ONLY -ne 0; then
    echo "Successfully checked the filesystem '$FSPATH'."
else
    echo "Successfully initialized a fresh gcvb filesystem at '$FSPATH'."
fi
See the complete source file mkgcvbfs.sh
5.
3.3.2. Description file
The format of an .fstab
description file is very straightforward. Each line
must begin with either a D
or d
, or an F
or f
, indicating whether a directory or
a file should be initialized. In case of a directory, the letter is followed by
a colon and the destination path of the directory. In case of a file, the letter
is followed by a colon, the source path of the file, another colon and the
destination path in the target filesystem.
To describe the filesystem from Section 3.1, we use the benchmarks.fstab
description file:
D:data/all/input
D:data/all/references
D:data/all/templates
D:results
D:slurm
F:monobatch:data/all/templates/monobatch/sbatch
F:polybatch:data/all/templates/polybatch/sbatch
F:polybatch-distributed:data/all/templates/polybatch-distributed/sbatch
F:coupled:data/all/templates/coupled/sbatch
F:coupled-simple:data/all/templates/coupled-simple/sbatch
F:coupled-ooc:data/all/templates/coupled-ooc/sbatch
F:coupled-distributed:data/all/templates/coupled-distributed/sbatch
F:scalability:data/all/templates/scalability/sbatch
F:wrapper-in-core.sh:data/all/templates/wrapper-in-core/wrapper.sh
F:wrapper-ooc.sh:data/all/templates/wrapper-ooc/wrapper.sh
F:wrapper-fxt.sh:data/all/templates/wrapper-fxt/wrapper.sh
F:wrapper-in-core-distributed.sh:data/all/templates/wrapper-in-core-distributed/wrapper.sh
F:es_config.json:data/all/templates/es/es_config.json
F:setenv.sh:scripts/setenv.sh
F:submit.sh:scripts/submit.sh
F:rss.py:scripts/rss.py
F:inject.py:scripts/inject.py
F:parse.sh:scripts/parse.sh
F:config.yaml:config.yaml
F:benchmarks.yaml:benchmarks.yaml
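To make the entry processing concrete, here is a minimal, self-contained sketch (not mkgcvbfs.sh itself) that builds a hypothetical two-entry description file and processes it with the same colon-separated field extraction as the script above:

```shell
# Sketch: process a tiny, made-up .fstab with one directory and one file entry.
WORK=$(mktemp -d)
printf 'hello\n' > $WORK/source.txt
printf 'D:results\nF:%s:inputs/source.txt\n' "$WORK/source.txt" > $WORK/demo.fstab

for entry in $(cat $WORK/demo.fstab); do
    case $(echo $entry | cut -d':' -f 1) in
        D|d) # Directory entry: field 2 is the destination path.
            mkdir -p $WORK/fs/$(echo $entry | cut -d':' -f 2) ;;
        F|f) # File entry: field 2 is the source, field 3 the destination.
            SRC=$(echo $entry | cut -d':' -f 2)
            DST=$(echo $entry | cut -d':' -f 3)
            mkdir -p $WORK/fs/$(dirname $DST) && cp $SRC $WORK/fs/$DST ;;
    esac
done
echo "initialized $WORK/fs"
```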
See the complete description file benchmarks.fstab
6.
3.4. Configuration file
The configuration file is designed to provide machine-specific information for
a gcvb
benchmark collection, such as the submit command for job scripts.
Nevertheless, our configuration does not vary from machine to machine, so we
use the same config.yaml
everywhere.
The configuration of a gcvb
benchmark collection is simple. It usually fits in
a few lines of code beginning with a machine identifier.
machine_id: generic
The most important setting is the path to the executable used to submit job
scripts produced by gcvb
. We rely on slurm as workload manager and we use its
sbatch
command to submit job scripts.
submit_command: sbatch
Sometimes, we need to re-run the validation phase (see Section
3.5) of benchmarks without repeating the computation itself,
e.g. in case of a change in the result parsing script (see Section
3.8). This phase does not have to be performed as a Slurm job on
a separate node. Therefore, we do not want to use sbatch
as submit command but
simply execute the job script produced by gcvb
in a Bash shell on the current
node.
va_submit_command: bash
Eventually, an associative list of executables can be defined for handy access
from the definition file. However, as the executables are not available in the
validation phase (see Section 3.5), we cannot make use of this
mechanism and initialize executables
as an empty list.
executables: []
See the complete configuration file config.yaml
7.
3.5. Definition file
The benchmark definition file begins with a set of default values automatically set for each benchmark defined in the file.
At first, we make all the benchmarks use the same data folder (see Section 3.1). Defining the benchmarks as of type template allows gcvb to automatically generate benchmarks for different sets of parameters (see Section 3). We address this functionality further in this section too. Also, we want to keep resource usage and energy consumption logs in the gcvb SQLite database.
default_values:
  test:
    description: "A test_FEMBEM benchmark."
    data: "all"
    type: "template"
    keep: ["rss*.log", "hdd*.log", "likwid*.csv", "tmp_energy_scope*/energy_scope_eprofile*.txt"]
For each task, we define the MPI parallel configuration options using the
nprocs
key. The placeholders {parallel[np]}
, {parallel[map]}
,
{parallel[rank]}
and {parallel[bind]}
shall be replaced during the
generation of the benchmark job script (see below).
    task:
      nprocs: "-np {parallel[np]} -map-by ppr:1:{parallel[map]} -rank-by {parallel[rank]} -bind-to {parallel[bind]}"
The nthreads
key can be used to specify the number of threads to use.
Nevertheless, we set these values in the wrapper script used to launch
benchmark tasks (see Section 3.11). As nthreads
must be
initialized even if it is not used, we initialize it with an empty string.
nthreads: ""
The main executable to run benchmarks is given by the executable
key. Although
we use the test_FEMBEM solver test suite, we do not launch the corresponding
executable directly. We use a Shell wrapper to perform some preparation and
completion actions (see Section 3.11) as well as a Python
wrapper to trace memory and storage resource consumption (see Section
3.6). At the end, all of this is run using mpirun
.
executable: mpirun
executable
and nprocs
are then reused in the final launch_command
of a
given benchmark. Below, we define the global launch command for all the
benchmark tasks based on the keys from the @job_creation
alias allowing us to
access task attributes potentially specific to each benchmark, i.e.
executable
, options
and nprocs
, from within launch_command
.
Note that commands are run from within benchmark-specific folders under the
results/<session>
directories (see Section 3.1). Also, we redirect
standard and error outputs to dedicated log files, i.e. stdout.log
and
stderr.log
respectively. stdout.log
is used later for parsing benchmark
results (see Section 3.8). Another copy of the standard output
is saved to slurm-$SLURM_JOBID.out
where SLURM_JOBID
is an environment
variable set by the job scheduler slurm
. This copy is required by the power
consumption monitoring tool energy_scope
we rely on (see Section
3.7).
The wrapper script generated by gcvb
based on the corresponding template is
not executable. Thus, we need to make it executable before running the
launch_command
.
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]} {@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash ./wrapper.sh python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4 {@job_creation[options]} 2>&1 | tee stdout.log | tee slurm-$SLURM_JOBID.out"
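In isolation, the double tee pipeline in the launch command merges the standard and error streams, then duplicates the result into two log files while still printing it. A minimal sketch, with an inline command standing in for the actual solver run:

```shell
# Sketch of the redirection pattern: stderr is merged into stdout, then the
# combined stream is copied into both log files by the chained tee calls.
cd $(mktemp -d)
{ echo "result: ok"; echo "warning" >&2; } 2>&1 | tee stdout.log | tee slurm-demo.out
```

Both files end up with identical contents, which is what lets stdout.log feed the parsing script while slurm-demo.out feeds energy_scope.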
In our case, we do not perform any result validation in terms of value checks.
However, we take advantage of the validation phase in gcvb to gather data from
log files into a separate data.csv
file per benchmark and inject them into the
gcvb SQLite database (see Section 3.1) using a Python script (see Section
3.9). The latter calls our parsing script (see Section
3.8) to extract data from the output logs.
We begin by defining the type of each validation. The type best suited to our
needs is the generic configuration_independent
type of gcvb.
    validation:
      type: "configuration_independent"
Then, we set the validation executable
. The executable
, i.e. inject.py
,
takes at least two arguments. The first one is the *.csv
file containing
parsed benchmark results. The call to the parsing script, together with its
arguments, follows. In this case, we specify the output log file to be
stdout.log
(see above). The -r .
parameter tells the parsing script to look
for the resource monitoring logs produced by the associated Python script (see
Section 3.6) in the current working directory, and the -o data.csv
parameter defines the output file for the parsing result. We shall define the
validation launch_command
individually for each benchmark.
executable: "../../../scripts/inject.py data.csv ../../../scripts/parse.sh -s stdout.log -r . -o data.csv"
At this stage, we can define the actual benchmarks. They are structured in packs. We define four packs: in-core, out-of-core, multi-node parallel distributed and test benchmarks. Each pack contains a list of benchmarks, each benchmark may be composed of one or more tasks, and each task may have one or more validation tasks.
3.5.1. In-core benchmarks
Packs:
  - pack_id: "in-core"
    description: "In-core benchmarks."
    Tests:
Firstly, we want to benchmark the SPIDO solver on dense linear systems resulting
from BEM discretization for various unknown counts. Under
template_instantiation
there are four maps later expanded by gcvb to generate
multiple variants of the benchmark, e.g. for various problem sizes.
The solver
map provides the name of the solver being evaluated. This
information is not uniformly present in the standard output of test_FEMBEM.
Therefore, for the sake of clarity and simplicity, we manually specify the
solver
map in all benchmarks, not only when we evaluate multiple solvers
within one benchmark.
scheduler
holds the common job name prefix and the scheduling information used
for the generation of the associated sbatch
header file monobatch
(see
Section 3.2) as well as the wrapper script (see Section
3.11).
parallel
specifies the parallel configuration. The parallel[np]
key gives
the number of MPI processes. The parallel[nt]
key gives the number of threads
per MPI process. The parallel[map]
, parallel[rank]
and parallel[bind]
keys
indicate the mapping, the ranking and the binding of the MPI processes,
respectively.
The nbpts
array defines the problem sizes to generate the benchmark for. Note
that {scheduler[prefix]}
, {scheduler[platform]}
, {nbpts}
and so on are the
placeholders for the values defined in template_instantiation
.
The Cartesian product of all the map tuples under template_instantiation
gives the total number of generated benchmarks. Here, we generate
1 \(\times\) 1 \(\times\) 3 = 3 variants grouped into a single job script with a
time limit of 2 hours.
Note that &MONO
and &PARALLEL_DEFAULT
are YAML aliases to the corresponding
data allowing us to reuse them later in the document using *MONO
and
*PARALLEL_DEFAULT
, respectively.
- id: "spido-{nbpts}"
  template_files: &MONO [ "wrapper-in-core", "monobatch" ]
  template_instantiation:
    scheduler:
      - { prefix: "spido", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-02:00:00" }
    parallel:
      - &PARALLEL_DEFAULT { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
    nbpts: [ 25000, 50000, 100000 ]
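The variant count stated above can be checked with a short calculation. The following sketch mirrors, rather than reuses, gcvb's template expansion over the three template_instantiation maps of this benchmark:

```python
from itertools import product

# The three maps of the spido benchmark: one scheduler tuple, one parallel
# tuple and three problem sizes (values abridged for illustration).
scheduler = [{"prefix": "spido", "family": "miriel"}]
parallel = [{"np": 1, "nt": 24}]
nbpts = [25000, 50000, 100000]

# One benchmark variant per element of the Cartesian product.
variants = list(product(scheduler, parallel, nbpts))
print(len(variants))  # 1 x 1 x 3 = 3 generated benchmarks
```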
The task corresponding to this benchmark follows. In this case, we only have to
indicate the options
of test_FEMBEM specific to this set of benchmarks.
  Tasks:
    - options: "-z -nbrhs 50 --bem -withmpf -nbpts {nbpts}"
For the validation phase we only need to specify an id
and the corresponding
launch_command
. Here it consists of the validation executable
obtained
through the {@job_creation[va_executable]}
placeholder followed by an option
of the parsing script specific to this benchmark. The -K
option generally
allows us to include custom key-value pairs in the result output. Here, we use
it to specify the solver being evaluated.
      Validations:
        - id: "va-spido-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=spido"
In the next definition, we benchmark the HMAT solver on dense BEM systems and
under similar conditions as SPIDO. However, here we also vary the precision
parameter \(\epsilon\) (see the epsilon
map) in template_instantiation
.
- id: "hmat-bem-{epsilon}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "hmat-bem", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    epsilon: [ "1e-3", "1e-6" ]
    nbpts: [ 25000, 50000, 100000, 200000, 400000, 800000, 1000000 ]
  Tasks:
    - options: "-z -nbrhs 50 --bem -withmpf --hmat --hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
      Validations:
        - id: "va-hmat-bem-{epsilon}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=hmat-bem"
As for sparse solvers, we begin by benchmarking MUMPS on sparse linear systems
arising from FEM discretization. Besides the unknown count (see the nbpts
map), we vary
the value of the precision parameter \(\epsilon\) when the low-rank compression
mechanism is enabled (see the epsilon
key in the compression
map). The
options
key in the compression
map defines the options for test_FEMBEM to
enable or disable the compression.
- id: "mumps-{compression[epsilon]}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "mumps", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-04:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    compression:
      - { epsilon: "1e-3", options: "--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy" }
      - { epsilon: "1e-6", options: "--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy" }
      - { epsilon: "", options: "--no-mumps-blr" }
    nbpts: [ 250000, 500000, 1000000, 2000000, 4000000, 8000000, 10000000 ]
  Tasks:
    - options: "-z -nbrhs 50 --fem -withmpf {compression[options]} {compression[epsilon]} --mumps-verbose -nbpts {nbpts}"
      Validations:
        - id: "va-mumps-{compression[epsilon]}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=mumps"
For illustrative purposes, we define a couple of additional benchmarks of MUMPS
on the same kind of linear systems. In this case, the problem size is fixed to
962,831 which matches the count of unknowns related to the FEM-discretized
domain in case of a coupled FEM/BEM system counting 1,000,000 unknowns. Moreover, we
consider both symmetric and non-symmetric systems (see the symmetry
map). The
precision parameter \(\epsilon\) is fixed to \(10^{-3}\).
- id: "additional-mumps-{symmetry[label]}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "additional-mumps", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-00:20:00" }
    parallel:
      - *PARALLEL_DEFAULT
    symmetry:
      - { label: "symmetric", options: "" }
      - { label: "non-symmetric", options: "-nosym" }
  Tasks:
    - options: "-z -nbrhs 50 --fem -withmpf --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-verbose -nbpts 962831 {symmetry[options]}"
      Validations:
        - id: "va-additional-mumps-{symmetry[label]}"
          launch_command: "{@job_creation[va_executable]} -K solver=mumps"
For HMAT on FEM systems, we evaluate the impact of the same parameters as in the case of MUMPS. In the benchmark prefix, we specify that we are working with symmetric matrices as we later define further HMAT benchmarks involving non-symmetric matrices.
- id: "hmat-fem-symmetric-{epsilon}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "hmat-fem-symmetric", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-02:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    epsilon: [ "1e-3", "1e-6" ]
    nbpts: [ 250000, 500000, 1000000, 2000000 ]
  Tasks:
    - options: "-z -nbrhs 50 --fem -withmpf --hmat --hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
      Validations:
        - id: "va-hmat-fem-symmetric-{epsilon}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=hmat-fem"
We also want to evaluate the performance of the prototype implementation of the
Nested Dissection (ND) ordering technique in HMAT for the solution of sparse
systems (see the variant
map). The current implementation limits the
application of the algorithm to non-symmetric matrices. To be able to compare it
to runs without using the Nested Dissection, we must redo the latter using
non-symmetric matrices as well (see the variant
map). Moreover, when the ND is
enabled, the HMAT solver uses a significantly higher amount of memory.
Consequently, cases counting more than 250,000 unknowns cause a memory overflow.
Therefore, we benchmark the algorithm on smaller systems.
- id: "hmat-fem-non-symmetric{variant[ND]}-{epsilon}-{nbpts}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "hmat-fem-non-symmetric", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-01:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    variant:
      - { ND: "", options: "--hmat-lu -nosym" }
      - { ND: "-nd", options: "--hmat-nd" }
    epsilon: [ "1e-3", "1e-6" ]
    nbpts: [ 25000, 50000, 100000, 200000, 250000 ]
  Tasks:
    - options: "-z -nbrhs 50 --fem {variant[options]} -withmpf --hmat --hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
      Validations:
        - id: "va-hmat-fem-non-symmetric{variant[ND]}-{epsilon}-{nbpts}"
          launch_command: "{@job_creation[va_executable]} -K solver=hmat-fem{variant[ND]}"
In the next part, we benchmark the solvers for coupled sparse/dense FEM/BEM
systems. For solvers allowing data compression, we set the precision
parameter \(\epsilon\) to \(10^{-3}\). However, as
of the sparse part of the system, we consider both the case when it is
compressed (by MUMPS) and when it is not (see the BLR
and option
keys in the
sparse
map).
At first, we consider the two-stage implementation scheme multi-solve using
MUMPS as sparse solver and SPIDO as dense solver. We vary problem's unknown
count (see the nbpts
key in the job
map) as well as the count of right-hand
sides to be processed at once by MUMPS during the Schur complement computation
(see the nbrhs
key in the job
map). The maps sparse
and dense
define the
options for the sparse and the dense solver respectively. Finally, the track
key in the job
map allows us to enable execution timeline tracing for selected
runs.
- id: "multi-solve-{job[batch]}-{sparse[BLR]}{sparse[name]}-\
    {dense[name]}-{job[nbrhs]}-{job[nbpts]}"
  template_files: &COUPLED [ "wrapper-in-core", "coupled" ]
  template_instantiation:
    scheduler:
      - { prefix: "multi-solve", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "1-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, nbrhs: 32, batch: 1, track: "" }
      - { nbpts: 1000000, nbrhs: 64, batch: 1, track: "" }
      - { nbpts: 1000000, nbrhs: 128, batch: 1, track: "" }
      - { nbpts: 1000000, nbrhs: 256, batch: 1, track: "--timeline-trace-calls" }
      - { nbpts: 1000000, nbrhs: 512, batch: 1, track: "" }
      # N = 2M
      - { nbpts: 2000000, nbrhs: 32, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 64, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 128, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 256, batch: 2, track: "" }
      - { nbpts: 2000000, nbrhs: 512, batch: 3, track: "" }
      - { nbpts: 2000000, nbrhs: 1024, batch: 3, track: "" }
      # N = 4M
      - { nbpts: 4000000, nbrhs: 32, batch: 4, track: "" }
      - { nbpts: 4000000, nbrhs: 64, batch: 4, track: "" }
      - { nbpts: 4000000, nbrhs: 128, batch: 5, track: "" }
      - { nbpts: 4000000, nbrhs: 256, batch: 6, track: "" }
      - { nbpts: 4000000, nbrhs: 512, batch: 6, track: "" }
      # N = 7M
      - { nbpts: 7000000, nbrhs: 32, batch: 7, track: "" }
      - { nbpts: 7000000, nbrhs: 64, batch: 8, track: "" }
      - { nbpts: 7000000, nbrhs: 128, batch: 9, track: "" }
      - { nbpts: 7000000, nbrhs: 256, batch: 10, track: "" }
    sparse:
      - { name: "mumps", BLR: "blr-", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbrhsmumps" }
      - { name: "mumps", BLR: "", options: "--mumps-verbose --no-mumps-blr --mumps-multi-solve -nbrhsmumps" }
    dense:
      - { name: "spido", options: "" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} {job[nbrhs]} {dense[options]} -nbpts {job[nbpts]} {job[track]}"
      Validations:
        - id: "va-multi-solve-{job[batch]}-{sparse[BLR]}{sparse[name]}-\
            {dense[name]}-{job[nbrhs]}-{job[nbpts]}"
          launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]}"
Due to differences in parameter specification, we define a separate benchmark
for the two-stage multi-solve implementation relying on HMAT as dense solver.
Here, nbrhs
does not vary except for the 2,000,000 unknowns problem. For all
the other runs, we set nbrhs
to 256 which seems to be the optimal value based
on the runs involving SPIDO as dense solver. However, when using HMAT, the
parameter we are interested in is referred to as schur
here (see the job
map). The value of schur
must be at least as high as nbrhs
. In the 2,000,000
unknowns case, we want to demonstrate the effect of too low values of schur
.
That is why we lower schur
(together with nbrhs
) to 32 in this case.
- id: "multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
    {job[nbrhs]}-{job[schur]}-{job[nbpts]}"
  template_files: *COUPLED
  template_instantiation:
    scheduler:
      - { prefix: "multi-solve", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "1-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 1024, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 2048, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 4096, nbrhs: 256, batch: 1 }
      # N = 2M
      - { nbpts: 2000000, schur: 32, nbrhs: 32, batch: 2 }
      - { nbpts: 2000000, schur: 64, nbrhs: 64, batch: 2 }
      - { nbpts: 2000000, schur: 128, nbrhs: 128, batch: 2 }
      - { nbpts: 2000000, schur: 256, nbrhs: 256, batch: 2 }
      - { nbpts: 2000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 2000000, schur: 1024, nbrhs: 256, batch: 1 }
      - { nbpts: 2000000, schur: 2048, nbrhs: 256, batch: 1 }
      - { nbpts: 2000000, schur: 4096, nbrhs: 256, batch: 1 }
      # N = 4M
      - { nbpts: 4000000, schur: 512, nbrhs: 256, batch: 3 }
      - { nbpts: 4000000, schur: 1024, nbrhs: 256, batch: 3 }
      - { nbpts: 4000000, schur: 2048, nbrhs: 256, batch: 4 }
      - { nbpts: 4000000, schur: 4096, nbrhs: 256, batch: 3 }
      # N = 7M
      - { nbpts: 7000000, schur: 512, nbrhs: 256, batch: 5 }
      - { nbpts: 7000000, schur: 1024, nbrhs: 256, batch: 6 }
      - { nbpts: 7000000, schur: 2048, nbrhs: 256, batch: 7 }
      - { nbpts: 7000000, schur: 4096, nbrhs: 256, batch: 8 }
      # N = 9M
      - { nbpts: 9000000, schur: 512, nbrhs: 256, batch: 9 }
      - { nbpts: 9000000, schur: 1024, nbrhs: 256, batch: 10 }
      - { nbpts: 9000000, schur: 2048, nbrhs: 256, batch: 11 }
      - { nbpts: 9000000, schur: 4096, nbrhs: 256, batch: 12 }
    sparse:
      - { name: "mumps", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve" }
    dense:
      - { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} -nbrhsmumps {job[nbrhs]} --mumps-nbcols-schur-hmat {job[schur]} {dense[options]} -nbpts {job[nbpts]}"
      Validations:
        - id: "va-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
            {job[nbrhs]}-{job[schur]}-{job[nbpts]}"
          launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]}"
Then, we benchmark the two-stage implementation scheme multi-factorization using
MUMPS as sparse solver and either SPIDO or HMAT as dense solver. We vary
problem's unknown count (see the nbpts
key in the job
map) as well as the
size of row and column blocks of the Schur complement matrix during the
computation (see the schur
key in the job
map). The nb
key in job
indicates the corresponding count of blocks per block row or block column of the
Schur complement matrix. We use the IOblock
key under job
to set the size of
disk blocks when using SPIDO as dense solver. This ensures that the chosen
Schur complement block size is a multiple of the disk block size. As only three
combinations are possible, we use here the common solver
map to provide both
sparse and dense solver specifications.
- id: "multi-factorization-{job[batch]}-{solver[BLR]}{solver[name]}-\
    {job[nbpts]}-{job[schur]}"
  template_files: [ "wrapper-in-core", "coupled-simple" ]
  template_instantiation:
    scheduler:
      - { prefix: "multi-factorization", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-05:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1.0M, n_BEM = 37,169
      - { nbpts: 1000000, schur: 9296, IOblock: 2324, nb: 4, batch: 1 }
      - { nbpts: 1000000, schur: 12390, IOblock: 2478, nb: 3, batch: 1 }
      - { nbpts: 1000000, schur: 18585, IOblock: 3717, nb: 2, batch: 1 }
      - { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1, batch: 1 }
      # N = 1.5M, n_BEM = 48,750
      - { nbpts: 1500000, schur: 12188, IOblock: 3047, nb: 4, batch: 3 }
      - { nbpts: 1500000, schur: 16250, IOblock: 3250, nb: 3, batch: 3 }
      - { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2, batch: 2 }
      - { nbpts: 1500000, schur: 48750, IOblock: 4875, nb: 1, batch: 2 }
      # N = 2.0M, n_BEM = 58,910
      - { nbpts: 2000000, schur: 11784, IOblock: 1964, nb: 5, batch: 4 }
      - { nbpts: 2000000, schur: 14728, IOblock: 2104, nb: 4, batch: 4 }
      - { nbpts: 2000000, schur: 19638, IOblock: 3273, nb: 3, batch: 5 }
      - { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2, batch: 6 }
      # N = 2.5M, n_BEM = 68,524
      - { nbpts: 2500000, schur: 11421, IOblock: 1269, nb: 6, batch: 7 }
      - { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5, batch: 8 }
      - { nbpts: 2500000, schur: 17135, IOblock: 3427, nb: 4, batch: 9 }
      - { nbpts: 2500000, schur: 22842, IOblock: 3807, nb: 3, batch: 10 }
      - { nbpts: 2500000, schur: 34264, IOblock: 4283, nb: 2, batch: 11 }
    solver:
      - { name: "mumps-spido", BLR: "blr-", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-facto" }
      - { name: "mumps-spido", BLR: "", options: "--mumps-verbose --no-mumps-blr --mumps-multi-facto" }
      - { name: "mumps-hmat", BLR: "blr-", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-facto --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {solver[options]} --mumps-sizeschur {job[schur]} -diskblock {job[IOblock]} -nbpts {job[nbpts]}"
      Validations:
        - id: "va-multi-factorization-{job[batch]}-\
            {solver[BLR]}{solver[name]}-{job[nbpts]}-{job[schur]}"
          launch_command: "{@job_creation[va_executable]} -K solver={solver[name]},n_blocks={job[nb]}"
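The relation between the schur and IOblock values can be verified directly on the job entries above: in every pair, the Schur complement block size is an integer multiple of the disk block size. A short check over the first few pairs, assuming exactly this invariant:

```python
# (schur, IOblock) pairs copied from the multi-factorization job entries above.
jobs = [
    (9296, 2324), (12390, 2478), (18585, 3717), (37170, 3717),
    (12188, 3047), (16250, 3250), (24375, 4875), (48750, 4875),
]

for schur, ioblock in jobs:
    # Each Schur block size is evenly divisible by its disk block size.
    assert schur % ioblock == 0, (schur, ioblock)
print("all Schur block sizes are multiples of their IOblock")
```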
As for the partially sparse-aware single-stage scheme, we benchmark the
implementation using HMAT on both sparse and dense parts. Here, we vary solely
the problem's unknown count (see the nbpts
map). The track
key in the
nbpts
map allows us to enable execution timeline tracing for selected runs.
- id: "full-hmat-{nbpts[count]}"
  template_files: *MONO
  template_instantiation:
    scheduler:
      - { prefix: "full-hmat", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    nbpts:
      - { count: 1000000, track: "--timeline-trace-calls" }
      - { count: 2000000, track: "" }
      - { count: 4000000, track: "" }
      - { count: 7000000, track: "" }
      - { count: 9000000, track: "" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled --hmat
                --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                -nbpts {nbpts[count]} {nbpts[track]}"
  Validations:
    - id: "va-full-hmat-{nbpts[count]}"
      launch_command: "{@job_creation[va_executable]} -K solver=hmat/hmat"
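The placeholders in the test identifiers and option strings follow Python's str.format element-access syntax, where a field such as {nbpts[count]} resolves against the corresponding instantiation map. The helper below is an illustrative sketch of this expansion, not gcvb's actual implementation (it also does not cover the special {@job_creation[...]} fields):

```python
# Sketch of how gcvb-style "{map[key]}" placeholders expand: Python's
# str.format accepts keyword dicts and resolves "{nbpts[count]}" as
# nbpts["count"]. Illustration only, not the actual gcvb code.

def instantiate(template: str, **maps) -> str:
    """Expand a gcvb-like template string against instantiation maps."""
    return template.format(**maps)

ids = [
    instantiate("full-hmat-{nbpts[count]}", nbpts={"count": c, "track": t})
    for c, t in [(1000000, "--timeline-trace-calls"), (2000000, "")]
]
print(ids)  # ['full-hmat-1000000', 'full-hmat-2000000']
```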
To evaluate the scalability of the solvers, we consider a large dense BEM or
sparse FEM linear system (see the nbpts key in the solver map) and vary the
parallel configuration. Note that for solvers allowing data compression, we set
the precision parameter ε to \(10^{-3}\).
- id: "scalability-{solver[name]}-{parallel[map]}-\
       {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
  template_files: &SCALA [ "wrapper-in-core", "scalability" ]
  template_instantiation:
    scheduler:
      - { prefix: "scalability", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-20:00:00" }
    solver:
      - { name: "spido", nbpts: 100000, options: "--bem" }
      - { name: "mumps", nbpts: 2000000,
          options: "--fem --mumps-verbose --mumps-blr --mumps-blr-variant 1
                    --mumps-blr-accuracy 1e-3" }
      - { name: "mumps", nbpts: 4000000,
          options: "--fem --mumps-verbose --mumps-blr --mumps-blr-variant 1
                    --mumps-blr-accuracy 1e-3" }
We consider four kinds of parallel configurations:
- 1 MPI process mapped and ranked by node and 1 to 24 OpenMP and MKL threads;
  mpirun configuration example:

  OMP_NUM_THREADS=24 MKL_NUM_THREADS=24 \
  mpirun -np 1 -map-by ppr:1:node -rank-by node -bind-to none test_FEMBEM...
parallel:
  - { np: 1, nt: 1, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 6, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 12, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 18, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
- 2 MPI processes mapped by, ranked by and bound to sockets and 1 to 12
  OpenMP and MKL threads; mpirun configuration example:

  OMP_NUM_THREADS=12 MKL_NUM_THREADS=12 \
  mpirun -np 2 -map-by ppr:1:socket -rank-by socket -bind-to socket
  test_FEMBEM...
  - { np: 2, nt: 1, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 2, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 4, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 8, map: "socket", rank: "socket", bind: "socket" }
  - { np: 2, nt: 12, map: "socket", rank: "socket", bind: "socket" }
- 4 MPI processes mapped by, ranked by and bound to numa sub-nodes and 1 to 6
  OpenMP and MKL threads; mpirun configuration example:

  OMP_NUM_THREADS=6 MKL_NUM_THREADS=6 \
  mpirun -np 4 -map-by ppr:1:numa -rank-by numa -bind-to numa test_FEMBEM...
  - { np: 4, nt: 1, map: "numa", rank: "numa", bind: "numa" }
  - { np: 4, nt: 2, map: "numa", rank: "numa", bind: "numa" }
  - { np: 4, nt: 4, map: "numa", rank: "numa", bind: "numa" }
  - { np: 4, nt: 6, map: "numa", rank: "numa", bind: "numa" }
- 1 to 24 MPI processes mapped by, ranked by and bound to cores and 1 OpenMP
  and MKL thread; mpirun configuration example:

  OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 \
  mpirun -np 24 -map-by ppr:1:core -rank-by core -bind-to core test_FEMBEM...
  - { np: 1, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 6, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 12, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 18, nt: 1, map: "core", rank: "core", bind: "core" }
  - { np: 24, nt: 1, map: "core", rank: "core", bind: "core" }
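All four configuration families keep the product np × nt within the 24 cores of a miriel node. A short illustrative sketch enumerating the configurations listed above and checking this core budget:

```python
# Enumerate the four parallel configuration families above and check that
# each (np, nt) pair fits within the 24 cores of a miriel node.
configs = {
    "node":   [(1, nt) for nt in (1, 6, 12, 18, 24)],
    "socket": [(2, nt) for nt in (1, 2, 4, 8, 12)],
    "numa":   [(4, nt) for nt in (1, 2, 4, 6)],
    "core":   [(np, 1) for np in (1, 6, 12, 18, 24)],
}
for mapping, pairs in configs.items():
    assert all(np * nt <= 24 for np, nt in pairs), mapping
print(sum(len(pairs) for pairs in configs.values()))  # 19 configurations
```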
Finally, we have to set the solver options (see the options value) and
configure the validation phase.
Tasks:
  - options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts {solver[nbpts]}"
Validations:
  - id: "va-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "{@job_creation[va_executable]} -K solver={solver[name]}"
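From the timings gathered by these runs, speedup and parallel efficiency can be derived during post-processing (see the scalability and efficiency plot functions in Section 4.1.5). A generic sketch of the computation, with made-up timings:

```python
# Generic speedup/efficiency computation from scalability timings.
# The timing values below are hypothetical, for illustration only.
def efficiency(t_seq: float, t_par: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores."""
    return (t_seq / t_par) / cores

timings = {1: 240.0, 6: 48.0, 12: 30.0, 24: 20.0}  # cores -> seconds (made up)
for cores, t in timings.items():
    print(cores, round(efficiency(timings[1], t, cores), 2))
```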
We define two further separate scalability benchmarks for HMAT, as its MPI
parallelization is not suited to execution on a single node and is
consequently not studied here.
- id: "scalability-{solver[name]}-{parallel[map]}-\
       {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
  template_files: *SCALA
  template_instantiation:
    scheduler:
      - { prefix: "scalability", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-14:00:00" }
    solver:
      - { name: "hmat-bem", nbpts: 100000,
          options: "--bem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3" }
      - { name: "hmat-bem", nbpts: 1000000,
          options: "--bem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3" }
      - { name: "hmat-fem", nbpts: 2000000,
          options: "--fem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3 --no-hmat-nd" }
We consider only one kind of parallel configuration here: 1 MPI process mapped
and ranked by node with 1 to 24 OpenMP and MKL threads and StarPU workers.
parallel:
  - { np: 1, nt: 1, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 6, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 12, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 18, map: "node", rank: "node", bind: "none" }
  - { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
Tasks:
  - options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts {solver[nbpts]}"
Validations:
  - id: "va-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "{@job_creation[va_executable]} -K solver={solver[name]}"
By means of another HMAT scalability benchmark, we run the solver on one BEM and one FEM system in three different parallel configurations and generate an FXT execution trace for each of them. Here, we consider smaller systems (25,000 unknowns for BEM and 50,000 unknowns for FEM) on fewer cores (4 at most) than in the previous benchmark. The goal is to show the difference in performance between an exclusively StarPU-parallelized execution and an execution involving multiple MPI processes on a single node.
- id: "fxt-scalability-{solver[name]}-{parallel[map]}-\
       {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
  template_files: [ "wrapper-fxt", "scalability" ]
  template_instantiation:
    scheduler:
      - { prefix: "fxt-scalability", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 4, time: "0-00:30:00" }
    solver:
      - { name: "hmat-bem", nbpts: 25000,
          options: "--bem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3" }
      - { name: "hmat-fem", nbpts: 50000,
          options: "--fem --hmat --hmat-eps-assemb 1e-3
                    --hmat-eps-recompr 1e-3 --no-hmat-nd" }
We consider three parallel configurations here:
- 1 MPI process mapped and ranked by node and 4 OpenMP and MKL threads and
  StarPU workers;
- 2 MPI processes mapped by, ranked by and bound to sockets and 2 OpenMP and
  MKL threads and StarPU workers;
- 4 MPI processes mapped by, ranked by and bound to cores.
parallel:
  - { np: 1, nt: 4, map: "node", rank: "node", bind: "none" }
  - { np: 2, nt: 2, map: "socket", rank: "socket", bind: "socket" }
  - { np: 4, nt: 1, map: "core", rank: "core", bind: "core" }
In this case, we have an extra validation task performing the analysis of the
produced FXT traces using the StarVZ library for R (see Section 4.2).
Tasks:
  - options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts {solver[nbpts]}"
Validations:
  - id: "va-1-fxt-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "{@job_creation[va_executable]} -K solver={solver[name]}"
  - id: "va-2-fxt-scalability-{solver[name]}-{parallel[map]}-\
         {parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
    launch_command: "phase1-workflow.sh $(pwd)/ && phase2-workflow.R $(pwd)/
                     $GUIX_ENVIRONMENT/site-library/starvz/etc/default.yaml"
3.5.2. energy_scope benchmarks
To experiment with energy profiling of the solvers, we use the energy_scope
tool (see Section 3.7). In the first place, we want to measure the potential
overhead energy_scope may have on the performance of test_FEMBEM.
We consider a coupled FEM/BEM system with a fixed size of 1,000,000 unknowns.
We use MUMPS as sparse and either SPIDO or HMAT as dense solver while applying
the multi-solve two-stage implementation scheme. Low-rank compression is
enabled and the precision parameter ε is set to \(10^{-3}\). The number of
right-hand sides in the system is set to 150 for a more representative final
solve phase. We execute the benchmarks both with and without energy_scope.
- pack_id: "energy_scope"
  description: "energy_scope benchmarks."
  Tests:
    - id: "es-overhead-{es[switch]}-{es[frequency]}-{es[interval]}-\
           {es[profile]}-{dense[solver]}-{iteration}"
      template_files: ["wrapper-in-core", "monobatch", "es"]
      template_instantiation:
        scheduler:
          - { prefix: "es-overhead", platform: "plafrim", family: "miriel",
              nodes: 1, tasks: 24, time: "0-12:00:00" }
        parallel:
          - *PARALLEL_DEFAULT
        dense: &ES_DENSE
          - { solver: "spido", options: "" }
          - { solver: "hmat",
              options: "--mumps-nbcols-schur-hmat 512 --hmat
                        --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                        --coupled-mumps-hmat" }
        es:
          - { switch: "on", frequency: "1Hz", interval: 1000,
              idle_time: 0.99, profile: "e",
              executable: "~/energy_scope/energy_scope_slurm.sh mpirun
                           --report-bindings" }
          - { switch: "on", frequency: "2Hz", interval: 500,
              idle_time: 0.49, profile: "et",
              executable: "~/energy_scope/energy_scope_slurm.sh mpirun
                           --report-bindings" }
          - { switch: "off", frequency: "1Hz", interval: 1000,
              idle_time: 0.99, profile: "e", executable: "mpirun" }
        iteration: [ 1, 2, 3 ]
      Tasks:
        - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json
                       && export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       {es[executable]}"
          options: "-d -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose
                    --mumps-blr --mumps-blr-variant 1
                    --mumps-blr-accuracy 1e-3 --mumps-multi-solve
                    -nbrhsmumps 256 -nbpts 1000000 {dense[options]}
                    --energy-scope-tags"
      Validations:
        - id: "va-es-overhead-{es[switch]}-{es[frequency]}-\
               {es[interval]}-{es[profile]}-{dense[solver]}-{iteration}"
          launch_command: "{@job_creation[va_executable]}
                           -K solver=mumps/{dense[solver]},energy_scope={es[switch]}
                           -K es_frequency={es[frequency]},es_profile={es[profile]}"
        - id: "va-es-overhead-eprofile-{es[switch]}-{es[frequency]}-\
               {es[interval]}-{es[profile]}-{dense[solver]}-{iteration}"
          launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                           ~/energy_scope/energy_scope_run_analysis_all.sh"
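The iterations executed with and without energy_scope can then be compared to quantify its overhead. A sketch of the computation, with made-up timings:

```python
# Relative overhead of running under energy_scope, computed from paired
# wall-clock times. The sample timings below are made up for illustration.
from statistics import mean

def overhead(with_es, without_es):
    """Relative overhead of the instrumented runs over the reference runs."""
    return (mean(with_es) - mean(without_es)) / mean(without_es)

print(f"{overhead([101.0, 102.0, 103.0], [100.0, 100.0, 100.0]):.1%}")  # 2.0%
```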
The next stage consists of creating the energetic profile of test_FEMBEM when
solving purely sparse FEM and purely dense BEM systems, respectively. In this
case, we consider a FEM and a BEM system of fixed size proportional to the
corresponding coupled FEM/BEM test cases with \(1,000,000 \leq N \leq 5,000,000\).
We use MUMPS, both with and without compression, to solve the FEM systems and
SPIDO or HMAT to solve the BEM systems. When compression is enabled, the
precision parameter ε is set to \(10^{-3}\). The number of right-hand sides
is set to 150 to ensure a more representative solve phase.
- id: "es-fem-{job[batch]}-{nbpts}"
  template_files: ["wrapper-in-core", "polybatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-fem", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-08:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: &ES_DEFAULT
      - { interval: 1000, idle_time: 0.99, profile: "e",
          executable: "~/energy_scope/energy_scope_slurm.sh mpirun
                       --report-bindings" }
    job:
      - { batch: 1, options: "--no-mumps-blr" }
      - { batch: 2, options: "--mumps-blr --mumps-blr-variant 1
                              --mumps-blr-accuracy 1e-3" }
    nbpts: [ 962831, 2922756, 4891562, 6864384 ]
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 150 -nbrhsmumps 150 -withmpf --fem --mumps-verbose
                {job[options]} -nbpts {nbpts} --energy-scope-tags"
  Validations:
    - id: "va-es-fem-{job[batch]}-{nbpts}"
      launch_command: "{@job_creation[va_executable]} -K solver=mumps"
    - id: "va-es-fem-eprofile-{job[batch]}-{nbpts}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-bem-{job[batch]}-{nbpts}"
  template_files: ["wrapper-in-core", "polybatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-bem", platform: "plafrim", family: "miriel",
          nodes: 1, tasks: 24, time: "0-08:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { batch: 1, solver: "spido", options: "" }
      - { batch: 2, solver: "hmat", options: "--hmat --hmat-eps-assemb 1e-3
                                              --hmat-eps-recompr 1e-3" }
    nbpts: [ 37169, 77244, 108438, 135616 ]
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 150 -withmpf --bem {job[options]} -nbpts {nbpts}
                --energy-scope-tags"
  Validations:
    - id: "va-es-bem-{job[batch]}-{nbpts}"
      launch_command: "{@job_creation[va_executable]} -K solver={job[solver]}"
    - id: "va-es-bem-eprofile-{job[batch]}-{nbpts}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
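The nbpts lists of the es-fem and es-bem tests are not arbitrary: each FEM size pairs with a BEM size so that together they reconstitute the unknown count of one of the coupled FEM/BEM test cases. A quick check:

```python
# Each FEM system size pairs with a BEM system size so that their sum equals
# the unknown count N of one of the coupled FEM/BEM test cases.
fem_sizes = [962831, 2922756, 4891562, 6864384]
bem_sizes = [37169, 77244, 108438, 135616]
coupled = [f + b for f, b in zip(fem_sizes, bem_sizes)]
print(coupled)  # [1000000, 3000000, 5000000, 7000000]
```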
For the energetic profiling of test_FEMBEM when solving coupled systems, we
consider both the multi-solve and the multi-factorization two-stage
implementation schemes. We profile the schemes on a system with
\(N\) = 1,000,000. The number of right-hand sides is set to 150 for a more
representative final solve phase. We measure the impact of low-rank
compression on the dense part, whereas compression is always enabled for the
sparse part. When applicable, the precision parameter ε is set to \(10^{-3}\).
- id: "es-coupled-single-node-{job[scheme]}-{job[solver]}-{job[config]}"
  template_files: ["wrapper-in-core", "monobatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-single-node", platform: "plafrim",
          family: "miriel", nodes: 1, tasks: 24, time: "0-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { scheme: "multi-solve", solver: "spido", config: "opti",
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { scheme: "multi-solve", solver: "hmat", config: "ssize",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 256 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { scheme: "multi-solve", solver: "hmat", config: "opti",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { scheme: "multi-facto", solver: "spido", config: "opti",
          options: "--mumps-multi-facto --mumps-sizeschur 12390" }
      - { scheme: "multi-facto", solver: "hmat", config: "opti",
          options: "--mumps-multi-facto --mumps-sizeschur 12390 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts 1000000 -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
      launch_command: &ES_LC_TIMELINE
        "chmod +x wrapper.sh && {@job_creation[executable]}
         {@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
         ./wrapper.sh likwid-perfctr -g FLOPS_AVX -g MEM -o likwid-%r.csv
         -O -t 500ms python3 ../../../scripts/rss.py test_FEMBEM -radius 2
         -height 4 {@job_creation[options]} 2>&1 | tee stdout.log | tee
         slurm-$SLURM_JOBID.out"
  Validations:
    - id: "va-es-coupled-single-node-{job[scheme]}-{job[solver]}-\
           {job[config]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-coupled-single-node-eprofile-{job[scheme]}-\
           {job[solver]}-{job[config]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-coupled-gc-single-node-{job[batch]}-{job[scheme]}-\
       {job[solver]}-{job[nbpts]}"
  template_files: ["wrapper-in-core", "polybatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-gc-single-node", platform: "plafrim",
          family: "miriel", nodes: 1, tasks: 24, time: "0-18:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      # Multi-solve
      # MUMPS/SPIDO
      - { batch: 1, scheme: "multi-solve", solver: "spido", nbpts: 1000000,
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 1, scheme: "multi-solve", solver: "spido", nbpts: 3000000,
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 1, scheme: "multi-solve", solver: "spido", nbpts: 5000000,
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      # MUMPS/HMAT
      - { batch: 1, scheme: "multi-solve", solver: "hmat", nbpts: 1000000,
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 1, scheme: "multi-solve", solver: "hmat", nbpts: 3000000,
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 1, scheme: "multi-solve", solver: "hmat", nbpts: 5000000,
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      # Multi-factorization
      # MUMPS/SPIDO
      - { batch: 2, scheme: "multi-facto", solver: "spido", nbpts: 1000000,
          options: "--mumps-multi-facto --mumps-sizeschur 12390" }
      - { batch: 2, scheme: "multi-facto", solver: "spido", nbpts: 1500000,
          options: "--mumps-multi-facto --mumps-sizeschur 16250" }
      - { batch: 2, scheme: "multi-facto", solver: "spido", nbpts: 2000000,
          options: "--mumps-multi-facto --mumps-sizeschur 19683" }
      # MUMPS/HMAT
      - { batch: 2, scheme: "multi-facto", solver: "hmat", nbpts: 1000000,
          options: "--mumps-multi-facto --mumps-sizeschur 12390 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 2, scheme: "multi-facto", solver: "hmat", nbpts: 1500000,
          options: "--mumps-multi-facto --mumps-sizeschur 16250 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 2, scheme: "multi-facto", solver: "hmat", nbpts: 2000000,
          options: "--mumps-multi-facto --mumps-sizeschur 19683 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts {job[nbpts]} -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
  Validations:
    - id: "va-es-coupled-gc-single-node-{job[batch]}-{job[scheme]}-\
           {job[solver]}-{job[nbpts]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-coupled-gc-single-node-eprofile-{job[batch]}-\
           {job[scheme]}-{job[solver]}-{job[nbpts]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-coupled-multi-node-{job[batch]}-{job[scheme]}-{job[solver]}"
  template_files: ["wrapper-in-core-distributed", "polybatch-distributed",
                   "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-multi-node", platform: "plafrim",
          family: "miriel", constraint: "&omnipath", nodes: 4, nt: 1, np: 1,
          time: "0-03:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { batch: 1, scheme: "multi-solve", solver: "spido",
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 1, scheme: "multi-solve", solver: "hmat",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 2, scheme: "multi-facto", solver: "spido",
          options: "--mumps-multi-facto --mumps-sizeschur 19638" }
      - { batch: 2, scheme: "multi-facto", solver: "hmat",
          options: "--mumps-multi-facto --mumps-sizeschur 19638 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - nprocs: "-np {scheduler[nodes]}
               -map-by ppr:{scheduler[np]}:{parallel[map]}
               -rank-by {parallel[rank]} -bind-to {parallel[bind]}"
      executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts 2000000 -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
      launch_command: *ES_LC_TIMELINE
  Validations:
    - id: "va-es-coupled-multi-node-{job[batch]}-{job[scheme]}-\
           {job[solver]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-coupled-multi-node-eprofile-{job[batch]}-\
           {job[scheme]}-{job[solver]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-coupled-debug-multi-facto-{parallel[np]}"
  template_files: ["wrapper-in-core", "monobatch", "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-coupled-debug-multi-facto", platform: "plafrim",
          family: "miriel", nodes: 1, tasks: 24, time: "0-12:00:00" }
    parallel:
      - { np: 4, nt: 6, map: "numa", rank: "numa", bind: "numa" }
      - { np: 2, nt: 12, map: "socket", rank: "socket", bind: "socket" }
      - { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
    es: *ES_DEFAULT
    job:
      - { options: "--mumps-multi-facto --mumps-sizeschur 19638 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      options: "-d -nbrhs 50 -nbpts 2000000 -withmpf --fembem --coupled
                --mumps-verbose --mumps-blr --mumps-blr-variant 1
                --mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
      launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
                       {@job_creation[nprocs]} $SINGULARITY_EXEC
                       $SINGULARITY_IMAGE bash ./wrapper.sh likwid-perfctr
                       -g FLOPS_AVX -o likwid-%r.csv -O -t 1s python3
                       ../../../scripts/rss.py test_FEMBEM -radius 2
                       -height 4 {@job_creation[options]} 2>&1 |
                       tee stdout.log | tee slurm-$SLURM_JOBID.out"
  Validations:
    - id: "va-es-coupled-debug-multi-facto-{parallel[np]}"
      launch_command: "{@job_creation[va_executable]} -K solver=mumps/hmat"
    - id: "va-es-coupled-debug-multi-facto-eprofile-{parallel[np]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
- id: "es-industrial-{job[batch]}-{job[scheme]}-{job[solver]}"
  template_files: ["wrapper-in-core-distributed", "polybatch-distributed",
                   "es"]
  template_instantiation:
    scheduler:
      - { prefix: "es-industrial", platform: "plafrim", family: "miriel",
          constraint: "&omnipath", nodes: 4, nt: 24, np: 1,
          time: "1-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    es: *ES_DEFAULT
    job:
      - { batch: 1, scheme: "multi-solve", solver: "spido",
          options: "--mumps-multi-solve -nbrhsmumps 256" }
      - { batch: 2, scheme: "multi-solve", solver: "hmat",
          options: "--mumps-multi-solve -nbrhsmumps 256
                    --mumps-nbcols-schur-hmat 1024 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
      - { batch: 3, scheme: "multi-facto", solver: "spido",
          options: "--mumps-multi-facto --mumps-sizeschur 16885" }
      - { batch: 4, scheme: "multi-facto", solver: "hmat",
          options: "--mumps-multi-facto --mumps-sizeschur 16885 --hmat
                    --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                    --coupled-mumps-hmat" }
  Tasks:
    - nprocs: "-np {scheduler[nodes]}
               -map-by ppr:{scheduler[np]}:{parallel[map]}
               -rank-by {parallel[rank]} -bind-to {parallel[bind]}"
      executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
                   export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
      launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
                       {@job_creation[nprocs]} $SINGULARITY_EXEC
                       $SINGULARITY_IMAGE bash ./wrapper.sh python3
                       ../../../scripts/rss.py actipole -p01
                       EPI_FAN_rwd_isolated_FEM_BEM.p01 -cprec -potential
                       {@job_creation[options]} 2>&1 | tee stdout.log |
                       tee slurm-$SLURM_JOBID.out"
      options: "-withmpf --fembem --coupled --mumps-verbose --mumps-blr
                --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                {job[options]} --energy-scope-tags
                --putenv HMAT_DISABLE_OOC=1 --putenv MPF_SPIDO_INCORE=1
                --mumpsic"
  Validations:
    - id: "va-es-industrial-{job[batch]}-{job[scheme]}-\
           {job[solver]}"
      launch_command: "{@job_creation[va_executable]}
                       -K solver=mumps/{job[solver]}"
    - id: "va-es-industrial-eprofile-{job[batch]}-\
           {job[scheme]}-{job[solver]}"
      launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
                       ~/energy_scope/energy_scope_run_analysis_all.sh"
3.5.3. Out-of-core benchmarks
- pack_id: "ooc"
  description: "Out-of-core benchmarks."
  default_values:
    task:
      launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
                       {@job_creation[nprocs]} $SINGULARITY_EXEC
                       $SINGULARITY_IMAGE bash ./wrapper.sh python3
                       ../../../scripts/rss.py --with-hdd test_FEMBEM
                       -radius 2 -height 4 {@job_creation[options]} 2>&1 |
                       tee stdout.log | tee slurm-$SLURM_JOBID.out"
  Tests:
The goal is to analyze the impact of out-of-core computation on the two-stage
implementation schemes multi-solve and multi-factorization using MUMPS as
sparse solver and SPIDO or HMAT as dense solver. We vary the problem's unknown
count (see the nbpts key in the job map). For multi-solve, we also vary the
count of right-hand sides to be processed at once by MUMPS during the Schur
complement computation (see the nbrhs key in the job map). The sparse and
dense maps define the options for the sparse and the dense solver,
respectively.
- id: "ooc-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
       {dense[ooc]}-{job[nbrhs]}-{job[nbpts]}"
  template_files: &COUPLED_OOC [ "wrapper-ooc", "coupled-ooc" ]
  template_instantiation:
    scheduler:
      - { prefix: "ooc-multi-solve", platform: "plafrim", family: "miriel",
          constraint: "", exclude: "", nodes: 1, tasks: 24,
          time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, nbrhs: 128, batch: 1 }
      - { nbpts: 1000000, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, nbrhs: 512, batch: 1 }
      # N = 3M
      - { nbpts: 3000000, nbrhs: 128, batch: 1 }
      - { nbpts: 3000000, nbrhs: 256, batch: 1 }
      - { nbpts: 3000000, nbrhs: 512, batch: 1 }
      # N = 7M
      - { nbpts: 7000000, nbrhs: 128, batch: 2 }
      - { nbpts: 7000000, nbrhs: 256, batch: 3 }
      - { nbpts: 7000000, nbrhs: 512, batch: 4 }
      # N = 9M
      - { nbpts: 9000000, nbrhs: 128, batch: 5 }
      - { nbpts: 9000000, nbrhs: 256, batch: 6 }
      - { nbpts: 9000000, nbrhs: 512, batch: 7 }
      # N = 11M
      - { nbpts: 11000000, nbrhs: 128, batch: 8 }
      - { nbpts: 11000000, nbrhs: 256, batch: 9 }
      - { nbpts: 11000000, nbrhs: 512, batch: 10 }
    sparse:
      - { name: "mumps",
          options: "-mumpsooc --mumps-verbose --mumps-blr
                    --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                    --mumps-multi-solve -nbrhsmumps" }
    dense:
      - { name: "spido", ooc: 1 }
      - { name: "spido", ooc: 0 }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
                {job[nbrhs]} -nbpts {job[nbpts]}"
  Validations:
    - id: "va-ooc-multi-solve-{job[batch]}-{sparse[name]}-\
           {dense[name]}-{dense[ooc]}-{job[nbrhs]}-{job[nbpts]}"
      launch_command: "{@job_creation[va_executable]}
                       -K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}"
- id: "ooc-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
       {dense[ooc]}-{job[nbrhs]}-{job[schur]}-{job[nbpts]}"
  template_files: *COUPLED_OOC
  template_instantiation:
    scheduler:
      - { prefix: "ooc-multi-solve", platform: "plafrim", family: "miriel",
          constraint: "", exclude: "", nodes: 1, tasks: 24,
          time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1M
      - { nbpts: 1000000, schur: 256, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 1000000, schur: 1024, nbrhs: 256, batch: 1 }
      # N = 3M
      - { nbpts: 3000000, schur: 256, nbrhs: 256, batch: 1 }
      - { nbpts: 3000000, schur: 512, nbrhs: 256, batch: 1 }
      - { nbpts: 3000000, schur: 1024, nbrhs: 256, batch: 1 }
      # N = 7M
      - { nbpts: 7000000, schur: 256, nbrhs: 256, batch: 2 }
      - { nbpts: 7000000, schur: 512, nbrhs: 256, batch: 3 }
      - { nbpts: 7000000, schur: 1024, nbrhs: 256, batch: 4 }
      # N = 9M
      - { nbpts: 9000000, schur: 256, nbrhs: 256, batch: 5 }
      - { nbpts: 9000000, schur: 512, nbrhs: 256, batch: 6 }
      - { nbpts: 9000000, schur: 1024, nbrhs: 256, batch: 7 }
      # N = 11M
      - { nbpts: 11000000, schur: 256, nbrhs: 256, batch: 8 }
      - { nbpts: 11000000, schur: 512, nbrhs: 256, batch: 9 }
      - { nbpts: 11000000, schur: 1024, nbrhs: 256, batch: 10 }
    sparse:
      - { name: "mumps",
          options: "-mumpsooc --mumps-verbose --mumps-blr
                    --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                    --mumps-multi-solve -nbrhsmumps" }
    dense:
      - { name: "hmat", ooc: 1 }
      - { name: "hmat", ooc: 0 }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
                {job[nbrhs]} --mumps-nbcols-schur-hmat {job[schur]} --hmat
                --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
                --coupled-mumps-hmat -nbpts {job[nbpts]}"
  Validations:
    - id: "va-ooc-multi-solve-{job[batch]}-{sparse[name]}-\
           {dense[name]}-{dense[ooc]}-{job[nbrhs]}-{job[schur]}-\
           {job[nbpts]}"
      launch_command: "{@job_creation[va_executable]}
                       -K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}"
For multi-factorization, we vary the size of the blocks of the Schur
complement matrix during the computation (see the schur key in the job map)
and use the IOblock key under job to set the size of a disk block when using
SPIDO as dense solver. This is to ensure that the value is a multiple of the
chosen Schur complement block size.
- id: "ooc-multi-factorization-{job[batch]}-{sparse[name]}-{dense[name]}-\
       {dense[ooc]}-{job[nbpts]}-{job[schur]}"
  template_files: *COUPLED_OOC
  template_instantiation:
    scheduler:
      - { prefix: "ooc-multi-factorization", platform: "plafrim",
          family: "miriel", constraint: "", exclude: "", nodes: 1, tasks: 24,
          time: "3-00:00:00" }
    parallel:
      - *PARALLEL_DEFAULT
    job:
      # N = 1.00M, n_BEM = 37,169
      - { nbpts: 1000000, schur: 3379, nb: 11, batch: 1 }
      - { nbpts: 1000000, schur: 5310, nb: 7, batch: 1 }
      - { nbpts: 1000000, schur: 12390, nb: 3, batch: 1 }
      # N = 1.75M, n_BEM = 53,841
      - { nbpts: 1750000, schur: 4895, nb: 11, batch: 2 }
      - { nbpts: 1750000, schur: 7692, nb: 7, batch: 2 }
      - { nbpts: 1750000, schur: 17943, nb: 3, batch: 3 }
      # N = 2.50M, n_BEM = 68,524
      - { nbpts: 2500000, schur: 6230, nb: 11, batch: 3 }
      - { nbpts: 2500000, schur: 9790, nb: 7, batch: 4 }
      - { nbpts: 2500000, schur: 22842, nb: 3, batch: 4 }
    sparse:
      - { name: "mumps",
          options: "-mumpsooc --mumps-verbose --mumps-blr
                    --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
                    --mumps-multi-facto" }
    dense:
      - { name: "spido", ooc: 1, options: "" }
      - { name: "spido", ooc: 0, options: "" }
      - { name: "hmat", ooc: 1, options: "--hmat --hmat-eps-assemb 1e-3
                                          --hmat-eps-recompr 1e-3
                                          --coupled-mumps-hmat" }
      - { name: "hmat", ooc: 0, options: "--hmat --hmat-eps-assemb 1e-3
                                          --hmat-eps-recompr 1e-3
                                          --coupled-mumps-hmat" }
  Tasks:
    - options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
                {dense[options]} --mumps-sizeschur {job[schur]}
                -nbpts {job[nbpts]}"
  Validations:
    - id: "va-ooc-multi-factorization-{job[batch]}-{sparse[name]}-\
           {dense[name]}-{dense[ooc]}-{job[nbpts]}-{job[schur]}"
      launch_command: "{@job_creation[va_executable]}
                       -K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}
                       -K n_blocks={job[nb]}"
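In the in-core multi-factorization test at the beginning of this section, every Schur complement block size (schur) is paired with an IOblock value that divides it evenly. A quick check over a few of those pairs (values copied from that test):

```python
# Each Schur complement block size (schur) from the in-core
# multi-factorization test is an exact multiple of the associated
# disk block size (IOblock).
pairs = [(9296, 2324), (12390, 2478), (18585, 3717), (37170, 3717),
         (11784, 1964), (22842, 3807), (34264, 4283)]
assert all(schur % ioblock == 0 for schur, ioblock in pairs)
print([schur // ioblock for schur, ioblock in pairs])  # [4, 5, 5, 10, 6, 6, 8]
```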
3.5.4. Multi-node parallel distributed benchmarks
- pack_id: "distributed"
  description: "Multi-node parallel distributed benchmarks."
  Tests:
The goal here is to analyze the impact of computation in a distributed memory
environment on the two-stage implementation schemes multi-solve and
multi-factorization using MUMPS as sparse solver and SPIDO or HMAT as dense
solver. We vary the problem's unknown count (see the nbpts key in the job
map). For multi-solve, we also vary the count of right-hand sides to be
processed at once by MUMPS during the Schur complement computation (see the
nbrhs key in the job map). For multi-factorization, we vary the size of the
blocks of the Schur complement matrix during the computation (see the schur
key in the job map) and use the IOblock key under job to set the size of a
disk block when using SPIDO as dense solver. This is to ensure that the value
is a multiple of the chosen Schur complement block size. The sparse and dense
maps define the options for the sparse and the dense solver, respectively.
- id: "distributed-multi-solve-{sparse[name]}-{dense[name]}-\ {scheduler[nodes]}-{job[nbpts]}" template_files: &DISTRIBUTED [ "wrapper-in-core-distributed", "coupled-distributed" ] template_instantiation: scheduler: - { prefix: "distributed-multi-solve", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 2, nt: 24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-solve", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 4, nt: 24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-solve", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 8, nt: 24, np: 1, time: "1-12:00:00" } parallel: - *PARALLEL_DEFAULT job: - { nbpts: 1000000 } - { nbpts: 2000000 } - { nbpts: 4000000 } - { nbpts: 7000000 } - { nbpts: 9000000 } sparse: - { name: "mumps", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbrhsmumps 256" } dense: - { name: "spido", options: ""} - { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat 1024"} Tasks: - nprocs: "-np {scheduler[nodes]} -map-by ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]} -bind-to {parallel[bind]}" options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} {dense[options]} -nbpts {job[nbpts]}" Validations: - id: "va-distributed-multi-solve-{sparse[name]}-{dense[name]}-\ {scheduler[nodes]}-{job[nbpts]}" launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]}" - id: "distributed-multi-factorization-{sparse[name]}-{dense[name]}-\ {scheduler[nodes]}-{job[nbpts]}" template_files: *DISTRIBUTED template_instantiation: scheduler: - { prefix: "distributed-multi-factorization", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 2, nt: 24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-factorization", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 4, nt: 
24, np: 1, time: "1-12:00:00" } - { prefix: "distributed-multi-factorization", platform: "plafrim", family: "miriel", constraint: "&omnipath", nodes: 8, nt: 24, np: 1, time: "1-12:00:00" } parallel: - *PARALLEL_DEFAULT job: - { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1 } - { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2 } - { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2 } - { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5 } sparse: - { name: "mumps", options: "--mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-facto" } dense: - { name: "spido", options: "" } - { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" } Tasks: - nprocs: "-np {scheduler[nodes]} -map-by ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]} -bind-to {parallel[bind]}" options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]} {dense[options]} --mumps-sizeschur {job[schur]} -diskblock {job[IOblock]} -nbpts {job[nbpts]}" Validations: - id: "va-distributed-multi-factorization-{sparse[name]}-\ {dense[name]}-{scheduler[nodes]}-{job[nbpts]}" launch_command: "{@job_creation[va_executable]} -K solver={sparse[name]}/{dense[name]},n_blocks={job[nb]}"
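As a quick consistency check (a sketch, not part of the framework), one can verify that in each multi-factorization job entry above the Schur complement block size is an exact multiple of the corresponding disk block size:

```python
# Job entries from the definition above (nbpts omitted for brevity).
jobs = [
    {"schur": 37170, "IOblock": 3717},
    {"schur": 24375, "IOblock": 4875},
    {"schur": 29456, "IOblock": 4208},
    {"schur": 13705, "IOblock": 2741},
]

# Each Schur complement block size must be divisible by the disk block size.
for job in jobs:
    assert job["schur"] % job["IOblock"] == 0
print("all consistent")
```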
3.5.5. Test benchmarks
- pack_id: "test" description: "Benchmarks for testing the benchmark framework." Tests:
In this section, we define a small number of testing benchmarks allowing us to check the correct operation of the benchmark framework itself for each category of benchmarks, i.e. in-core, out-of-core and parallel distributed.
- id: "test-ic" template_files: *MONO template_instantiation: scheduler: - { prefix: "test-ic", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-00:15:00" } parallel: - *PARALLEL_DEFAULT Tasks: - options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 200000" Validations: - id: "va-test-ic" launch_command: "{@job_creation[va_executable]}" - options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 200000 --mumps-nbcols-schur-hmat 512 --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" Validations: - id: "va-test-ic" launch_command: "{@job_creation[va_executable]}" - id: "test-ooc" template_files: ["wrapper-ooc", "monobatch"] template_instantiation: scheduler: - { prefix: "test-ooc", platform: "plafrim", family: "miriel", nodes: 1, tasks: 24, time: "0-00:15:00" } parallel: - *PARALLEL_DEFAULT dense: - { ooc: 1 } Tasks: - options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 -mumpsooc --mumps-multi-solve -nbpts 200000" Validations: - id: "va-test-ooc" launch_command: "{@job_creation[va_executable]}" - options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 -mumpsooc --mumps-multi-solve -nbpts 200000 --mumps-nbcols-schur-hmat 512 --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" Validations: - id: "va-test-ooc" launch_command: "{@job_creation[va_executable]}" - id: "test-distributed" template_files: ["wrapper-in-core-distributed", "monobatch"] template_instantiation: scheduler: - { prefix: "test-distributed", platform: "plafrim", family: "miriel", nodes: 2, tasks: 24, nt: 24, np: 1, time: "0-00:15:00" } parallel: - *PARALLEL_DEFAULT Tasks: - nprocs: "-np 2 -map-by ppr:1:node -rank-by 
node -bind-to none" options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 400000" Validations: - id: "va-test-distributed" launch_command: "{@job_creation[va_executable]}" - nprocs: "-np 2 -map-by ppr:1:node -rank-by node -bind-to none" options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve -nbpts 400000 --mumps-nbcols-schur-hmat 512 --hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat" Validations: - id: "va-test-distributed" launch_command: "{@job_creation[va_executable]}"
See the complete definition file benchmarks.yaml.
3.6. Resource monitoring
To continuously measure the storage resources (RAM and optionally hard drive
space) consumed during each benchmark run, we launch test_FEMBEM through the
Python script rss.py, which produces as many memory and disk usage logs as
there are MPI processes created during benchmark execution. The peak values are
appended at the very end of these log files. Note that measurements are taken
regularly, once per second, and the output units are mebibytes [MiB].
import subprocess
import sys
import threading
import time
import os
import psutil
from datetime import datetime
After importing necessary Python modules, we begin by defining a help message
function that can be triggered with the -h
option.
def help():
    print("Monitor memory and disk usage of PROGRAM.\n",
          "Usage: ", sys.argv[0], " [options] [PROGRAM]\n\n",
          "Options:\n  -h  Show this help message.",
          file = sys.stderr, sep = "")
We must check that at least one argument has been passed to the script. The
script takes as arguments the executable whose resource consumption shall be
measured, followed by the arguments of that executable. Alternatively, one can
specify the -h option to make the script print a help message on how to use it.
Also, the --with-hdd switch allows one to enable hard drive consumption
monitoring.
if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)

with_hdd = False
if sys.argv[1] == "--with-hdd":
    with_hdd = True
Several function definitions follow. First, a function to determine the MPI rank of the currently monitored process.
def mpi_rank():
    rank = 0
    for env_name in ['MPI_RANKID', 'OMPI_COMM_WORLD_RANK']:
        try:
            rank = int(os.environ[env_name])
            break
        except:
            continue
    return rank
We also need a function to determine and format the current date and time.
Indeed, we include a timestamp in the output log files every time we record a
measurement. This helps us to synchronize the records with the other monitoring
tools we use, i.e. energy_scope (see Section 3.7).
def timestamp():
    now = datetime.now()
    return now.strftime("%Y/%m/%dT%H:%M:%S")
The script gathers real (resident) memory usage, i.e. the amount of data stored
in the random access memory (RAM). On Linux, the values can be obtained from
the /proc filesystem.
Memory usage statistics of a particular process are stored in
/proc/<pid>/statm, where <pid> is the process identifier (PID). In this file,
the second field, corresponding to VmRSS, holds the amount of real memory used
by the process at instant \(t\). See the associated function below.
def rss(pid):
    with open("/proc/%d/statm" % pid, "r") as f:
        line = f.readline().split()
        VmRSS = int(line[1]) * 4
    return timestamp(), (VmRSS / 1024.)
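As a self-contained illustration, reading the resident set size of the current process itself from /proc works the same way as the rss() function above does for the monitored process (note that the factor of 4 assumes 4 KiB memory pages, the common Linux default):

```python
import os

# Read this process's own resident set size from /proc/<pid>/statm. The second
# field holds the resident page count; multiplying by the 4 KiB page size
# (assumed) yields KiB, and dividing by 1024 yields MiB.
with open("/proc/%d/statm" % os.getpid(), "r") as f:
    resident_pages = int(f.readline().split()[1])

rss_mib = resident_pages * 4 / 1024.
print(rss_mib > 0)  # True on Linux
```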
To be able to monitor the disk space used during benchmarks, we assign to each
run its own temporary folder into which all the files used by test_FEMBEM are
put. Then, we monitor the evolution of the size of TMPDIR in our Python script
using the du Linux command.
def hdd(tmpdir):
    du = subprocess.Popen(["du", "-s", tmpdir], stdout = subprocess.PIPE)
    HddNow = int(du.stdout.read().decode("utf-8").split("\t")[0])
    return timestamp(), (HddNow / 1024.)
Before entering into the monitoring loop, we need to:
- initialize variables to store peak values;
VmHWM = int(0)
HddPeak = int(0)
- get the path of the temporary folder from the TMPDIR environment variable. If the variable is not set, /tmp is returned, allowing the script to detect that no separate temporary folder has been created and to exit with failure, preventing potentially incorrect measurements;
tmpdir = os.getenv('TMPDIR', '/tmp')
if tmpdir == '/tmp':
    print('Error: Temporary files were not redirected!', file = sys.stderr)
    sys.exit(1)
- launch the program to monitor and get the rank of the currently monitored process.
myargs = sys.argv[1:]
if with_hdd == True:
    myargs = sys.argv[2:]

p = subprocess.Popen(myargs)
rank = mpi_rank()
At this point, we begin the monitoring loop. The latter breaks when the monitored process exits. Note that the measured values are initially stored in memory. They are dumped into the corresponding log files at the end of the benchmark execution. This is to prevent frequent disk accesses that may impact the execution time of the monitored benchmark.
rss_log = []
hdd_log = []

while p.returncode == None:
The following tasks are performed to gather necessary results:
- get the current memory usage, update the peak memory usage if necessary and append the corresponding value, converted from pages to mebibytes, to the log;
    VmRSS = rss(p.pid)
    if VmRSS[1] > VmHWM:
        VmHWM = VmRSS[1]
    rss_log.append(VmRSS)
- collect disk usage statistics, update the peak disk space usage if necessary and append the corresponding value, converted from kibibytes to mebibytes, to the log;
    if with_hdd == True:
        HddNow = hdd(tmpdir)
        if HddNow[1] > HddPeak:
            HddPeak = HddNow[1]
        hdd_log.append(HddNow)
- sleep for one second and repeat.
    time.sleep(1)
    p.poll()
Eventually, before exiting, we append usage peaks to the logs and dump the latter to disk.
print("Writing storage resource usage logs...")

m = open("rss-%d.log" % rank, "w")
for entry in rss_log:
    m.write("%s\t%g\n" % (entry[0], entry[1]))
m.write("%g\n" % VmHWM)
m.flush()
m.close()

if with_hdd == True:
    d = open("hdd-%d.log" % rank, "w")
    for entry in hdd_log:
        d.write("%s\t%g\n" % (entry[0], entry[1]))
    d.write("%g\n" % HddPeak)
    d.flush()
    d.close()

sys.exit(p.returncode)
See the complete source file rss.py.
3.7. Energy consumption monitoring
energy_scope
is an energy consumption monitoring tool. To configure what and
how should be monitored, it uses a configuration in JSON format. Among other
parameters, it allows the choose the acquisition frequency and profile.
In this section, we define a gcvb
template (see Section 3.1) of the
configuration file.
{{ "parameter": {{ "comment":"configuration file template", "version":"2022-01", "owner":"energy_scope", "cpu_info":"/proc/cpuinfo", "nvidia_gpu_info":"/proc/driver/nvidia/gpus", "stop_file_prefix":"/tmp/es_read_rt_stop_flag_", "synchro_file_saved_prefix":"wrk/energy_scope_tmp_file_", "synchro_started_prefix":"wrk/energy_scope_tmp_started_", "msr_cmd":"rdmsr", "opt_enode":"node", "opt_estat":"shell", "opt_profile":"none", "opt_send":"none", "analyseafteracquisition":"no", "eprofile_period(ms)":{es[interval]}, "default_read_rt_profile":"{es[profile]}" }}, "data": {{ "intel": {{ "generic" : {{ "read_rt_idle_time":{es[idle_time]}, "registers": {{ "0x010": {{"when":"ae", "pct":"package"}}, "0x0ce": {{"when":"b", "pct":"package"}}, "0x0e7": {{"when":"N", "pct":"thread"}}, "0x0e8": {{"when":"a", "pct":"thread"}}, "0x198": {{"when":"N", "pct":"core"}}, "0x19c": {{"when":"at", "pct":"core"}}, "0x1A2": {{"when":"b", "pct":"package"}}, "0x606": {{"when":"b", "pct":"package"}}, "0x610": {{"when":"b", "pct":"package"}}, "0x611": {{"when":"ae", "pct":"package"}}, "0x613": {{"when":"a", "pct":"package"}}, "0x614": {{"when":"b", "pct":"package"}}, "0x619": {{"when":"ae", "pct":"package"}}, "0x620": {{"when":"b", "pct":"package"}}, "0x621": {{"when":"a", "pct":"package"}}, "0x639": {{"when":"ae", "pct":"package"}}, "0x641": {{"when":"ae", "pct":"package"}}, "0x64e": {{"when":"a", "pct":"thread"}}, "0x770": {{"when":"b", "pct":"package"}}, "0x771": {{"when":"b", "pct":"thread"}}, "0x774": {{"when":"b", "pct":"thread"}} }} }} }}, "amd":{{ "generic" : {{ "read_rt_idle_time":{es[idle_time]}, "registers": {{ "0x010": {{"when":"ae", "pct":"package"}}, "0x0e7": {{"when":"a", "pct":"thread"}}, "0x0e8": {{"when":"a", "pct":"thread"}}, "0xC0000104": {{"when":"b", "pct":"thread"}}, "0xC0010064": {{"when":"b", "pct":"core"}}, "0xC0010299": {{"when":"b", "pct":"package"}}, "0xC001029A": {{"when":"a", "pct":"core"}}, "0xC001029B": {{"when":"ae", "pct":"package"}} }} }} }} }} }}
See the complete energy_scope configuration file template es_config.json.
3.8. Result parsing
The Shell script parse.sh parses data from a test_FEMBEM benchmark and stores
the result in a comma-separated values (.csv) file.
We begin with a help message function that can be triggered using the -h
option.
function help() {
    echo -n "Parse data from a test_FEMBEM run and store the result to a " >&2
    echo "comma-separated (*.csv) file." >&2
    echo "Usage: ./parse.sh [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h       Show this help message." >&2
    echo -n "  -s FILE  Read the standard output of " >&2
    echo "test_FEMBEM from FILE rather than from the standard input." >&2
    echo -n "  -r PATH  Read memory and disk usage " >&2
    echo "logs provided by rss.py from the directory at PATH." >&2
    echo -n "  -l FILE=FUNCTION[,FUNCTION]  Read the trace log of the " >&2
    echo -n "test_FEMBEM run from FILE and parse execution time and " >&2
    echo "iteration count of one or more FUNCTION(s)." >&2
    echo -n "  -K KEY=VALUE[,KEY=VALUE]  Parse one or more additional " >&2
    echo "KEY=VALUE pairs to be added to the output (*.csv) file." >&2
    echo "  -H       Do not print header on output." >&2
    echo -n "  -o FILE  Specify the file to store the " >&2
    echo "output of the parsing to." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
    if test $# -lt 1; then
        echo "An unknown error occurred!" >&2
    else
        echo "Error: $1" >&2
    fi
}
There should be at least one argument provided on the command line.
if test $# -lt 1; then
    help
    exit 1
fi
STDOUT
holds the path to the input file containing test_FEMBEM standard
output.
STDOUT=""
RSS
contains path and name pattern of input files containing the log of the
resource monitoring script (see Section 3.6).
RSS=""
TRACE
is the input file containing the trace call log produced by
test_FEMBEM.
TRACE=""
OUTPUT
is the path to the output file to store the parsed values to.
OUTPUT=""
FUNCTIONS
contains names of functions to parse information about from the
trace call log produced by test_FEMBEM.
FUNCTIONS=""
CUSTOM_KV
holds custom key-value pairs to be included in the output *.csv
file.
CUSTOM_KV=""
DISABLE_HEADING
toggles heading in the output *.csv
file.
DISABLE_HEADING=0
At this point, we parse provided arguments and check the validity of option values.
while getopts ":hs:r:l:K:o:H" option; do
    case $option in
Standard output of the test_FEMBEM executable run can be passed to the script
either through its standard input or in a file specified using the -s
option.
        s)
            STDOUT=$OPTARG
            if test ! -f $STDOUT; then
                error "'$STDOUT' is not a valid file!"
                exit 1
            fi
            ;;
When the -r
option is provided, the script shall also parse resource
monitoring logs.
        r)
            RSS=$OPTARG
            if test ! -d $RSS; then
                error "'$RSS' is not a valid directory!"
                exit 1
            fi
            ;;
When the -l
option is provided, the script shall read the trace call log of
test_FEMBEM from the file passed as argument and parse execution time, iteration
count and floating-point operations (Flops) of one or more functions.
        l)
            TRACE=$(echo $OPTARG | cut -d '=' -f 1)
            FUNCTIONS=$(echo $OPTARG | cut -d '=' -f 2 | sed 's/,/\t/g')
            if test ! -f $TRACE; then
                error "'$TRACE' is not a valid file!"
                exit 1
            fi
            ;;
The -K option allows one to add one or more KEY=VALUE pairs into the output
*.csv file.
        K)
            if test "$CUSTOM_KV" == ""; then
                CUSTOM_KV="$OPTARG"
            else
                CUSTOM_KV="$CUSTOM_KV,$OPTARG"
            fi
            ;;
The -H
option toggles printing of the header in the output *.csv
file.
H)
DISABLE_HEADING=1
;;
Using the -o
option, one can specify the name of the output *.csv
file.
        o)
            OUTPUT=$OPTARG
            ;;
Eventually, we take care of unknown options or missing arguments and raise an error if any.
        \?) # Unknown option
            error "Arguments mismatch! Invalid option '-$OPTARG'."
            echo
            help
            exit 1
            ;;
        :) # Missing option argument
            error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
            echo
            help
            exit 1
            ;;
        h | *)
            help
            exit 0
            ;;
    esac
done
If the standard output of test_FEMBEM is not provided in a file, we try to read its content from the standard input of the script.
if test "$STDOUT" == ""; then
    rm -rf .input
    while read line; do
        echo $line >> .input
    done
    STDOUT=.input
fi
If no standard output from test_FEMBEM to parse is found, we abort the script and raise an error.
if test $(wc --bytes $STDOUT | cut -d' ' -f 1) -lt 1; then
    error "'$STDOUT' contains no 'test_FEMBEM' standard output to parse!"
    exit 1
fi
The treatment starts by separating custom key-value pairs, if any, by a
tabulation character, so they can be looped over using a for
loop.
CUSTOM_KV=$(echo $CUSTOM_KV | sed 's/,/\t/g')
We print the header line to the output file if it was not explicitly disabled
using the -H
option.
if test $DISABLE_HEADING -ne 1; then echo -n "processes,by,mapping,ranking,binding,omp_thread_num," > $OUTPUT echo -n "mkl_thread_num,hmat_ncpu,mpf_max_memory,nbpts,radius," >> $OUTPUT echo -n "height,nbrhs,step_mesh,nbem,nbpts_lambda,lambda," >> $OUTPUT echo -n "thread_block_size,proc_block_size,disk_block_size," >> $OUTPUT echo -n "tps_cpu_facto_mpf,tps_cpu_solve_mpf,error,symmetry," >> $OUTPUT echo -n "h_assembly_accuracy,h_recompression_accuracy," >> $OUTPUT echo -n "assembled_size_mb,tps_cpu_facto,factorized_size_mb," >> $OUTPUT echo -n "tps_cpu_solve,mumps_blr,mumps_blr_accuracy," >> $OUTPUT echo -n "mumps_blr_variant,coupled_method,coupled_nbrhs," >> $OUTPUT echo -n "size_schur,cols_schur,sparse_ooc,platform,node_family," >> $OUTPUT echo -n "singularity_container,slurm_jobid,system_kind,es_cpu," >> $OUTPUT echo -n "es_dram,es_duration,rm_peak,hdd_peak" >> $OUTPUT
The common header entries are followed by custom key-value pairs and additional details about selected functions acquired from the trace call logs produced by test_FEMBEM, if any.
    for kv in $CUSTOM_KV; do
        KEY=$(echo $kv | cut -d '=' -f 1)
        echo -n ",$KEY" >> $OUTPUT
    done

    for f in $FUNCTIONS; do
        echo -n ","$f"_exec_time,"$f"_nb_execs,"$f"_flops" >> $OUTPUT
    done

    echo >> $OUTPUT
fi
Next, we parse all the interesting information from the provided standard output
of test_FEMBEM. The regular expressions and parameters of grep
and sed
utilities used in the script are chosen based on the output format of
test_FEMBEM messages (see Section 5.1 in Appendix).
processes=$(cat $STDOUT | grep "<PERFTESTS> MPI process count" | uniq | \ cut -d '=' -f 2 | sed 's/[^0-9]//g') mapping=$(cat $STDOUT | grep "<PERFTESTS> MPI processes mapped by" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') ranking=$(cat $STDOUT | grep "<PERFTESTS> MPI processes ranked by" | uniq |\ cut -d '=' -f 2 | sed 's/[^a-z]//g') binding=$(cat $STDOUT | grep "<PERFTESTS> MPI processes bound to" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') omp_thread_num=$(cat $STDOUT | grep "OpenMP thread number" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') mkl_thread_num=$(cat $STDOUT | grep "MKL thread number" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') hmat_ncpu=$(cat $STDOUT | grep "HMAT_NCPU" | cut -d '=' -f 4 | \ sed 's/[^0-9]//g') mpf_max_memory=$(cat $STDOUT | grep "MPF_MAX_MEMORY" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') nbpts=$(cat $STDOUT | grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') radius=$(cat $STDOUT | grep "Reading radius" | cut -d '=' -f 2 | \ sed 's/[^0-9.]//g') height=$(cat $STDOUT | grep "Reading height" | cut -d '=' -f 2 | \ sed 's/[^0-9.]//g') nbrhs=$(cat $STDOUT | grep "<PERFTESTS> NbRhs" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') step_mesh=$(cat $STDOUT | grep "<PERFTESTS> StepMesh" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') nbem=$(cat $STDOUT | grep "<PERFTESTS> NbPtsBEM" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') nbpts_lambda=$(cat $STDOUT | grep "<PERFTESTS> nbPtLambda" | \ cut -d '=' -f 2 | sed 's/[^-0-9e.+]//g') lambda=$(cat $STDOUT | grep "<PERFTESTS> Lambda" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') thread_block_size=$(cat $STDOUT | grep "thread block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') proc_block_size=$(cat $STDOUT | grep "proc block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') disk_block_size=$(cat $STDOUT | grep "disk block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') tps_cpu_facto_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFactoMPF =" | \ cut -d '=' -f 2 | sed 
's/[^0-9.]//g') tps_cpu_solve_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolveMPF =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') error=$(cat $STDOUT | grep "<PERFTESTS> Error" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') symmetry=$(cat $STDOUT | grep "Testing: Non-Symmetric matrices and solvers.") if test "$symmetry" != ""; then symmetry="non-symmetric" else symmetry="symmetric" fi h_assembly_accuracy=$(cat $STDOUT | grep "\[HMat\] Compression epsilon" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') h_recompression_accuracy=$(cat $STDOUT | \ grep "\[HMat\] Recompression epsilon" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') assembled_size_mb=$(cat $STDOUT | grep "<PERFTESTS> AssembledSizeMb" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_facto=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFacto =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') factorized_size_mb=$(cat $STDOUT | grep "<PERFTESTS> FactorizedSizeMb" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_solve=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolve =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') mumps_blr="NA" if test "$(cat $STDOUT | grep 'NOT Using Block-Low-Rank compression')" != ""; then mumps_blr=0 elif test "$(cat $STDOUT | grep 'Using Block-Low-Rank compression')" != ""; then mumps_blr=1 fi mumps_blr_accuracy=$(cat $STDOUT | \ grep "Accuracy parameter for Block-Low-Rank" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') mumps_blr_variant=$(cat $STDOUT | grep "BLR Factorization variant" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') coupled_method="" if test "$(cat $STDOUT | grep 'Multi solve method.')" != ""; then coupled_method="multi-solve" elif test "$(cat $STDOUT | grep 'Multi facto method.')" != ""; then coupled_method="multi-facto" fi coupled_nbrhs=$(cat $STDOUT | grep "Number of simultaneous RHS" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') if test "$coupled_nbrhs" == ""; then coupled_nbrhs=32 # Default value fi size_schur=$(cat $STDOUT | grep "Size of the block Schur" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') 
cols_schur=$(cat $STDOUT | \ grep "Number of columns in the H-matrix Schur block" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') sparse_ooc=$(cat $STDOUT | grep "[mumps] Out-of-core.") if test "$sparse_ooc" != ""; then sparse_ooc=1 else sparse_ooc=0 fi platform=$(cat $STDOUT | grep "<PERFTESTS> Underlying platform" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') node_family=$(cat $STDOUT | grep "<PERFTESTS> Node family" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') singularity_container=$(cat $STDOUT | grep "<PERFTESTS> Singularity image") if test "$singularity_container" != ""; then singularity_container=$(echo $singularity_container | cut -d '=' -f 2 | \ sed 's/[[:space:]]*//g') else singularity_container="NA" fi slurm_jobid=$(cat $STDOUT | grep "<PERFTESTS> Slurm job identifier" | uniq | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') if test "$(cat $STDOUT | grep 'Testing : FEM-BEM')" != ""; then system_kind="fem-bem" elif test "$(cat $STDOUT | grep 'Testing : FEM')" != ""; then system_kind="fem" elif test "$(cat $STDOUT | grep 'Testing : BEM')" != ""; then system_kind="bem" else system_kind="unknown" fi es_cpu="NA" es_dram="NA" es_duration="NA" if test -f energy_scope_estat*; then es_estat=$(cat energy_scope_estat*) es_cpu=$(echo $es_estat | jq '.arch.data."joule(J)"."ecpu(J)"') es_dram=$(echo $es_estat | jq '.arch.data."joule(J)"."edram(J)"') es_duration=$(echo $es_estat | jq '."duration(sec)"') fi
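To illustrate the parsing logic above, here is a minimal self-contained sketch in Python of the grep | cut | sed pipeline used, for example, for the NbPts field (the sample line and its value are illustrative):

```python
import re

# A sample line in the test_FEMBEM output format (the value is illustrative).
line = "<PERFTESTS> NbPts = 2000000"

# Equivalent of: grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | sed 's/[^0-9]//g'
nbpts = re.sub(r"[^0-9]", "", line.split("=", 1)[1])
print(nbpts)  # 2000000
```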
If the -r option was provided, we also parse the storage resource monitoring
logs. Memory usage log files are named rss-x.log and disk usage log files are
named hdd-x.log, where \(x\) is the MPI process rank.
if test "$RSS" != ""; then
    rm_peak="0.0"
    hdd_peak="0.0"

    for rss in $(ls $RSS | grep -e "^rss"); do
        rm_peak="$rm_peak + $(cat $RSS/$rss | tail -n 1)"
    done

    for hdd in $(ls $RSS | grep -e "^hdd"); do
        hdd_peak="$hdd_peak + $(cat $RSS/$hdd | tail -n 1)"
    done

    rm_peak=$(echo $rm_peak | bc -l)
    hdd_peak=$(echo $hdd_peak | bc -l)
fi
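The aggregation above simply sums the per-process peaks found on the last line of each log. A self-contained Python sketch of the same computation, using synthetic log contents:

```python
# Synthetic per-rank log contents in the format written by rss.py: timestamped
# measurements followed by the peak value alone on the last line.
logs = {
    "rss-0.log": "2023/01/01T00:00:00\t10.5\n42.25\n",
    "rss-1.log": "2023/01/01T00:00:00\t11.0\n57.75\n",
}

# Sum the last line of each log, as the bc-based shell loop above does.
rm_peak = sum(float(content.rstrip("\n").split("\n")[-1])
              for content in logs.values())
print(rm_peak)  # 100.0
```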
Eventually, we print parsed values, values from custom key-value pairs, if any, and additional function information possibly parsed from trace call logs produced by test_FEMBEM.
echo -n "$processes,$by,$mapping,$ranking,$binding,$omp_thread_num," >> $OUTPUT echo -n "$mkl_thread_num,$hmat_ncpu,$mpf_max_memory,$nbpts," >> $OUTPUT echo -n "$radius,$height,$nbrhs,$step_mesh,$nbem,$nbpts_lambda," >> $OUTPUT echo -n "$lambda,$thread_block_size,$proc_block_size," >> $OUTPUT echo -n "$disk_block_size,$tps_cpu_facto_mpf,$tps_cpu_solve_mpf," >> $OUTPUT echo -n "$error,$symmetry,$h_assembly_accuracy," >> $OUTPUT echo -n "$h_recompression_accuracy,$assembled_size_mb," >> $OUTPUT echo -n "$tps_cpu_facto,$factorized_size_mb,$tps_cpu_solve," >> $OUTPUT echo -n "$mumps_blr,$mumps_blr_accuracy,$mumps_blr_variant," >> $OUTPUT echo -n "$coupled_method,$coupled_nbrhs,$size_schur,$cols_schur," >> $OUTPUT echo -n "$sparse_ooc,$platform,$node_family,$singularity_container," >> $OUTPUT echo -n "$slurm_jobid,$system_kind,$es_cpu,$es_dram,$es_duration," >> $OUTPUT echo -n "$rm_peak,$hdd_peak" >> $OUTPUT for kv in $CUSTOM_KV; do VALUE=$(echo $kv | cut -d '=' -f 2) echo -n ",$VALUE" >> $OUTPUT done for f in $FUNCTIONS; do exec_time=$(cat $TRACE | grep $f | cut -d '|' -f 2 | sed 's/[^0-9.]//g') nb_execs=$(cat $TRACE | grep $f | cut -d '|' -f 3 | sed 's/[^0-9]//g') flops=$(cat $TRACE | grep $f | cut -d '|' -f 4 | sed 's/[^0-9]//g') echo -n ",$exec_time,$nb_execs,$flops" >> $OUTPUT done
At the very end, we print the trailing new line character to the output file, perform cleaning and terminate.
echo >> $OUTPUT
rm -rf .input
exit 0
See the complete source file parse.sh.
3.9. Database injecting
The Python script inject.py allows one to call a custom parsing script (see
Section 3.8) producing a .csv file on output, gather the results exported in
the latter, and then inject the values into a gcvb NoSQL database (see Section
3.1) for a possible result visualization using the gcvb dashboard feature, even
though the latter is only experimental for now. See a concrete usage of this
script in Section 3.5.
At the beginning, we import necessary Python modules as well as the gcvb
module.
import gcvb
import subprocess
import csv
import sys
The script expects either the -h option or at least two arguments:
- DATASET, which stands for the .csv file to gather the data from,
- PARSER, which represents the path to the parsing script to use.
DATASET should be a .csv file containing two lines: a heading providing the
captions, followed by the associated values.
These two arguments may be followed by the arguments to call PARSER with.
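The expected DATASET format and the way Python's csv module deserializes it can be sketched as follows (the field names and values are illustrative):

```python
import csv
import io

# A minimal DATASET as described above: one header line with the captions
# followed by one line with the associated values.
dataset = "nbpts,solver\n1000000,mumps/hmat\n"

# csv.DictReader maps each value to its caption, yielding one dict per row.
rows = list(csv.DictReader(io.StringIO(dataset)))
print(rows[0]["nbpts"], rows[0]["solver"])  # 1000000 mumps/hmat
```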
The script continues with a help message function, which can be triggered with
the -h option, and an argument check.
def help():
    print("Launch PARSER (with ARGUMENTS if needed) instrumented to produce a ",
          "comma-separated values .csv output DATASET, deserialize it and ",
          "inject to the current gcvb database provided by the gcvb module.\n",
          "Usage: ", sys.argv[0], " DATASET PARSER [ARGUMENTS]\n\n",
          "Options:\n  -h  Show this help message.",
          file = sys.stderr, sep = "")

if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)
In the main function, we parse the arguments
def main():
    args = sys.argv
    args.pop(0) # Drop the script name.

    dataset = args.pop(0)
    parser = args.pop(0)
and launch the parsing script.
    subprocess.run([parser] + args)
Once it has finished, we open the produced data set, gather the data into a
dictionary and inject the key-value pairs one by one into the currently used
gcvb database provided by the imported gcvb module.
    with open(dataset, mode = 'r') as data:
        reader = csv.DictReader(data)
        for row in reader:
            for item in row:
                gcvb.add_metric(item, row[item])

if __name__ == '__main__':
    main()
See the complete source file inject.py.
3.10. Extracting additional results
In some cases, we need to extract more detailed information from the log files produced by one or more selected benchmarks. For example, regarding real memory (RAM) consumption, we gather only the peak usage values by default. However, we may also need to know how the consumption evolves during the entire execution time of a given benchmark. Furthermore, some benchmarks may produce additional data which are not taken into account by the parsing script (see Section 3.8).
Therefore, we define here the extract.sh shell script allowing us to extract
additional benchmark results for one or more user-specified benchmarks.
The script begins, traditionally, with a help message function that can be
triggered using the -h option.
function help() {
    echo "Extract additional results for selected benchmark or benchmarks." >&2
    echo "Usage: ./extract.sh [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h          Show this help message." >&2
    echo -n "  -B ID[,ID]  Select the benchmark matching ID. Multiple " >&2
    echo "identifiers may be specified to select multiple benchmarks." >&2
    echo "  -d PATH     Place the extracted files to PATH." >&2
    echo "  -r          Extract detailed real memory (RAM) consumption." >&2
    echo "  -s PATH     Search for benchmark results in PATH." >&2
    echo "  -t          Extract the execution timeline." >&2
    echo "  -T PATH     Search for the timeline extraction tool in PATH." \
         "By default, the script looks in './sources'." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
Follows the option parsing.
BENCHMARKS=""
DESTINATION=$(pwd)
RSS=0
SOURCE=""
TIMELINE=0
TIMELINE_PATH="./sources"

while getopts ":hB:d:rs:tT:" option; do
  case $option in
The -B option allows one to specify one or more benchmarks to extract
additional results for. If more than one identifier is present, they must be
separated by commas. The commas are then converted to tabulations for easier
post-processing.
    B)
      BENCHMARKS="$BENCHMARKS $(echo $OPTARG | sed 's/,/\t/g')"
      if test "$BENCHMARKS" == ""; then
        error "Bad usage of the '-B' option (no identifiers specified)!"
        exit 1
      fi
      ;;
By default, the extracted files shall be placed into the current working
directory of the script. One can use the -d option to specify another
destination directory.
    d)
      DESTINATION=$OPTARG
      if test ! -d "$DESTINATION"; then
        error "'$DESTINATION' is not a valid destination directory!"
        exit 1
      fi
      ;;
The -r
option tells the script to extract detailed real memory (RAM)
consumption data.
    r)
      RSS=1
      ;;
The -s
option allows to specify the path to the directory to search for
benchmark results in. Typically, this represents the path to a gcvb benchmark
session directory (see Section 3.1).
    s)
      SOURCE=$OPTARG
      if test ! -d "$SOURCE"; then
        error "'$SOURCE' is not a valid source directory!"
        exit 1
      fi
      ;;
The -t
option allows to extract the execution timelines of the selected
benchmarks. We also have to check that the external proprietary extraction tool
from Airbus we use is available from the current working directory.
    t)
      TIMELINE=1
      ;;
The -T
option allows to specify the path to search the timeline extraction
tool in.
    T)
      TIMELINE_PATH=$OPTARG
      ;;
Eventually, we have to take care of unknown options or missing arguments, if any, raise an error and terminate the script in that case.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
The -B
and the -s
options are mandatory and at least one of the extraction
options, -r
or -t
, must be specified as well.
if test "$SOURCE" == ""; then
  error "No source directory was specified! Nothing to do."
  exit 1
fi

if test "$BENCHMARKS" == ""; then
  error "No benchmark identifier was specified!"
  exit 1
fi

if test $RSS -eq 0 && test $TIMELINE -eq 0; then
  error "Use at least one of the extraction options! Nothing to do."
  exit 1
fi
If the -t
option was specified, we need to check if the timeline extraction
tool is accessible.
if test $TIMELINE -ne 0 && test ! -f $TIMELINE_PATH/timeline.py; then
  error "Timeline extraction tool was not found in '$TIMELINE_PATH'!"
  exit 1
fi
For each benchmark identifier specified using the -B
option we:
- extract all of the real memory consumption logs if the
-r
option is present,
for b in $BENCHMARKS; do
  if test $RSS -ne 0; then
    RSS_LOGS=$(ls $SOURCE/$b/rss-*.log)
    if test "$RSS_LOGS" == ""; then
      error "There are no real memory (RAM) consumption logs in '$SOURCE/$b'!"
      exit 1
    fi
    COUNTER=0
    for l in $RSS_LOGS; do
      cp $l $DESTINATION/rss-$COUNTER-$b.log
      if test $? -ne 0; then
        error "Failed to extract the real memory (RAM) consumption log '$l'!"
        exit 1
      fi
      COUNTER=$(expr $COUNTER + 1)
    done
  fi
- extract the execution timelines from the standard output log of the selected
  benchmarks if the -t option is present.
  if test $TIMELINE -ne 0; then
    python $TIMELINE_PATH/timeline.py $SOURCE/$b/stdout.log \
      > $DESTINATION/timeline-$b.log
    if test $? -ne 0 || test ! -f $DESTINATION/timeline-$b.log; then
      error "Failed to extract the execution timeline of the benchmark '$b'!"
      exit 1
    fi
  fi
done
See the complete source file extract.sh.
3.11. Wrapper scripts
Before launching a benchmark there are some preparation actions to perform.
Similarly, there are some completion and cleaning actions to perform once the
benchmark task finishes. For the sake of convenience, especially in case of
distributed parallel execution, we use a wrapper Shell script to perform these
actions. This script has the form of a gcvb
template file (see Section
3.1) allowing us to make the script fit the individual parameters of each
benchmark. Although the major part of the template is identical for all kinds
of benchmark we define (see Section 3.5), some parts may differ
to meet the specific needs of a given benchmark or set of benchmarks, e.g.
setting of the environment variables related to out-of-core computation
activation. In this section, we define multiple variations of the wrapper script
template.
3.11.1. In-core benchmarks
The wrapper script template wrapper-incore.sh
is meant for benchmarks run
in-core, i.e. with the out-of-core computation disabled.
At first, we need to set up a dedicated temporary folder for the benchmark,
enabling us to measure the disk space used during its execution
(see Section 3.6). The GCVB_TEST_ID
environment variable is set by
gcvb
and represents a unique benchmark identifier. Note that we remove the
/tmp/vive-pain-au-chocolat
directory first to ensure there is no data left
behind by a previous benchmark run, if any.
rm -rf /tmp/vive-pain-au-chocolat
mkdir -p /tmp/vive-pain-au-chocolat/$GCVB_TEST_ID
export TMPDIR=/tmp/vive-pain-au-chocolat/$GCVB_TEST_ID
As a temporary workaround for Singularity containers to work properly, we need
to set the PYTHONPATH
environment variable too.
if test "$SINGULARITY_CONTAINER" != ""; then
  export PYTHONPATH=$GUIX_PYTHONPATH
fi
Using the OMPI_MCA_pml environment variable, we specify the communication
library OpenMPI should use at runtime.
export OMPI_MCA_pml='^ucx'
Then, we set the environment variables specifying the number of threads to
use, available through the parallel placeholder (see Section 3.5).
export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \ HMAT_NCPU={parallel[nt]}
Also, we need to disable the out-of-core computation feature. For MUMPS, it is disabled by default. For SPIDO and HMAT we need to set a couple of environment variables to disable it explicitly.
export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
In anticipation of later post-processing of the standard output of the execution, we try to determine the rank of the current process
if test "$OMPI_COMM_WORLD_RANK" != ""; then
  MPI_RANK=$OMPI_COMM_WORLD_RANK
elif test "$MPI_RANKID" != ""; then
  MPI_RANK=$MPI_RANKID
else
  echo "Failed to get the MPI rank of the current process! Can not proceed." >&2
  exit 1
fi
and if we are the 0th process, we print out:
selected scheduling information (see the scheduler placeholder in Section
3.5),

  if test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> Underlying platform = {scheduler[platform]}"
    echo "<PERFTESTS> Node family = {scheduler[family]}"
  fi
the MPI configuration (see the parallel placeholder in Section 3.5),

  if test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> MPI process count = {parallel[np]}"
    echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}"
    echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}"
    echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}"
  fi
the path to the Singularity container, if the execution is taking place within one of them,
  if test "$SINGULARITY_CONTAINER" != "" && test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> Singularity image = $SINGULARITY_CONTAINER."
  fi
the Slurm job identifier.
  if test "$SLURM_JOBID" != "" && test $MPI_RANK -eq 0; then
    echo "<PERFTESTS> Slurm job identifier = $SLURM_JOBID"
  fi
We also need to create symbolic links to industrial test case meshes in the current working directory.
if test $MPI_RANK -eq 0; then
  MESHES="../../../meshes"
  for i in $(ls $MESHES); do
    ln -s $MESHES/$i $i
  done
fi
At this point, we can launch the benchmark command passed as argument.
eval "$*"
After the execution, we have to clean all the files produced during the
benchmark execution except for logs, results, FXT traces, configuration files,
scheduling and launching scripts, i.e. files matching *.log, *.yaml,
es_config.json, prof_file*, *.out, *batch, coupled, scalability,
*energy_scope*, *.csv or *.sh.
if test $MPI_RANK -eq 0; then
  rm -f $(find . -type f ! -name "*.log" ! -name "*.yaml" ! -name "prof_file*" \
               ! -name "*.out" ! -name "*batch" ! -name "coupled" \
               ! -name "scalability" ! -name "*.sh" ! -name "*energy_scope*" \
               ! -name "es_config.json" ! -name "*.csv")
fi
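As an aside, the effect of the chained `! -name` filters above can be sketched in Python with the standard fnmatch module. The `removable` helper and its keep-list are purely illustrative, not part of the template.

```python
import fnmatch

# Patterns of files to preserve after a benchmark run; mirrors the
# '! -name' filters of the wrapper script (illustrative helper).
KEEP = ["*.log", "*.yaml", "prof_file*", "*.out", "*batch", "coupled",
        "scalability", "*.sh", "*energy_scope*", "es_config.json", "*.csv"]

def removable(name):
    """Return True when `name` matches none of the KEEP patterns."""
    return not any(fnmatch.fnmatch(name, p) for p in KEEP)

print(removable("stdout.log"))   # → False (kept)
print(removable("core.12345"))   # → True (removed)
```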
3.11.2. Out-of-core benchmarks
Compared to the template for in-core benchmarks (see Section
3.11.1), the template for out-of-core benchmarks wrapper-ooc.sh
makes sure the environment variables disabling the feature for SPIDO and HMAT
are unset if the placeholder {dense[ooc]}
is true. For MUMPS, the activation
of out-of-core is handled using a dedicated switch of test_FEMBEM
. Also, by
default, HMAT starts to dump data on disk only when the solver's consumption
reaches 55% of available system memory. This leaves only 45% of available memory
for all the other data and computations. Therefore, we need to decrease this
threshold.
<<code:incore-preparation>>
<<code:incore-mpi>>

if test {dense[ooc]} -eq 1; then
  unset MPF_SPIDO_INCORE
  unset HMAT_DISABLE_OOC
  TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g')
  TOTAL_MEMORY_MiB=$(echo "$TOTAL_MEMORY_KiB / 1024 / 4;" | bc -l)
  export HMAT_LIMIT_MEM=$TOTAL_MEMORY_MiB
else
  export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
fi

<<code:incore-completion>>
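For clarity, the grep, sed and bc pipeline computing the cap can be mimicked in Python. The `hmat_limit_mib` helper and the sample MemTotal line are made up for illustration; the division by 4 corresponds to capping HMAT at roughly a quarter of the system memory, well below the 55% default.

```python
# Sketch of the out-of-core memory cap, assuming the Linux /proc/meminfo
# layout ("MemTotal:  131899124 kB"); integer MiB instead of bc's floats.
def hmat_limit_mib(meminfo_line):
    # Keep digits only, like sed 's/[^0-9]//g'.
    total_kib = int("".join(ch for ch in meminfo_line if ch.isdigit()))
    # KiB -> MiB, then take a quarter of the total memory.
    return total_kib // 1024 // 4

print(hmat_limit_mib("MemTotal:       4194304 kB"))  # → 1024
```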
3.11.3. FXT execution tracing template
In the template wrapper-fxt.sh
for benchmarks ensuring FXT execution tracing,
we need to set specific StarPU environment variables related to this
functionality and add the path to the StarVZ R library (see Section
4.2) in PATH
. Note that the out-of-core computation
remains disabled.
<<code:incore-preparation>>
<<code:incore-mpi>>
<<code:incore-incore>>

export STARPU_FXT_PREFIX=$(pwd)/ STARPU_PROFILING=1 STARPU_WORKER_STATS=1
export PATH=$GUIX_ENVIRONMENT/site-library/starvz/tools/:$PATH

<<code:incore-completion>>
3.11.4. Multi-node parallel distributed benchmarks template
In the template wrapper-in-core-distributed.sh
for multi-node parallel
distributed benchmarks, the total number of MPI processes is given by the
{scheduler[nodes]}
placeholder instead of {parallel[np]}
. In this case, we
also print out the number of MPI processes per node.
<<code:incore-preparation>>
<<code:incore-incore>>

if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {scheduler[nodes]}"
  echo "<PERFTESTS> MPI processes per node = {scheduler[np]}"
  echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}"
  echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}"
  echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}"
fi

<<code:incore-completion>>
See the complete wrapper script templates wrapper-in-core.sh, wrapper-ooc.sh
and wrapper-fxt.sh.
3.12. Generate benchmark runs
3.13. Job submission
To simplify the scheduling of benchmarks to run in parallel based on the
associated job names (see Section 3.2), we use the shell
script submit.sh
. It determines the list of benchmarks from a given benchmark
session, finds the corresponding slurm batch script configuration
headers and schedules the associated jobs.
We begin by defining a help message function that can be triggered if the script
is run with the -h
option.
function help() {
  echo "Submit benchmarks from SESSION." >&2
  echo "Usage: $(basename $0) [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -h         Print this help message." >&2
  echo "  -E REGEX   Submit all benchmarks except those verifying REGEX" \
       "(mutually exclusive with '-F')." >&2
  echo "  -F REGEX   Submit only the benchmarks verifying REGEX." >&2
  echo "  -g         Generate job scripts without submitting them." >&2
  echo "  -i ID[,ID] Submit only the benchmarks having ID." >&2
  echo -n "  -l         Only list the benchmarks to run instead of " >&2
  echo "actually submitting the corresponding jobs." >&2
  echo "  -s SESSION Session to submit the benchmarks from." >&2
  echo "  -v         Validate benchmarks without computing." >&2
  echo "  -w         Make the script wait until all of the submitted" \
       "jobs complete." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
Follows the parsing of options.
REGEX=""
EXCLUDE=0
INCLUDE=0
IDs=""
SESSION=""
SESSION_ID=""
GENERATE_ONLY=0
LIST_ONLY=0
VALIDATE=0
WAIT=0

while getopts ":hE:F:gi:ls:vw" option; do
  case $option in
Using the -E option, one can exclude from scheduling the benchmarks whose
names match a given regular expression.
    E)
      REGEX=$OPTARG
      EXCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-E' option (empty regular expression)!"
        exit 1
      fi
      ;;
Using the -F option, one can schedule only the benchmarks whose names match a
given regular expression.
    F)
      REGEX=$OPTARG
      INCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-F' option (empty regular expression)!"
        exit 1
      fi
      ;;
The -g
option can be used to generate job scripts for selected benchmarks
without actually submitting them. The generated scripts are then placed into
results/<session>/jobs
.
    g)
      GENERATE_ONLY=1
      ;;
The -i option allows selecting the benchmarks to schedule by providing their
exact identifiers, separated by commas if multiple identifiers are specified.
    i)
      IDs="$IDs $(echo $OPTARG | sed 's/,/ /g')"
      if test "$IDs" == ""; then
        error "Bad usage of the '-i' option (empty list of identifiers)!"
        exit 1
      fi
      ;;
The -l option makes it possible to list the benchmarks selected for scheduling
without actually scheduling them. It may be particularly useful, for example,
in combination with the -F option to verify whether the provided regular
expression matches the intended benchmarks.
    l)
      LIST_ONLY=1
      ;;
The -s
option is used to specify the benchmark session in the results
directory (see Section 3.1) to schedule the benchmarks from.
    s)
      SESSION=$OPTARG
      if test ! -d $SESSION; then
        error "'$SESSION' is not a valid directory!"
        exit 1
      fi
      SESSION_ID=$(basename $SESSION)
      ;;
To re-run the validation phase without performing the computation of benchmarks,
we can use the -v
option.
    v)
      VALIDATE=1
      ;;
To make the script wait until all the jobs that it has scheduled finish, the
-w
option may be used.
    w)
      WAIT=1
      ;;
Eventually, we have to take care of unknown options or missing arguments, if any, raise an error and terminate the script in that case.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
Options -E and -F are mutually exclusive, and neither of them may be combined
with -i. Options -l, -g and -w are pairwise mutually exclusive as well.
if test $EXCLUDE -ne 0 && test $INCLUDE -ne 0; then
  error "Options '-E' and '-F' must not be used together!"
  exit 1
fi

if test "$REGEX" != "" && test "$IDs" != ""; then
  error "Options '-E' and '-F' must not be used together with the '-i' option!"
  exit 1
fi

if test $WAIT -ne 0 && test $LIST_ONLY -ne 0; then
  error "Options '-l' and '-w' must not be used together!"
  exit 1
fi

if test $WAIT -ne 0 && test $GENERATE_ONLY -ne 0; then
  error "Options '-g' and '-w' must not be used together!"
  exit 1
fi

if test $GENERATE_ONLY -ne 0 && test $LIST_ONLY -ne 0; then
  error "Options '-g' and '-l' must not be used together!"
  exit 1
fi
After parsing and checking the arguments, we navigate to the root of the gcvb
filesystem containing the given benchmark session folder.
cd $SESSION/../../
We also check whether we are in a valid gcvb
filesystem containing a
config.yaml
and a results
folder (see Section 3.1).
if test ! -f config.yaml || test ! -d results; then
  error "'$SESSION' is not a correct gcvb session directory!"
  exit 1
fi
Then, we make sure to clean all the temporary folders we may have used before. As
configured in definition files (see Section 3.5), to isolate our
temporary files from other users, we use a dedicated temporary folder at
/tmp/vive-pain-au-chocolat
.
rm -rf /tmp/vive-pain-au-chocolat

if test $? -ne 0; then
  error "Unable to clean any temporary folder(s) previously used!"
  exit 1
fi
Before submitting, we look for all the sbatch
configuration headers. If there
is none, there is no valid benchmark job to submit.
RUNS=$(find $SESSION -maxdepth 2 -type f -name "sbatch")

if test "$RUNS" == ""; then
  error "No sbatch file was found! No valid job for submission."
  exit 1
fi
The script loops over all the headers found and extracts the job name from each
of them. We use an associative array to store for each job name, corresponding
to a set of benchmarks (e.g. multi-solve-1-mumps-spido
), the path to the
associated header file.
declare -A BATCH_JOBS
In a session folder, there is one folder per benchmark, not only one per set of benchmarks, so the same header file may be present in several folders. Using an associative array, we keep the path to only one copy of each header and prevent redundant job submissions.
for run in $RUNS
do
  JOB_NAME=$(grep "\-\-job-name" $run | cut -d '=' -f 2)
If the -i
option is used to specify exact identifiers of the benchmarks to be
submitted, we need to filter the detected benchmarks.
  if test "$IDs" != ""; then
    for id in $IDs; do
      if test $(echo $run | grep "$id" | wc -l) -gt 0; then
        BATCH_JOBS[$JOB_NAME]=$run
      fi
    done
Also, in the case the -F
option followed by a regular expression is specified,
we pick only the jobs the name of which matches the expression.
  elif test "$REGEX" == "" || \
       (test $INCLUDE -ne 0 && [[ "$JOB_NAME" =~ $REGEX ]]) || \
       (test $EXCLUDE -ne 0 && ! [[ "$JOB_NAME" =~ $REGEX ]]); then
    BATCH_JOBS[$JOB_NAME]=$run
  fi
done
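The deduplication performed by the associative array amounts to the following Python sketch, where later assignments to the same job name simply overwrite earlier ones. Job names and paths are invented here.

```python
# Several per-benchmark folders share one sbatch header, so keying on
# the job name keeps a single header path per job (illustrative data).
headers = [
    ("multi-solve-1-mumps-spido", "results/s1/bench-a/sbatch"),
    ("multi-solve-1-mumps-spido", "results/s1/bench-b/sbatch"),
    ("multi-facto-2-hmat",        "results/s1/bench-c/sbatch"),
]

batch_jobs = {}
for job_name, path in headers:
    batch_jobs[job_name] = path  # later entries overwrite earlier ones

print(len(batch_jobs))  # → 2
```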
If the user instructs the script to wait until the submitted jobs complete
using the -w option, we must reinitialize the list of jobs to wait for before
actually submitting the jobs.
if test $WAIT -ne 0; then
  JOB_LIST=""
fi
Finally, we call the gcvb
submission command for each job name with the
--filter-by-test-id
and --header
options allowing us to specify a subset of
jobs to submit and prepend associated shell scripts with corresponding slurm
batch script configuration headers. If the -v
option is used, we pass the
--validate-only
option to gcvb in order to run only the validation phase. If
the -g
option is used, we pass the --dry-run
option to gcvb in order to
exit before actually submitting the generated job script. The latter is then moved
into results/<session>/jobs
. If the -l
option is specified, the jobs
selected for scheduling are not submitted, only their identifier is printed out
on the screen.
SET_COUNT=${#BATCH_JOBS[@]}
i=1

for unique in "${!BATCH_JOBS[@]}"; do
  PREFIX="[ $i / $SET_COUNT ]"
  if test $LIST_ONLY -ne 0; then
    echo "$PREFIX Listing job '$unique'."
  else
    ACTION_ERROR="submit"
    ACTION_SUCCESS="Submitted"
    if test $VALIDATE -ne 0; then
      ACTION_ERROR="validate"
      ACTION_SUCCESS="Validated"
      python3 -m gcvb --filter-by-test-id "$unique" compute \
        --gcvb-base $SESSION_ID --validate-only --wait-after-submitting
    elif test $GENERATE_ONLY -ne 0; then
      ACTION_ERROR="generate job script for"
      ACTION_SUCCESS="Generated job script for"
      python3 -m gcvb --filter-by-test-id "$unique" compute \
        --gcvb-base $SESSION_ID --dry-run --header ${BATCH_JOBS[$unique]}
      mkdir -p $SESSION/jobs
      mv $SESSION/job.sh $SESSION/jobs/$unique.sh
    else
      python3 -m gcvb --filter-by-test-id "$unique" compute \
        --gcvb-base $SESSION_ID --wait-after-submitting \
        --header ${BATCH_JOBS[$unique]}
    fi
    if test $? -ne 0; then
      error "$PREFIX Failed to $ACTION_ERROR batch job '$unique'!"
      exit 1
    else
      echo "$PREFIX $ACTION_SUCCESS batch job '$unique'."
    fi
Also, if the -w
option is set, we use Slurm's squeue
command to collect
job identifiers of the launched jobs to make the script wait for the latter to
complete before exiting itself.
    if test $WAIT -ne 0; then
      LAST_JOB=$(squeue -u $USER -ho "%i" -S -V | head -n 1)
      JOB_LIST="$LAST_JOB $JOB_LIST"
    fi
  fi
  i=$(expr $i + 1)
done
Once the jobs are submitted through the sbatch
command and if the -w
option
is set, we make the script wait until all the submitted jobs complete. We use
squeue
again to retrieve the list of running jobs.
Note that if the compute
command of gcvb
executes correctly but slurm
fails to submit the job anyway, JOB_LIST
will not be an empty string but will
contain spaces. Therefore, before entering the waiting loop, we have to ensure
the variable does not begin with a space in order to prevent an infinite waiting
loop.
if test $WAIT -ne 0; then
  if test "${JOB_LIST:0:1}" == " "; then
    error "No jobs to wait for!"
    exit 1
  fi
  echo -n "Waiting for submitted jobs to complete... "
  while test "$JOB_LIST" != ""; do
    STILL_RUNNING=""
    for id in $JOB_LIST; do
      if test $(squeue | grep "$id" | wc -l) -gt 0; then
        STILL_RUNNING="$STILL_RUNNING $id"
      fi
    done
    JOB_LIST=$STILL_RUNNING
    sleep 1m
  done
  echo "Done"
fi

exit 0
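Stripped of the Slurm specifics, the waiting loop boils down to the following Python sketch: keep only the identifiers a `still_running` predicate reports as alive, then sleep, until the list is empty. Both `wait_for` and the stub predicate are illustrative.

```python
import time

# Generic polling loop: the predicate stands in for `squeue | grep`.
def wait_for(jobs, still_running, interval=0.01):
    while jobs:
        jobs = [j for j in jobs if still_running(j)]
        time.sleep(interval)
    return True

# Stub predicate: pretend every job has already left the queue.
print(wait_for(["1473279", "1473280"], lambda j: False))  # → True
```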
See the complete source file submit.sh.
4. Post-processing results
4.1. Benchmarks
To visualize benchmark results we use R and rely mainly on the plot drawing
library ggplot2. The R script file plot.R
we describe in this
section is meant to be used from within plot drawing R source code blocks in
concerned Org documents, e.g. reports, articles and so on.
4.1.1. Prerequisites
At the beginning we need to import several libraries for:
manipulating data sources, i.e. R data frames, SQLite databases and JSON files,
library(plyr)
library(dplyr)
library(readr)
library(DBI)
library(rjson)
transforming data frames,
library(tidyr)
decompressing gzip files,
library(R.utils)
defining and exporting plots.
library(ggplot2)
library(scales)
library(grid)
library(gridExtra)
require(cowplot)
library(stringr)
library(ggrepel)
We also need to prevent using the cairo
library to produce figures. This does
not work on Linux.
set_null_device(cairo_pdf)
4.1.2. Importing data
4.1.2.1. Benchmark results
The benchmark tool we rely on, gcvb
, stores benchmark results into a local
SQLite database (see Section 3.1). We define a global variable having the
name gcvb_db_path
to store the path to the gcvb
database we are currently
using.
gcvb_db_path <- "../results.db"
The database file containing our latest experimental results is available for download through the link below.
To extract the results from the database, we define the function gcvb_import
which takes up to three arguments:
- session indicating the gcvb benchmark session to extract the results from
  (see Section 3.1),
- like representing a list or a vector of benchmark name patterns,
- path containing the path to the gcvb SQLite database file, set to
  gcvb_db_path by default (see above).
Using the like
argument one can extract only the results of benchmarks the
identifiers of which match given pattern or patterns (see Section
3.5). The latter may contain wildcard characters supported by
the SQL's operator LIKE
sqliteLike.
The function begins by opening a connection to gcvb
SQLite database and
setting the LIKE
keyword to be case-sensitive.
gcvb_import <- function(session, like = c(), path = gcvb_db_path) {
  conn <- dbConnect(RSQLite::SQLite(), path)
  dbExecute(conn, "PRAGMA case_sensitive_like = true;")
A gcvb
database contains 6 tables (see Figure 1).
Figure 1: EER diagram of a gcvb
database.
At first, we need to select all the benchmarks belonging to session
from the
test
table and optionally restrict the selection to benchmarks matching the
patterns of like
.
  request <- ""
  for(p in 1:length(like)) {
    if(p > 1) {
      request <- paste(request, "OR ")
    }
    request <- paste0(request, "name LIKE '", like[p], "'")
  }
  if(length(like) > 0) {
    request <- paste(
      "SELECT id, name, start_date FROM test WHERE (", request, ") AND run_id IN",
      "(SELECT id FROM run WHERE gcvb_id =", session, ")"
    )
  } else {
    request <- paste(
      "SELECT id, name, start_date FROM test WHERE run_id IN",
      "(SELECT id FROM run WHERE gcvb_id =", session, ")"
    )
  }
  benchmarks <- dbGetQuery(conn, request)
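The clause assembly can be paraphrased in Python as follows. The `build_request` helper is illustrative; note that, unlike in this sketch, patterns coming from untrusted input should be bound as SQL parameters rather than interpolated.

```python
# Paraphrase of the request construction: join the LIKE patterns with
# OR and restrict the selection to the runs of the given session.
def build_request(session, like):
    base = "SELECT id, name, start_date FROM test WHERE "
    tail = f"run_id IN (SELECT id FROM run WHERE gcvb_id = {session})"
    if like:
        clause = " OR ".join(f"name LIKE '{p}'" for p in like)
        return base + "(" + clause + ") AND " + tail
    return base + tail

print(build_request(1, ["spido-%", "hmat-%"]))
```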
Then, if a given benchmark was executed more than once, we want to keep only the
data associated with the latest run. To do this, we sort the data by the
execution start timestamp, i.e. the start_date
column and remove duplicates
based on benchmark identifiers in the name
column. Note that start_date
is
imported as string, so we need to convert it to a datetime format at first. At
the end, we drop the start_date
column being useless for further treatment.
  benchmarks$start_date <- as.POSIXct(benchmarks$start_date)
  benchmarks <- benchmarks[order(benchmarks$start_date, decreasing = TRUE), ]
  benchmarks <- benchmarks[!duplicated(benchmarks$name), ]
  benchmarks <- subset(benchmarks, select = -c(start_date))
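The same keep-the-latest-run idea can be sketched in Python: sort the rows chronologically and let newer rows overwrite older ones under the same benchmark name. The rows below are invented.

```python
from datetime import datetime

# Invented rows standing in for the `test` table selection.
rows = [
    {"id": 1, "name": "spido-100000", "start_date": "2022-01-10 09:00:00"},
    {"id": 7, "name": "spido-100000", "start_date": "2022-03-02 14:30:00"},
    {"id": 3, "name": "hmat-200000",  "start_date": "2022-02-01 08:15:00"},
]

latest = {}
for row in sorted(rows, key=lambda r: datetime.fromisoformat(r["start_date"])):
    latest[row["name"]] = row  # newer rows overwrite older ones

print(sorted(r["id"] for r in latest.values()))  # → [3, 7]
```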
Benchmark results are stored in the separate valid
table. Each row references
a benchmark in the test
table and represents one metric and its value (see an
example in Table 1). This representation is referred to as
long format.
id | metric | value | test_id | task_step |
---|---|---|---|---|
39870 | nbrhs | 50 | 1260 | NULL |
39871 | step_mesh | 0.02241996 | 1260 | NULL |
… | … | … | … | … |
In addition to the valid
table, the files
table holds files potentially
associated with a given benchmark, e.g. resource consumption logs (see an
example in Table 2). Note that the files themselves are stored
in a gzip-compressed binary form.
id | filename | file | test_id |
---|---|---|---|
9 | rss-0.log | BLOB | 3666 |
10 | hdd-0.log | BLOB | 3666 |
11 | …/energy_scope_eprofile_1473279.txt | BLOB | 3666 |
… | … | … | … |
So, for each of the selected benchmarks, we extract the associated rows from the
valid
and the files
tables based on the test_id
key. Note that the value
column is of mixed type, i.e. it contains both strings and numerical values. To
prevent the R dbGetQuery
function from doing inappropriate type conversions,
we cast the value
column to text in the corresponding SQL request. We do the
same for the file
column in the files
table. Note also that for the union
operation below to work, we have to rename the selected columns of the files
table to match those in the valid
table.
  for(row in 1:nrow(benchmarks)) {
    test_id <- benchmarks[row, "id"]
    metrics <- dbGetQuery(
      conn,
      paste(
        "SELECT metric, CAST(value AS TEXT) AS 'value'",
        "FROM valid WHERE test_id = :id",
        "UNION",
        "SELECT filename AS 'metric', QUOTE(file) AS 'value'",
        "FROM files WHERE test_id = :id"
      ),
      params = list(id = test_id)
    )
At the same time, we convert the original long format representation (see Table 1) into wide format for easier data manipulation. In the latter, to each metric corresponds a separate column (see an example in Table 3). This way, each row contains all the results of a given benchmark.
id | name | nbrhs | step_mesh | … |
---|---|---|---|---|
1260 | spido-100000 | 50 | 0.02241996 | … |
    if(nrow(metrics) > 0) {
      for(metric in 1:nrow(metrics)) {
        metric_name <- metrics[metric, "metric"]
        if(!(metric_name %in% colnames(benchmarks))) {
          benchmarks[[metric_name]] <- NA
        }
        benchmarks[which(benchmarks$id == test_id), metric_name] <-
          metrics[metric, "value"]
      }
    }
  }
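The long-to-wide conversion itself reduces to the following Python sketch, using the values from the example tables above: each (test_id, metric, value) row becomes one cell of a per-benchmark record.

```python
# Long-format rows, as in Table 1 (values taken from the example).
long_rows = [
    {"test_id": 1260, "metric": "nbrhs",     "value": "50"},
    {"test_id": 1260, "metric": "step_mesh", "value": "0.02241996"},
]

# Wide format: one record per benchmark, one key per metric.
wide = {}
for row in long_rows:
    wide.setdefault(row["test_id"], {})[row["metric"]] = row["value"]

print(wide[1260]["nbrhs"])  # → 50
```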
Finally, we convert columns containing numerical values to the appropriate data
type, close the database connection and return the resulting data frame. Note
that the names of some columns, e.g. rss-0.log
, hdd-1.log
or
tmp_energy_scope_1473279/energy_scope_eprofile_1473279.txt
may vary from one
benchmark to another so we have to determine their names dynamically using a
regular expression.
  cols_to_numeric <- colnames(benchmarks)
  dynamic_columns <- cols_to_numeric[
    grepl(
      paste(
        "^rss.*\\.log$",
        "^hdd.*\\.log$",
        "^likwid.*\\.csv$",
        ".*energy_scope_eprofile.*\\.txt$",
        sep = "|"
      ),
      cols_to_numeric
    )
  ]
  cols_to_numeric <- cols_to_numeric[! cols_to_numeric %in% c(
    c("name", "mapping", "ranking", "binding", "coupled_method", "solver",
      "variation", "platform", "node", "node_family", "symmetry",
      "singularity_container", "system_kind", "energy_scope", "config_kind",
      "solver_config"),
    dynamic_columns
  )]
  benchmarks <- benchmarks %>% mutate_at(cols_to_numeric, as.numeric)
  dbDisconnect(conn)
  return(benchmarks)
}
4.1.2.2. Trace files
We use multiple benchmarking tools to monitor the usage of storage resources (memory and hard drive), the power and energy consumption as well as the floating-point operation rate and memory bandwidth of our solver suite.
To monitor the memory and hard drive usage, we use a custom Python script named
rss.py
(see Section 3.6). A measure is taken every second during the
execution. There is one consumption log file per MPI process. The names of the
log files match rss-N.log
for memory usage and hdd-N.log
for hard drive
usage where N
stands for the rank of the corresponding MPI process. In a log
file, each line contains the current timestamp and usage in mebibytes (MiB). The
very last line corresponds to the peak value.
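Under these assumptions about the log format, reading such a file could look as follows in Python; the sample content is invented and the split between timeline and peak mirrors the description above.

```python
import io

# A stand-in rss-N.log: one "<timestamp> <MiB>" pair per line, the last
# line being the peak value (format assumed from the description above).
sample = io.StringIO("0 120.5\n1 300.2\n2 512.0\n")

samples = [tuple(map(float, line.split())) for line in sample]
timeline, peak = samples[:-1], samples[-1][1]

print(peak)  # → 512.0
```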
To measure the power and the energy consumption of processing units and RAM, we
use the energy_scope
tool (see Section 3.7). In this case, the
acquisition interval can be defined by the user. As a result, energy_scope
produces an archive containing all the measures in a raw form. It can then
produce a JSON trace file based on this archive. The output file is stored under
./tmp_energy_scope_X/energy_scope_eprofile_X.txt
where X
represents the
identifier of the corresponding Slurm job (see Section 3.2).
Finally, to measure the floating-point operation rate and the memory bandwidth,
we rely on the likwid
tool. likwid
benchmarks
each MPI process of the application separately and stores all the measures into
a comma-separated value file the name of which matches likwid-N.csv where N
where N
stands for the rank of the corresponding MPI process.
At the end of a benchmark execution, the above log and trace files are compressed and stored into the SQLite database holding all the benchmark results (see Section 4.1.2.1). Therefore, in the first place, we need to define a function allowing us to extract the files from the database and decompress them.
4.1.2.3. Retrieving files
The function gcvb_retrieve_files
takes a data frame containing the results of
one single benchmark, through the benchmark
argument, retrieved by the
gcvb_import
function (see Section 4.1.2.1) and
formatted using the gcvb_format
function (see Section
4.1.3.1), extracts the corresponding trace files and
decompresses them into the current working directory. Note that the
energy_scope
trace file is renamed to eprofile.txt
to ensure a static file
name and simplify its later processing.
In the first place, we need to dynamically determine the names of the columns containing the log and trace files based on the patterns discussed above.
gcvb_retrieve_files <- function(benchmark) {
  trace_file_columns <- colnames(benchmark)
  trace_file_columns <- trace_file_columns[
    grepl(
      paste(
        "^rss.*\\.log$",
        "^hdd.*\\.log$",
        "^likwid.*\\.csv$",
        ".*energy_scope_eprofile.*\\.txt$",
        sep = "|"
      ),
      trace_file_columns
    )
  ]
Then, we iterate over the corresponding columns in benchmark
and:
for(trace_file_column in trace_file_columns) {
retrieve the compressed version of the file as string,
trace_file_name <- benchmark[[trace_file_column]]
if the file is not empty, strip the X' prefix and the ' suffix from this
string,

  if(!is.na(trace_file_name)) {
    trace_file_name <- substring(
      trace_file_name,
      3,
      nchar(trace_file_name) - 1
    )
split the string into a vector of two-character strings representing the hexadecimal binary content of the file,
    trace_file_name <- strsplit(trace_file_name, "")[[1]]
    trace_file_name <- paste0(
      trace_file_name[c(TRUE, FALSE)],
      trace_file_name[c(FALSE, TRUE)]
    )
change the output file name to eprofile.txt, if the file corresponds to an
energy_scope trace file,

  if(grepl("energy_scope", trace_file_column, fixed = TRUE)) {
    filename <- "eprofile.txt.gz"
  } else {
    filename <- paste0(basename(trace_file_column), ".gz")
  }
convert the vector of two-character strings into a raw binary form and restore the corresponding gzip-compressed file,
writeBin(as.raw(as.hexmode(trace_file_name)), filename)
decompress the file into the current working directory and remove its compressed counterpart.
gunzip(filename, overwrite = TRUE) } } }
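The hexadecimal decoding at the heart of this function can be illustrated with a short sketch using base R only (the input string below is hypothetical, not taken from an actual database):

```r
# Sketch of the hex-to-binary decoding used above: the database stores each
# file as a hexadecimal string where every pair of characters encodes one byte.
hex <- "48656c6c6f"  # hypothetical content; decodes to "Hello"
chars <- strsplit(hex, "")[[1]]
# Pair up characters: odd positions with the following even positions.
pairs <- paste0(chars[c(TRUE, FALSE)], chars[c(FALSE, TRUE)])
bytes <- as.raw(as.hexmode(pairs))
rawToChar(bytes)  # "Hello"
```

Writing bytes with writeBin() then produces the original gzip-compressed file, as done in the function body above.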
- Reading rss.py trace files

The function read_rss constructs an R data frame based on memory usage and, optionally, hard drive usage trace files (if the include_hdd switch is set to TRUE) located in logs_path. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns to the data frame, named after the names of the vector and initialized with its values. Adding such columns may be useful later for the aesthetics functions (see Section 4.1.4) when drawing plots. If the output of the function is intended to be combined with an energy_scope trace (see Section 4.1.2.3.3), the optional override_beginning argument can be used to synchronize the output data frame with the energy_scope trace by considering the datetime given by this argument as the alternative beginning of the acquisition. Note that the energy_scope trace always starts first, as energy_scope is launched before the other benchmarking tools.

The output data frame shall contain four columns: the timestamp column giving the time of the measure in milliseconds since the beginning of the acquisition, the corresponding consumption value, the kind of the measure (sram for memory usage or shdd for hard drive usage) and the process rank column.

read_rss <- function(logs_path, include_hdd = FALSE, aesthetics = NA,
                     override_beginning = NA) {
  columns <- c("timestamp", "consumption", "kind", "process")
  output <- data.frame(matrix(nrow = 0, ncol = length(columns)))
  colnames(output) <- columns
We begin by listing all the memory and, optionally, hard drive usage trace files in logs_path.

pattern <- ifelse(include_hdd, "rss|hdd.*\\.log", "rss.*\\.log")
logs <- list.files(
  logs_path,
  pattern = pattern,
  full.names = TRUE,
  include.dirs = FALSE
)
Then, we read each log file, strip its last line containing the peak value, which we do not use here, and append the contents to the output data frame through the intermediate data frame data.

Note that in a previous version of rss.py, the trace file format was different: there was only one column, giving the amount of memory or hard drive space used. For backward compatibility with old trace files, we continue to support the former format too.

for(log in logs) {
  count <- length(readLines(log, warn = FALSE)) - 1
  data <- read.table(log, nrow = count)
  if(ncol(data) < 2) { # Legacy format
    colnames(data) <- c("consumption")
    data$timestamp <- 0:(count - 1)
  } else {
    colnames(data) <- c("timestamp", "consumption")
    data$timestamp <- as.POSIXct(data$timestamp, format = "%Y/%m/%dT%H:%M:%S")
    start <- min(data$timestamp)
If the override_beginning argument was provided, we need to synchronize the timestamp column in such a way as to consider override_beginning to be the beginning of the trace.

if(!is.na(override_beginning)) {
  beginning <- as.integer(min(data$timestamp) - override_beginning)
  data$timestamp <- data$timestamp + beginning
}
Then, we convert the timestamp to milliseconds and the measured value to gibibytes (GiB).
    data$timestamp <- (as.integer(data$timestamp) - as.integer(start)) * 1000
  }
  data$consumption <- as.numeric(data$consumption / 1024.)
Based on the name of the currently processed trace file, we determine the kind of the measure as well as the MPI process rank. We merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
  base <- basename(log)
  kind <- ifelse(!include_hdd || startsWith(base, "rss"), "sram", "shdd")
  data$kind <- rep(kind, nrow(data))
  process <- gsub("([a-z]+)-(\\d+)\\.log", "\\2", base)
  data$process <- rep(as.integer(process), nrow(data))
  output <- rbind(output, data)
}
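The POSIXct-to-milliseconds conversion used in this loop can be sketched on two hypothetical timestamps written in the trace file's datetime format:

```r
# Two hypothetical rss.py timestamps, two seconds apart.
t0 <- as.POSIXct("2022/01/01T10:00:00", format = "%Y/%m/%dT%H:%M:%S")
t1 <- as.POSIXct("2022/01/01T10:00:02", format = "%Y/%m/%dT%H:%M:%S")
# Difference in integer seconds, scaled to milliseconds.
(as.integer(t1) - as.integer(t0)) * 1000  # 2000
```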
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.
processes <- max(output$process)
if(processes > 0) {
  output <- output %>%
    group_by(timestamp, kind) %>%
    summarise(
      consumption = sum(consumption)
    )
}
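A base-R equivalent of this dplyr summation (a sketch, not the thesis code) illustrates how per-process measures sharing the same timestamp and kind are summed:

```r
# Hypothetical measures from two MPI processes at two timestamps.
df <- data.frame(
  timestamp = c(0, 0, 1000, 1000),
  kind = "sram",
  process = c(0, 1, 0, 1),
  consumption = c(1.0, 2.0, 3.0, 4.0)
)
# Sum consumption over processes, grouping by timestamp and kind.
aggregate(consumption ~ timestamp + kind, data = df, FUN = sum)
```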
Finally, we process the aesthetics parameter, if provided, and return the output data frame.

if(length(aesthetics) > 0) {
  for(aesthetic in aesthetics) {
    name <- names(aesthetics)[aesthetics == aesthetic]
    output[[name]] <- rep(as.character(aesthetic), nrow(output))
  }
}
return(output)
}
- Reading likwid trace files

The function read_likwid constructs a data frame based on likwid trace files located in logs_path. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns to the data frame, named after the names of the vector and initialized with its values (see Section 4.1.2.3.1). If the output of the function is intended to be combined with an energy_scope trace (see Section 4.1.2.3.3), the optional shift argument can be used to synchronize the output data frame with the energy_scope trace by shifting the output timestamps by the value (in milliseconds) of this argument. As the measuring tool provides only timestamps relative to the beginning of the execution, we are unable to provide a more precise synchronization procedure. Therefore, we fall back to considering the beginning of the rss.py trace (see Section 4.1.2.3.1) corresponding to the same test case as t0 for the likwid trace file.

The output data frame shall contain four columns: the timestamp column giving the time of the measure in milliseconds since the beginning of the acquisition, the corresponding consumption value, the kind of the measure (flops for floating-point operation rate or bw for memory bandwidth) and the process rank column.

read_likwid <- function(logs_path, aesthetics = NA, shift = 0) {
  columns <- c("timestamp", "consumption", "kind", "process")
  output <- data.frame(matrix(nrow = 0, ncol = length(columns)))
  colnames(output) <- columns
We begin by listing all the trace files in logs_path.

pattern <- "likwid.*\\.csv"
logs <- list.files(
  logs_path,
  pattern = pattern,
  full.names = TRUE,
  include.dirs = FALSE
)
A likwid trace file has two or more heading lines at its beginning. The count of heading lines depends on the count of monitored hardware performance counters or groups of the latter. In our case, we use two groups of performance counters, one for the floating-point operation rate and one for the memory bandwidth. We thus have three heading lines in our likwid-N.csv trace files. Then, the first four columns (# GID, MetricsCount, CpuCount and Total runtime [s]) of the trace file are common to all the performance counters. Each group of performance counters may have a different number of associated metrics and thus a different number of associated columns in the trace file. The MetricsCount column gives the number of metrics and allows us to compute the number of columns in the trace file associated with a given performance group. In our case, there are 6 metrics for the floating-point operation rate group and 10 for the memory bandwidth group. Moreover, there is a separate column for each processing unit and for each metric. Note that the number of processing units used during the execution is given by the CpuCount column. In summary, the trace file contains 4 common columns, followed by 6 × CpuCount columns containing the values for the metrics of the floating-point operation rate group, followed by 10 × CpuCount columns containing the values for the metrics of the memory bandwidth group.

We read each log file and:
for(log in logs) { input <- read.csv(log, header = FALSE, skip = 3)
determine the count of processing units used for the corresponding test case,
cpus <- as.integer(input[1, 3])
create two intermediate data frames holding the timestamps together with the floating-point operation rate, corresponding to the metric Packed DP [MFLOP/s], and the memory bandwidth, corresponding to the metric Memory bandwidth [MBytes/s],

data_flops <- data.frame(
  timestamp = input[which(input[, 2] == 6), c(4)],
  consumption = rowSums(
    input[which(input[, 2] == 6), c((4 + 5 * cpus + 1):(4 + 6 * cpus - 1))]
  )
)
data_bw <- data.frame(
  timestamp = input[which(input[, 2] == 10), c(4)],
  consumption = rowSums(
    input[which(input[, 2] == 10), c((4 + 8 * cpus + 1):(4 + 9 * cpus - 1))]
  )
)
initialize the kind columns in these separate data frames,

data_flops$kind <- rep("flops", nrow(data_flops))
data_bw$kind <- rep("bw", nrow(data_bw))
merge the separate data frames again into a unique data frame,
data <- rbind(data_flops, data_bw)
convert the timestamp to milliseconds and synchronize it if the shift argument was provided,

data$timestamp <- as.integer(round(data$timestamp, 0) * 1000)
if(shift > 0) {
  data$timestamp <- data$timestamp + shift
}
determine the MPI process rank corresponding to the current trace file based on the name of the latter and initialize the process column,

process <- gsub("([a-z]+)-(\\d+)\\.csv", "\\2", basename(log))
data$process <- rep(as.integer(process), nrow(data))
merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
output <- rbind(output, data) }
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.

processes <- max(output$process)
if(processes > 0) {
  output <- output %>%
    group_by(timestamp, kind) %>%
    summarise(
      consumption = sum(consumption)
    )
}
Finally, we process the aesthetics parameter, if provided, and return the output data frame.

if(length(aesthetics) > 0) {
  for(aesthetic in aesthetics) {
    name <- names(aesthetics)[aesthetics == aesthetic]
    output[[name]] <- rep(as.character(aesthetic), nrow(output))
  }
}
return(output)
}
- Reading energy_scope trace files

The function read_es constructs an R data frame based on a given energy_scope JSON data file es_data. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns to the data frame, named after the names of the vector and initialized with its values. Adding such columns may be useful later for the aesthetics functions (see Section 4.1.4) when drawing plots. The data frame also contains tags. A benchmark executable may print special energy_scope tags to delimit the beginning and the end of a particular portion of computation in the energy_scope trace. If the only optional argument is provided, only the tags specified by the latter are treated. The interval argument allows one to set the interval (in milliseconds) the measures were taken in. By default, the function considers an interval of 1000 milliseconds.

The resulting data frame is composed of the following columns:
- tag representing the tag label,
- type indicating whether a line corresponds to a start or a stop tag,
- type_label holding a prettified version of type for plot drawing,
- timestamp representing the time of the measure in milliseconds since the beginning of the acquisition,
- consumption representing the current power consumption of CPU or RAM (in Watts [W]),
- kind indicating whether consumption represents the power consumption of CPU or RAM,
- aesthetics representing an optional named vector describing constant columns to be added to the output data frame in order to simplify plot drawing based on the latter.
Note that the format of the output data frame is compatible, in terms of the columns it contains, with the format of the data frames produced by the functions read_rss (see Section 4.1.2.3.1) and read_likwid (see Section 4.1.2.3.2) for reading trace files produced by the other benchmarking tools we use. This allows for combining different kinds of benchmark data. Indeed, one can provide this function with a data frame obtained with read_rss through the rss_data argument and extend the output data frame with the measures done by the rss.py monitoring script (see Section 3.6). Similarly, we can extend the output data frame with the measures done by likwid and extracted using the read_likwid function through the likwid_data argument. The timestamp column, present in rss_data as well as in likwid_data, shall allow us to synchronize the measures.

The first step is to read the data from the JSON data file at es_data

read_es <- function(es_data, aesthetics = c(), only = NA, interval = 1000,
                    rss_data = NA, include_hdd = FALSE, likwid_data = NA) {
  es_raw <- fromJSON(file = es_data)
and extract the timestamp corresponding to the beginning of the acquisition as es_start.

es_start <- as.POSIXct(
  es_raw$data$data$tags$es_total$start,
  format = "%Y/%m/%d %H:%M:%S"
)
Then, we collect CPU and RAM power consumption data into the respective data frames es_cpu and es_ram. The timestamp column is generated based on es_start. Note that the acquisition interval is one second by default.

es_cpu <- data.frame(
  consumption = as.numeric(es_raw$data$data$`ecpu(W)`),
  timestamp = as.integer(0)
)
for(i in 2:nrow(es_cpu)) {
  es_cpu$timestamp[i] <- es_cpu$timestamp[i - 1] + interval
}
es_cpu$kind <- rep("ecpu", nrow(es_cpu))
es_ram <- data.frame(
  power_total = as.numeric(es_raw$data$data$`etotal(W)`),
  power_cpu = as.numeric(es_raw$data$data$`ecpu(W)`),
  timestamp = as.integer(0)
)
es_ram$consumption <- es_ram$power_total - es_ram$power_cpu
es_ram <- subset(es_ram, select = c("consumption", "timestamp"))
for(i in 2:nrow(es_ram)) {
  es_ram$timestamp[i] <- es_ram$timestamp[i - 1] + interval
}
es_ram$kind <- rep("eram", nrow(es_ram))
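The loops above fill the timestamp columns incrementally; the same sequence can be produced in vectorised form, sketched here under the default 1000 ms interval and a hypothetical number of measures:

```r
# Vectorised equivalent of the incremental timestamp generation above.
interval <- 1000
n <- 5  # hypothetical number of measures
seq(from = 0, by = interval, length.out = n)  # 0 1000 2000 3000 4000
```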
We also load the rss.py and likwid traces, if provided, and restrict the energy_scope data frames to the actual acquisition window delimited by the rss.py trace.

rss_ram <- NA
rss_hdd <- NA
likwid <- NA
if(!is.na(rss_data)) {
  rss <- read_rss(rss_data, include_hdd, aesthetics, es_start)
  rss_ram <- subset(rss, kind == "sram")
  if(include_hdd) {
    rss_hdd <- subset(rss, kind == "shdd")
  }
  actual_start <- min(rss_ram$timestamp)
  actual_end <- max(rss_ram$timestamp)
  es_cpu <- subset(
    es_cpu, timestamp >= actual_start & timestamp <= actual_end
  )
  es_ram <- subset(
    es_ram, timestamp >= actual_start & timestamp <= actual_end
  )
}
if(!is.na(likwid_data)) {
  shift <- 0
  if(!is.na(rss_data)) {
    shift <- min(rss_ram$timestamp)
  }
  likwid <- read_likwid(likwid_data, aesthetics, shift)
}
Now, we initialize additional columns to store tag information.
es_cpu$tag <- NA
es_cpu$type <- NA
es_cpu$type_label <- NA
es_ram$tag <- NA
es_ram$type <- NA
es_ram$type_label <- NA
if(!is.na(rss_data)) {
  rss_ram$tag <- NA
  rss_ram$type <- NA
  rss_ram$type_label <- NA
  if(include_hdd) {
    rss_hdd$tag <- NA
    rss_hdd$type <- NA
    rss_hdd$type_label <- NA
  }
}
if(!is.na(likwid_data)) {
  likwid$tag <- NA
  likwid$type <- NA
  likwid$type_label <- NA
}
For each tag considered, we add four new lines to the es data frame: one with the start time and CPU power consumption, one with the stop time and CPU power consumption, one with the start time and RAM power consumption and one with the stop time and RAM power consumption.

If, during benchmark program execution, a loop generates more than one occurrence of a given energy_scope tag, a unique identifier is appended to the latter, e.g. tag_0, tag_1, etc. Otherwise, energy_scope would discard all the data related to the tag. However, the presence of such identifiers complicates the post-treatment. Therefore, we need to strip them off the tag name tag but keep them for later usage in instance.

for(entire_tag in names(es_raw$data$data$tags)) {
  tag <- sub("_[^_]+$", "", entire_tag)
  instance <- tail(strsplit(entire_tag, "_")[[1]], n = 1)
  instance <- suppressWarnings(as.integer(instance))
  instance <- ifelse(is.na(instance), "i", instance)
Here we use the imported JSON data file es_raw to get the power consumption values as well as the start and stop timestamps of tags.

if(is.na(only) || tag %in% only) {
  es_start <- as.integer(es_start)
  start_time <- as.integer(
    as.POSIXct(
      es_raw$data$data$tags[[entire_tag]]$start,
      format = "%Y/%m/%d %H:%M:%S"
    )
  )
  stop_time <- as.integer(
    as.POSIXct(
      es_raw$data$data$tags[[entire_tag]]$stop,
      format = "%Y/%m/%d %H:%M:%S"
    )
  )
  begin <- (start_time - es_start) * 1000
  end <- (stop_time - es_start) * 1000
  row <- data.frame(
    tag = as.character(tag),
    type = "start",
    type_label = paste0(
      "bold(alpha)~", label_tag_short(c(tag), instance)
    ),
    timestamp = begin,
    consumption = es_cpu[
      es_cpu$timestamp == begin & is.na(as.character(es_cpu$tag)),
      "consumption"
    ],
    kind = "ecpu"
  )
  es_cpu <- rbind(es_cpu, row)
  row <- data.frame(
    tag = as.character(tag),
    type = "stop",
    type_label = paste0(
      "bold(omega)~", label_tag_short(c(tag), instance)
    ),
    timestamp = end,
    consumption = es_cpu[
      es_cpu$timestamp == end & is.na(as.character(es_cpu$tag)),
      "consumption"
    ],
    kind = "ecpu"
  )
  es_cpu <- rbind(es_cpu, row)
  row <- data.frame(
    tag = as.character(tag),
    type = "start",
    type_label = paste0(
      "bold(alpha)~", label_tag_short(c(tag), instance)
    ),
    timestamp = begin,
    consumption = es_ram[
      es_ram$timestamp == begin & is.na(as.character(es_ram$tag)),
      "consumption"
    ],
    kind = "eram"
  )
  es_ram <- rbind(es_ram, row)
  row <- data.frame(
    tag = as.character(tag),
    type = "stop",
    type_label = paste0(
      "bold(omega)~", label_tag_short(c(tag), instance)
    ),
    timestamp = end,
    consumption = es_ram[
      es_ram$timestamp == end & is.na(as.character(es_ram$tag)),
      "consumption"
    ],
    kind = "eram"
  )
  es_ram <- rbind(es_ram, row)
  if(!is.na(rss_data)) {
    consumption <- rss_ram[
      rss_ram$timestamp == begin & is.na(as.character(rss_ram$tag)),
      "consumption"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "start",
        type_label = paste0(
          "bold(alpha)~", label_tag_short(c(tag), instance)
        ),
        timestamp = begin,
        consumption = as.numeric(consumption),
        kind = "sram"
      )
      rss_ram <- rbind.fill(rss_ram, row)
    }
    consumption <- rss_ram[
      rss_ram$timestamp == end & is.na(as.character(rss_ram$tag)),
      "consumption"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "stop",
        type_label = paste0(
          "bold(omega)~", label_tag_short(c(tag), instance)
        ),
        timestamp = end,
        consumption = as.numeric(consumption),
        kind = "sram"
      )
      rss_ram <- rbind.fill(rss_ram, row)
    }
    if(include_hdd) {
      consumption <- rss_hdd[
        rss_hdd$timestamp == begin & is.na(as.character(rss_hdd$tag)),
        "consumption"
      ]
      consumption <- ifelse(length(consumption) > 0, consumption, NA)
      if(!is.na(consumption)) {
        row <- data.frame(
          tag = as.character(tag),
          type = "start",
          type_label = paste0(
            "bold(alpha)~", label_tag_short(c(tag), instance)
          ),
          timestamp = begin,
          consumption = as.numeric(consumption),
          kind = "shdd"
        )
        rss_hdd <- rbind.fill(rss_hdd, row)
      }
      consumption <- rss_hdd[
        rss_hdd$timestamp == end & is.na(as.character(rss_hdd$tag)),
        "consumption"
      ]
      consumption <- ifelse(length(consumption) > 0, consumption, NA)
      if(!is.na(consumption)) {
        row <- data.frame(
          tag = as.character(tag),
          type = "stop",
          type_label = paste0(
            "bold(omega)~", label_tag_short(c(tag), instance)
          ),
          timestamp = end,
          consumption = as.numeric(consumption),
          kind = "shdd"
        )
        rss_hdd <- rbind.fill(rss_hdd, row)
      }
    }
  }
  if(!is.na(likwid_data)) {
    consumption <- likwid[
      likwid$timestamp == begin & is.na(as.character(likwid$tag)),
      "consumption"
    ]
    kind <- likwid[
      likwid$timestamp == begin & is.na(as.character(likwid$tag)),
      "kind"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    kind <- ifelse(length(kind) > 0, kind, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "start",
        type_label = paste0(
          "bold(alpha)~", label_tag_short(c(tag), instance)
        ),
        timestamp = begin,
        consumption = as.numeric(consumption),
        kind = kind
      )
      likwid <- rbind.fill(likwid, row)
    }
    consumption <- likwid[
      likwid$timestamp == end & is.na(as.character(likwid$tag)),
      "consumption"
    ]
    kind <- likwid[
      likwid$timestamp == end & is.na(as.character(likwid$tag)),
      "kind"
    ]
    consumption <- ifelse(length(consumption) > 0, consumption, NA)
    kind <- ifelse(length(kind) > 0, kind, NA)
    if(!is.na(consumption)) {
      row <- data.frame(
        tag = as.character(tag),
        type = "stop",
        type_label = paste0(
          "bold(omega)~", label_tag_short(c(tag), instance)
        ),
        timestamp = end,
        consumption = as.numeric(consumption),
        kind = kind
      )
      likwid <- rbind.fill(likwid, row)
    }
  }
}
}
If the aesthetics argument is provided, we include the constant columns described by the latter in the resulting data frame. Finally, we return a single data frame combining es_cpu, es_ram and the rss.py and likwid data frames if the rss_data and likwid_data arguments were provided. In this case, we also include an additional column in the data frames to distinguish between power and storage resource consumption. This is useful for plotting figures (see Section 4.1.5.13).

if(length(aesthetics) > 0) {
  for(aesthetic in aesthetics) {
    name <- names(aesthetics)[aesthetics == aesthetic]
    es_cpu[[name]] <- rep(as.character(aesthetic), nrow(es_cpu))
    es_ram[[name]] <- rep(as.character(aesthetic), nrow(es_ram))
    if(!is.na(rss_data)) {
      rss_ram[[name]] <- rep(as.character(aesthetic), nrow(rss_ram))
      if(include_hdd) {
        rss_hdd[[name]] <- rep(as.character(aesthetic), nrow(rss_hdd))
      }
    }
    if(!is.na(likwid_data)) {
      likwid[[name]] <- rep(as.character(aesthetic), nrow(likwid))
    }
  }
}
es <- rbind(es_cpu, es_ram)
es$kind_general <- "e"
if(!is.na(likwid_data)) {
  likwid$kind_general <- likwid$kind
  es <- rbind.fill(es, likwid)
}
if(!is.na(rss_data)) {
  rss_ram$kind_general <- "s"
  if(include_hdd) {
    rss_hdd$kind_general <- "s"
    es <- rbind.fill(es, rbind(rss_ram, rss_hdd))
  } else {
    es <- rbind.fill(es, rss_ram)
  }
  es$timestamp <- es$timestamp - min(rss_ram$timestamp)
}
return(es)
}
- Reading timeline trace files

The function read_timeline allows us to read a test_FEMBEM timeline trace trace, extract data about selected computation phases from the latter and return it in the form of a data frame suitable for drawing axis-intercepting lines with the associated labels using the rss_by_time plot function (see Section 4.1.5.12). phases represents a data frame of selected computation phases to extract information about. Note that one computation phase label can appear more than once if the corresponding function was called multiple times. This is why the data frame contains the occurrence column (see Table 4) indicating which occurrence of a given phase should be considered. The type column indicates whether a given row in the data frame designates the beginning (ENTER) or the end (EXIT) of the corresponding computation phase. In this data frame, one can provide additional information, i.e. prettified computation phase labels for better data readability (see Table 4). The phases data frame is used to construct the output data frame of this function.

Table 4: Example of a data frame that can be passed to read_timeline through the phases argument.

type | name | label | occurrence
EXIT | mpf_solvegemm() | Block-wise Schur complement computation | 1
… | … | … | …

We begin by reading the timeline trace trace.

read_timeline <- function(trace, phases) {
  complete_timeline <- read.delim(trace, header = FALSE)
For an easier column access, we add column names to the newly created data frame.
colnames(complete_timeline) <- c("rank", "time", "type", "duration", "path")
Then, we construct the resulting timeline data frame containing only the data we are interested in based on the input phases data frame. The new data frame contains three additional columns:

- time providing the corresponding computation phase delimiter timestamp,
- zero, a constant zero column useful for plotting vertical or horizontal axis intercepts,
- middle providing the values representing the middles of the intervals defined by the values of the time column, useful to center phase labels in the rss_by_time plot function (see Section 4.1.5.12).

timeline <- phases
timeline$time <- NA
timeline$middle <- NA
timeline$zero <- rep(0, nrow(timeline))
for(i in 1:length(phases)) {
  time <- complete_timeline[
    complete_timeline$type == phases[[i]]["type"] &
      str_detect(complete_timeline$path, paste0(phases[[i]]["label"], "$")),
    "time"
  ]
  time <- time[[as.integer(phases[[i]]["occurrence"])]]
  middle <- ifelse(
    i > 1,
    time - ((time - timeline[i - 1, 1]) / 2),
    time / 2
  )
  timeline[i, "time"] <- as.integer(time)
  timeline[i, "middle"] <- as.double(middle)
}
At the end, we return the timeline data frame.

return(timeline)
}
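The interval-middle computation inside read_timeline can be illustrated on two hypothetical phase delimiter timestamps:

```r
# Hypothetical delimiter timestamps of two consecutive phases.
times <- c(4, 10)
# The first middle is halfway from zero; subsequent middles are halfway
# between the previous and the current delimiter.
middle_first <- times[1] / 2                           # 2
middle_next <- times[2] - ((times[2] - times[1]) / 2)  # 7
c(middle_first, middle_next)
```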
4.1.3. Formatting data
4.1.3.1. Benchmark results
Before actually visualizing data, we have to format it. This is the role of the
function gcvb_format
which takes a single argument - data
, an R data frame
containing benchmark results imported by the gcvb_import
function (see Section
4.1.2).
gcvb_format <- function(data) {
Due to the way certain benchmarks are defined in gcvb
, the solver
column may
contain non-uniform values, i.e., in some cases, coupled solver names use dashes
instead of slashes, which are the expected separators. For example, we need to
replace the dash in mumps-spido
by a slash so it becomes mumps/spido
.
data$solver <- ifelse( data$solver == "mumps-spido", "mumps/spido", ifelse(data$solver == "mumps-hmat", "mumps/hmat", data$solver) )
Names of columns holding the same information, such as factorization and solve times, expected result accuracy and so on, vary depending on the solver used during a given benchmark. To ease further data treatment, we merge these into common columns.

data$tps_facto <- ifelse(
  is.na(data$tps_cpu_facto_mpf),
  data$tps_cpu_facto,
  data$tps_cpu_facto_mpf
)
data$tps_solve <- ifelse(
  is.na(data$tps_cpu_solve_mpf),
  data$tps_cpu_solve,
  data$tps_cpu_solve_mpf
)
data$desired_accuracy <- ifelse(
  !is.na(data$mumps_blr_accuracy),
  data$mumps_blr_accuracy,
  NA
)
data$size_schur <- ifelse(
  is.na(data$size_schur) & data$coupled_method == "multi-facto",
  data$disk_block_size,
  data$size_schur
)
if("cols_schur" %in% colnames(data)) {
  data$cols_schur <- ifelse(
    data$coupled_method == "multi-solve" & data$solver == "mumps/spido",
    data$coupled_nbrhs,
    data$cols_schur
  )
}
data$desired_accuracy <- ifelse(
  (data$solver == "hmat-bem" | data$solver == "hmat-fem" |
     data$solver == "hmat-fem-nd") & !is.na(data$h_assembly_accuracy),
  data$h_assembly_accuracy,
  data$desired_accuracy
)
Then, we must remove from the data frame the lines corresponding to unfinished jobs, i.e. lines without computation time information or where the computation time is zero.

data <- subset(data, !is.na(tps_facto))
data <- subset(data, !is.na(tps_solve))
data <- subset(data, tps_facto > 0.0)
data <- subset(data, tps_solve > 0.0)
If the dense_ooc column is present, we have to take care of NA values so that we can make the column a factor. NA appears in the case of purely in-core benchmarks that do not set this variable at all. Finally, we replace NA by -1.

if("dense_ooc" %in% colnames(data)) {
  data$dense_ooc <- ifelse(
    is.na(data$dense_ooc),
    -1,
    data$dense_ooc
  )
  data$dense_ooc <- as.character(data$dense_ooc)
  data$dense_ooc <- factor(data$dense_ooc, levels = c("-1", "0", "1"))
}
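The NA normalization above can be sketched on a toy vector of hypothetical values:

```r
# NA becomes -1 so that the column can be turned into a factor
# with fixed levels.
x <- c(NA, 0, 1)
x <- ifelse(is.na(x), -1, x)
f <- factor(as.character(x), levels = c("-1", "0", "1"))
f  # levels: "-1" "0" "1"
```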
The same goes for the energy_scope
column. In this case, NA
means that the
tool was not used.
if("energy_scope" %in% colnames(data)) {
  data$energy_scope <- ifelse(
    is.na(data$energy_scope),
    "off",
    data$energy_scope
  )
  data$energy_scope <- as.character(data$energy_scope)
  data$energy_scope <- factor(data$energy_scope, levels = c("off", "on"))
}
Some required columns are not present in the data frame by default, so we have to compute them based on existing data.
The total number of cores used for computation is the product of the MPI process count and the thread count.

data$p_units <- data$omp_thread_num * data$processes
The value of n_blocks represents the number of blocks the dense submatrix \(A_{ss}\) is split into during the Schur complement computation in the two-stage multi-factorization implementation. When this value is not explicitly set by the user, n_blocks equals auto. However, for the plot drawing functions to operate correctly, we need to recompute the corresponding numerical values.

if("n_blocks" %in% colnames(data)) {
  data$n_blocks <- ifelse(
    data$n_blocks == "auto",
    ceiling(data$nbem / data$size_schur),
    as.numeric(data$n_blocks)
  )
}
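As a quick sketch, with a hypothetical BEM submatrix of 40000 rows and a Schur block size of 15000, auto resolves to:

```r
# ceiling() rounds up so that a partial trailing block still counts.
ceiling(40000 / 15000)  # 3
```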
Eventually, factorization and solve efficiency computation is split into several steps:
- gathering best sequential times for each solver,
sequentials <- subset(
  data,
  select = c("solver", "tps_facto", "tps_solve", "nbpts"),
  data$p_units == 1
)
if(nrow(sequentials) > 0) {
  sequentials <- merge(
    aggregate(tps_facto ~ solver + nbpts, min, data = sequentials),
    sequentials
  )
  sequentials <- sequentials %>% dplyr::rename(
    tps_facto_seq = tps_facto,
    tps_solve_seq = tps_solve
  )
- adding a column containing sequential time for every run in the data frame,
data <- dplyr::left_join(data, sequentials, by = c("solver", "nbpts"))
- computing the efficiency itself.
  data$efficiency_facto <- data$tps_facto_seq / (data$p_units * data$tps_facto)
  data$efficiency_solve <- data$tps_solve_seq / (data$p_units * data$tps_solve)
}
return(data)
}
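Applied to hypothetical timings, the efficiency formula reads:

```r
# A factorization taking 120 s sequentially and 40 s on 4 cores
# (hypothetical values) yields a parallel efficiency of 0.75.
tps_facto_seq <- 120
tps_facto <- 40
p_units <- 4
tps_facto_seq / (p_units * tps_facto)  # 0.75
```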
4.1.4. Style presets
We also define a common set of data visualization elements to ensure a unified graphic output when drawing plots.
4.1.4.1. Label functions
Label functions allow us to format and prettify column names or values when used in plot legends.
label_percentage

This function multiplies value by 100 and returns the result as a string with an appended percentage sign.

label_percentage <- function(value) {
  return(paste0(value * 100, "%"))
}
label_time
This function converts time values in seconds into more human-readable equivalents.
label_time <- function(labels) {
  out <- c()
  for (i in 1:length(labels)) {
    if(!is.na(labels[i])) {
      days <- as.integer(labels[i] / 86400)
      hours <- as.integer((labels[i] - (86400 * days)) / 3600)
      minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60)
      seconds <- as.integer(
        labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes)
      )
      out[i] <- ""
      if(days > 0) {
        out[i] <- paste0(out[i], days, "d")
      }
      if(hours > 0) {
        out[i] <- paste0(out[i], hours, "h")
      }
      if(minutes > 0 && days < 1) {
        out[i] <- paste0(out[i], minutes, "m")
      }
      if(seconds > 0 && days < 1 && hours < 1) {
        out[i] <- paste0(out[i], seconds, "s")
      }
      if(seconds < 1 && minutes < 1 && hours < 1 && days < 1) {
        out[i] <- "0s"
      }
    } else {
      out[i] <- "?"
    }
  }
  return(out)
}
label_time_short

This function converts time values in seconds into short, more human-readable equivalents.

label_time_short <- function(labels) {
  out <- c()
  for (i in 1:length(labels)) {
    if(!is.na(labels[i])) {
      days <- as.integer(labels[i] / 86400)
      hours <- as.integer((labels[i] - (86400 * days)) / 3600)
      minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60)
      seconds <- as.integer(
        labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes)
      )
      if(days > 0) {
        out[i] <- paste0(days, "d")
      } else if(hours > 0) {
        out[i] <- paste0(hours, "h")
      } else if(minutes > 0) {
        out[i] <- paste0(minutes, "m")
      } else if(seconds > 0) {
        out[i] <- paste0(seconds, "s")
      } else {
        out[i] <- "0s"
      }
    } else {
      out[i] <- "?"
    }
  }
  return(out)
}
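The underlying day/hour/minute/second decomposition used by these label functions can be sketched with integer division on a hypothetical duration:

```r
# Decompose 90061 seconds (1d 1h 1m 1s) into calendar units.
s <- 90061
days <- s %/% 86400
r <- s %% 86400
hours <- r %/% 3600
r <- r %% 3600
minutes <- r %/% 60
seconds <- r %% 60
c(days, hours, minutes, seconds)  # 1 1 1 1
```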
label_epsilon

This function converts solution accuracy threshold values from numeric values to strings of the form \(1 \times 10^{-n}\), or to ‘unset (no compression)’ if the value is not set, as in the case of solvers not using data compression.

label_epsilon <- function(labels) {
  out <- c()
  for (i in 1:length(labels)) {
    if(!is.na(labels[i])) {
      out[i] <- formatC(
        as.numeric(labels[i]),
        format = "e",
        digits = 1
      )
    } else {
      out[i] <- "unset (no compression)"
    }
  }
  return(out)
}
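The formatC() call above can be illustrated on a hypothetical accuracy threshold:

```r
# Scientific notation with one digit after the decimal point.
formatC(1e-3, format = "e", digits = 1)  # "1.0e-03"
```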
label_storage
This function prettifies storage support names.
label_storage <- function(labels) {
  out <- c()
  for(i in 1:length(labels)) {
    out[i] <- switch(
      as.character(labels[i]),
      "rm_peak" = "Memory (RAM)",
      "swap_peak" = "Swap (pagefile)",
      "hdd_peak" = "Disk drive"
    )
  }
  return(out)
}
label_ooc

This function prettifies the names of facet grids distinguishing between benchmarks with and without out-of-core computation (column dense_ooc in the data frame).

label_ooc <- function(labels) {
  out <- c()
  for(i in 1:length(labels)) {
    out[i] <- switch(
      labels[i],
      "-1" = "All in-core",
      "0" = "Out-of-core except Schur complement",
      "1" = "All out-of-core"
    )
  }
  return(out)
}
label_solver
This function prettifies solver names.
label_solver <- function(labels) {
  out <- c()
  for(i in 1:length(labels)) {
    out[i] <- switch(
      as.character(labels[i]),
      "spido" = "SPIDO",
      "hmat-bem" = "HMAT",
      "hmat-fem" = "HMAT",
      "hmat-fem-nd" = "HMAT (proto-ND)",
      "hmat-oss" = "HMAT-OSS",
      "chameleon" = "Chameleon",
      "mumps" = "MUMPS",
      "mumps-blr" = "MUMPS (compressed)",
      "mumps-no-blr" = "MUMPS (uncompressed)",
      "qr_mumps" = "QRM",
      "mumps/spido" = "MUMPS/SPIDO",
      "qr_mumps/spido" = "QRM/SPIDO",
      "mumps/hmat" = "MUMPS/HMAT",
      "hmat/hmat" = "HMAT"
    )
  }
  return(out)
}
label_solver_config
This function prettifies solver configuration labels.
label_solver_config <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "vanilla" = "without advanced features", "sparse-rhs" = "sparse right-hand side", "blr-dense-rhs" = "compression", "blr-sparse-rhs" = "compression and sparse right-hand side" ) } return(out) }
label_tag
This function prettifies
energy_scope
tag names.label_tag <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = expression( italic(S[i] == A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = expression( italic(A[vv]^{-1}*A[sv[i]]^{T}) ~ "(sparse solve)" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = expression( italic(A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = expression( italic(compress * group("(", A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}, ")")) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(AXPY)" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(compressed AXPY)" ), "MPF_SolveGEMM_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Multi-factorization "CreateSchurComplement" = expression( italic(S[i] == A[ss[ij]] - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_factorization" = expression( italic(X[ij] == - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "MPF_multifact_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), "MPF_multifact_MUMPS_HMAT" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Common "testMPF_factorisation_Schur" = expression( italic(S^{-1}) ~ "(Schur factorization)" ), "testMPF_factorisation" = expression( italic(A^{-1}) ~ "(factorization of" ~ italic(A) * ")" ), "testMPF_solve" = expression( italic(A^{-1}*x) ~ "(system solution)" ) ) } return(out) }
label_tag_short
This function does the same thing as
label_tag
but returns shorter labels than the latter.label_tag_short <- function(labels, instance = "i") { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = paste0( "italic(S[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = paste0( "italic(solve[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = paste0( "italic(SpMM[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = paste0( "italic(compress[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = paste0( "italic(CAXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = paste0( "italic(AXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS" = "italic(S)", "CreateSchurComplement" = paste0( "italic(S[", instance, "])" ), # Multi-factorization "CreateSchurComplement_factorization" = paste0( "italic(X[", instance, "])" ), "MPF_multifact_MUMPS" = "italic(S)", "MPF_multifact_MUMPS_HMAT" = "italic(S)", # Common "testMPF_factorisation_Schur" = "italic(S^{-1})", "testMPF_factorisation" = "italic(A^{-1})", "testMPF_solve" = "italic(A^{-1}*x)" ) } return(out) }
label_kind
This function prettifies power and storage resources consumption kinds (see Section 4.1.2.3.3).
label_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "ecpu" = "CPU", "eram" = "RAM", "sram" = "RAM", "shdd" = "Disk", "flops" = "Flop rate", "bw" = "RAM bandwidth" ) } return(out) }
label_mapping
This function transforms MPI process mapping values into strings giving more detailed information about the underlying parallel configuration.
label_mapping <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "node" = "1 unbound MPI process \U2A09 1 to 24 threads", "socket" = "2 MPI processes bound to sockets \U2A09 1 to 12 threads", "numa" = "4 MPI processes bound to NUMAs \U2A09 1 to 6 threads", "core" = "1 to 24 MPI processes bound to cores \U2A09 1 thread" ) } return(out) }
label_nbpts
This function converts the problem's unknown count into the form
N = <count> M
where <count>
is the count expressed in millions and rounded to two decimals.
label_nbpts <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- paste( "\U1D441 =", as.character( round(as.numeric(labels[i]) / 1e+06, digits = 2) ), "M" ) } return(out) }
label_nbpts_ticks
This function converts the problem's unknown count into the form
<count>M
where <count>
is the count converted to millions and rounded to two decimals.
label_nbpts_ticks <- function(value) { return( paste0( as.character( round(value / 1e+06, digits = 2) ), "M" ) ) }
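A quick sanity check of the millions-rounding pattern shared by `label_nbpts` and `label_nbpts_ticks` (the example counts are made up):

```r
# Unknown counts rendered as millions with at most two decimals
big <- paste0(round(2500000 / 1e+06, digits = 2), "M")
odd <- paste0(round(1234567 / 1e+06, digits = 2), "M")
big  # "2.5M"
odd  # "1.23M"
```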
label_scientific
This function converts numeric label values to scientific notation.
label_scientific <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } return(out) }
label_coupling
and label_coupling_short
These functions prettify names of the implementation schemes for the solution of coupled linear systems.
label_coupling <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve (two-stage)", "multi-facto" = "Multi-factorization (two-stage)", "full-hmat" = "Partially sparse-aware (single-stage)" ) } return(out) } label_coupling_short <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve", "multi-facto" = "Multi-factorization", "full-hmat" = "Partially sparse-aware" ) } return(out) }
label_peak_kind
This function prettifies peak memory usage value kinds.
label_peak_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "assembly_estimation" = expression("assembled system"), "schur_estimation" = expression("Schur complement matrix " * italic(S)), "peak_symmetric_factorization" = expression( "symmetric factorization of " * italic(A[vv])), "peak_non_symmetric_factorization" = expression( "non-symmetric factorization of " * italic(A[vv])) ) } return(out) }
4.1.4.2. Label maps
A label map is a variation of a label function. It provides alternative names for given columns of a data frame.
phase.labs
This label map provides prettified names for the factorization and solve time columns tps_facto and tps_solve, the corresponding efficiency columns efficiency_facto and efficiency_solve, as well as the column holding the execution time of the associated Schur block creation phase, CreateSchurComplement_exec_time.
phase.labs <- c( "Factorization", "Solve", "Factorization", "Solve", "Schur complement computation" ) names(phase.labs) <- c( "tps_facto", "tps_solve", "efficiency_facto", "efficiency_solve", "CreateSchurComplement_exec_time" )
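Since a label map is simply a named character vector, the lookup performed by ggplot2's labeller amounts to indexing by name. A minimal sketch of the mechanism, independent of ggplot2 and using a hypothetical demo.labs vector:

```r
# Named-vector lookup as performed by labeller()
demo.labs <- c("Factorization", "Solve")
names(demo.labs) <- c("tps_facto", "tps_solve")
demo.labs[["tps_solve"]]  # "Solve"
```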
4.1.4.3. Plot theme
To keep a single color palette, as well as consistent sets of point shapes and line types, throughout all the plots, we provide each legend entry with information on its:
- color,
colors <- c( "0.001" = "#F07E26", "1e-06" = "#9B004F", "node" = "#1488CA", "socket" = "#95C11F", "numa" = "#FFCD1C", "core" = "#6561A9", "1 (24 threads)" = "#E63312", "2 (48 threads)" = "#F07E26", "4 (96 threads)" = "#9B004F", "8 (192 threads)" = "#1488CA", "(32, 32)" = "#F07E26", "(48, 48)" = "#9B004F", "(64, 64)" = "#1488CA", "(128, 128)" = "#95C11F", "(256, 256)" = "#E63312", "(256, 512)" = "#FFCD1C", "(512, 512)" = "#800080", "(256, 1024)" = "#6561A9", "(256, 2048)" = "#89CCCA", "(256, 4096)" = "#384257", "1" = "#1488CA", "2" = "#95C11F", "3" = "#FFCD1C", "4" = "#6561A9", "5" = "#384257", "7" = "#89CCCA", "11" = "#9B004F", "250000" = "#384257", "5e+05" = "#1488CA", "1e+06" = "#E63312", "1200000" = "#95C11F", "1400000" = "#FFCD1C", "1500000" = "#6561A9", "2e+06" = "#F07E26", "2500000" = "#9B004F", "multi-facto" = "#6561A9", "multi-solve" = "#89CCCA", "full-hmat" = "#C7D64F", "spido" = "#384257", "hmat" = "#9B004F", "mumps" = "#F07E26", "mumps-blr" = "#1488CA", "qr_mumps" = "#95C11F", "mumps/spido" = "#95C11F", "mumps/hmat" = "#FFCD1C", "ecpu" = "#384257", "eram" = "#E63312", "sram" = "#1488CA", "shdd" = "#95C11F", "flops" = "#9B004F", "bw" = "#1488CA", "vanilla" = "#1488CA", "sparse-rhs" = "#95C11F", "blr-dense-rhs" = "#FFCD1C", "blr-sparse-rhs" = "#6561A9" )
- point shape style,
shapes <- c( "spido" = 16, "hmat-bem" = 15, "hmat-fem" = 15, "hmat-oss" = 15, "hmat-fem-nd" = 23, "chameleon" = 18, "mumps" = 16, "qr_mumps" = 17, "mumps/spido" = 16, "qr_mumps/spido" = 17, "mumps/hmat" = 15, "hmat/hmat" = 18, "rm_peak" = 16, "swap_peak" = 17, "hdd_peak" = 15, "CreateSchurComplement_MultiSolve_MUMPS" = 22, "CreateSchurComplement_MultiSolve_MUMPS_solve" = 22, "CreateSchurComplement_MultiSolve_MUMPS_product" = 21, "MPF_SolveGEMM_MUMPS_HMAT_assembly" = 23, "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = 24, "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = 25, "MPF_SolveGEMM_MUMPS" = 23, "CreateSchurComplement" = 22, "CreateSchurComplement_factorization" = 22, "MPF_multifact_MUMPS" = 23, "MPF_multifact_MUMPS_HMAT" = 23, "testMPF_factorisation_Schur" = 24, "testMPF_factorisation" = 23, "testMPF_solve" = 24 )
- line type.
linetypes <- c( "spido" = "solid", "hmat-bem" = "dotted", "hmat-fem" = "dotted", "hmat-oss" = "dotted", "hmat-fem-nd" = "longdash", "chameleon" = "longdash", "mumps" = "solid", "qr_mumps" = "dashed", "mumps/spido" = "solid", "qr_mumps/spido" = "dashed", "mumps/hmat" = "dotted", "hmat/hmat" = "longdash", "rm_peak" = "solid", "swap_peak" = "dotted", "hdd_peak" = "longdash", "assembly_estimation" = "solid", "peak_symmetric_factorization" = "dotted", "peak_non_symmetric_factorization" = "longdash", "schur_estimation" = "dashed" )
Furthermore, each plot object uses a set of common theme layers provided by the
generate_theme function. If needed, the latter allows us to provide custom
breaks for the color, point shape and line type aesthetics, a custom legend
title, the number of rows allowed in the legend, as well as the legend box
placement.
generate_theme <- function( color_breaks = waiver(), color_labels = waiver(), color_override_aes = list(), shape_breaks = waiver(), shape_labels = waiver(), shape_override_aes = list(), linetype_breaks = waiver(), linetype_labels = waiver(), linetype_override_aes = list(), legend_title = element_text(family = "CMU Serif", size = 14, face = "bold"), legend_text = element_text(family = "CMU Serif", size = 14), theme_text = element_text(family = "CMU Serif", size = 16), legend_rows = 1, legend_rows_color = NA, legend_rows_fill = NA, legend_rows_shape = NA, legend_rows_linetype = NA, legend_box = "vertical", legend_position = "bottom" ) { return(list( scale_color_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_fill_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_shape_manual( values = shapes, labels = shape_labels, breaks = shape_breaks, na.translate = F ), scale_linetype_manual( values = linetypes, labels = linetype_labels, breaks = linetype_breaks ), guides( shape = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_shape), legend_rows, legend_rows_shape ), byrow = TRUE, override.aes = shape_override_aes ), linetype = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_linetype), legend_rows, legend_rows_linetype ), byrow = TRUE, override.aes = linetype_override_aes ), color = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_color), legend_rows, legend_rows_color ), byrow = TRUE, override.aes = color_override_aes ), fill = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_fill), legend_rows, legend_rows_fill ), byrow = TRUE ) ),
We set the font family to Computer Modern Serif in order to match the font family used in LaTeX typeset PDF documents. Then, we set the font size to 16 points and the legend text size to 14 points.
theme_bw(), theme( text = theme_text, legend.text = legend_text, legend.title = legend_title,
We place the legend at the desired position and surround it with a rectangle.
legend.position = legend_position, legend.background = element_rect(color = "gray40", size = 0.5),
We place legend items either side by side or one per line.
legend.box = legend_box,
For better visibility, we make legend symbols longer.
legend.key.width = unit(3, "line"),
Finally, we rotate the X-axis text to prevent long labels from overlapping.
axis.text.x = element_text(angle = 60, hjust = 1) ) )) }
4.1.5. Plot functions
In performance analysis, the metrics we study are generally the same. So, we prepared a series of plot functions, one for each type of data visualization. This way, we can, for example, reuse the same function to plot disk usage peaks by the problem's unknown count for any combination of solvers.
4.1.5.1. times_by_nbpts
This function returns a plot of factorization and solve times relative to the linear system's unknown count for a given set of either dense or sparse solvers.
We combine both metrics in one plot in which they are distinguished thanks to a facet grid. However, to be able to use the latter, we need to convert the data frame from wide format into long format (see Section 4.1.2).
The solver_config
option allows one to distinguish between different solver
configurations on the figure instead of different precision parameter
\(\epsilon\) values.
By default, the trans_x
and trans_y
arguments set the axis scales to
logarithmic using log10
. To use a standard scale, the value identity
should be
used. The time_break_width
allows one to set custom Y-axis break widths (in
seconds).
times_by_nbpts <- function(dataframe, solver_config = FALSE, trans_x = "log10", trans_y = "log10", time_break_width = 3600) { dataframe_long <- gather( dataframe, key = "computation_phase", value = "time", c("tps_facto", "tps_solve") )
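The effect of the gather call above can be sketched in base R on a toy data frame (all values are made up for illustration):

```r
# Wide format: one timing column per computation phase
demo_wide <- data.frame(nbpts = c(1e+06, 2e+06),
                        tps_facto = c(120, 480),
                        tps_solve = c(10, 35))
# Long format, as produced by gather: one row per (nbpts, phase) pair,
# with the phase name in computation_phase and the value in time
demo_long <- data.frame(
  nbpts             = rep(demo_wide$nbpts, times = 2),
  computation_phase = rep(c("tps_facto", "tps_solve"), each = nrow(demo_wide)),
  time              = c(demo_wide$tps_facto, demo_wide$tps_solve)
)
nrow(demo_long)  # 4
```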
Now, we can begin to define the resulting ggplot2
plot object. The column
names featured in this plotting function are:
nbpts
linear system's unknown count,tps_facto
represents factorization time in seconds,tps_solve
represents solve time in seconds,solver_config
gives the solver configuration used,desired_accuracy
represents solution accuracy threshold \(\epsilon\),computation_phase
tells whether the row contains factorization or solve time (used in the long formatted data framedataframe_long
),time
contains factorization or solve time (used in the long formatted data framedataframe_long
).
plot <- ggplot(dataframe_long, aes(x = nbpts, y = time))
If solver_config
is TRUE
, we want to distinguish between different solver
configurations.
if(solver_config) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(solver_config), shape = solver, linetype = solver ) ) } else if(!all(is.na(dataframe_long$desired_accuracy))) {
Otherwise, we will try to distinguish between various data compression levels.
plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) }
Then, we scale the axes.
plot <- plot + geom_line() + geom_point(size = 2.5) + scale_x_continuous( "# Unknowns (\U1D441)", trans = trans_x, breaks = dataframe_long$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Computation time", trans = trans_y, breaks = breaks_width(time_break_width), labels = label_time ) +
Finally, using a facet grid, we distinguish between factorization and solve times in separate facet columns. The plot theme is provided by the generate_theme function (see Section 4.1.4.3). The color label function depends on the value of solver_config.
facet_grid( . ~ computation_phase, labeller = labeller(computation_phase = phase.labs) ) if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
4.1.5.2. multisolve_times_by_nbrhs_and_nbpts
This function returns a plot of factorization time relative to the linear system's
unknown count for different counts of right-hand sides used during the Schur
complement computation when relying on the two-stage multi-solve implementation
scheme. Using the ooc
switch, we can include a facet grid to distinguish
between runs with and without out-of-core Schur complement computation. The
distributed
switch adapts the output figure to multi-node benchmarks if
set to TRUE
.
Now, we can begin to define the plot object. The column names featured in this plotting function are:
nbpts
linear system's unknown count,tps_facto
represents factorization time in seconds,tps_solve
represents solve time in seconds,processes
gives the number of MPI processes used for the computation,p_units
gives the total number of threads used for the computation,cols_schur
gives the number of columns in a Schur complement block,solver
contains the names of solvers featured in the coupling.
Note that we create an additional column p_labels
. It contains the legend
labels for the color aesthetics combining the values of processes
and
p_units
, e.g. 2 (48 threads)
for 2 nodes and 48 threads in total.
multisolve_times_by_nbpts_and_nbrhs <- function(dataframe, ooc = FALSE, distributed = FALSE) { dataframe$cols_schur_labels <- paste0( "(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")" ) cbreaks <- unique(na.omit(dataframe$cols_schur_labels)) cbreaks <- cbreaks[order( as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)), as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] if(distributed) { dataframe$p_labels <- paste0( dataframe$processes, " (", dataframe$p_units, " threads)" ) plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
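The numeric ordering of the "(n_c, n_S)" labels computed above relies on extracting both integers with gsub before sorting; in isolation, on hypothetical label values:

```r
# Sort "(a, b)" labels numerically by a, then by b
demo_breaks <- c("(256, 1024)", "(32, 32)", "(256, 512)", "(48, 48)")
first  <- as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", demo_breaks))
second <- as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", demo_breaks))
demo_breaks[order(first, second)]
# "(32, 32)" "(48, 48)" "(256, 512)" "(256, 1024)"
```

A plain lexicographic sort would instead place "(256, …)" before "(32, …)", hence the extraction step.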
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Factorization time [h]", breaks = breaks_log(n = 8, base = 10), trans = "log10", labels = label_time )
If the ooc
switch is set to TRUE
, include a facet grid based on the
dense_ooc
column values.
if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot
object. The color breaks are automatically determined based on the values of the
cols_schur_labels
column.
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(dataframe$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
4.1.5.3. multisolve_rss_by_nbrhs_and_nbpts
This function returns a plot of real memory (RAM) usage peaks relative to
the linear system's unknown count for different counts of right-hand sides used
during the Schur complement computation when relying on the two-stage
multi-solve implementation scheme. The function can take two extra arguments,
limit_max
and tick_values
, allowing one to redefine the default Y-axis maximum
and tick values respectively. Using the ooc
switch, we can include a facet
grid to distinguish between runs with and without out-of-core Schur complement
computation as well as between RAM and disk space consumption.
The column names featured in this plotting function are:
nbpts
linear system's unknown count,rm_peak
real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),hdd_peak
hard drive usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),processes
gives the number of MPI processes used for the computation,p_units
gives the total number of threads used for the computation,cols_schur
gives the number of columns in a Schur complement block,solver
contains the names of solvers featured in the coupling.
Note that we create an additional column p_labels
. It contains the legend
labels for the color aesthetics combining the values of processes
and
p_units
, e.g. 2 (48 threads)
for 2 nodes and 48 threads in total.
We begin by defining the plot object directly.
multisolve_rss_by_nbpts_and_nbrhs <- function(dataframe, limit_max = 126, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE ) { df <- dataframe df$cols_schur_labels <- paste0( "(", df$coupled_nbrhs, ", ", df$cols_schur, ")" ) cbreaks <- unique(na.omit(df$cols_schur_labels)) cbreaks <- cbreaks[order(as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)),as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] y_axis_label <- "Real memory (RAM) usage peak [GiB]" if(distributed) { df$p_labels <- paste0( df$processes, " (", df$p_units, " threads)" ) plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = p_labels, shape = solver, linetype = solver ) ) } else if(ooc) { rss_kinds <- c("rm_peak", "hdd_peak") df <- gather( df, key = "rss_kind", value = "rss_peak", all_of(rss_kinds) ) df$rss_kind <- factor(df$rss_kind, levels = rss_kinds) y_axis_label <- "Storage usage peak [GiB]" plot <- ggplot( df, aes( x = nbpts, y = rss_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } else { plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = df$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( y_axis_label, labels = function(label) sprintf("%.0f", label), limits = c(NA, limit_max), breaks = tick_values )
If the ooc
switch is set to TRUE
, include a facet grid based on the
rss_kind
and the dense_ooc
column values.
if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot
object. The color breaks are automatically determined based on the values of the
cols_schur_labels
column.
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(df$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
4.1.5.4. multisolve_memory_aware
This function returns a plot of factorization time relative to real memory
(RAM) usage peaks during the Schur complement computation when relying on the
two-stage multi-solve implementation scheme. In this case, we show the results
for a fixed problem size, given by my_nbpts
. narrow
optimizes the output for
multi-column document layout. multi_node
optimizes the output for multi-node
benchmark results. hreadable
makes the Y-axis values more human readable.
The column names featured in this plotting function are:
tps_facto
represents factorization time in seconds,rm_peak
real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),solver
contains the names of solvers featured in the coupling.
We begin by restricting the input data frame according to my_nbpts
.
multisolve_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = "CMU Serif", fontsize = 16, multi_node = FALSE, hreadable = FALSE) { dataframe_largest <- NA y_title <- "Factorization time [s]" y_labeller <- scientific if(hreadable) { y_title <- "Factorization time" y_labeller <- label_time_short } if(multi_node) { dataframe_largest <- subset( dataframe, coupled_method == "multi-solve" ) } else { dataframe_largest <- subset( dataframe, coupled_method == "multi-solve" & nbpts == my_nbpts ) }
Then, we create an additional column in the data frame to hold prettified labels providing the Schur complement computation phase parameter values.
if(multi_node) { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$cols_schur, "), \")\")" ), paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$coupled_nbrhs, "), \")\")" ) ) } else { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "atop(n[c]==", dataframe_largest$coupled_nbrhs, ", ", "n[S]==", dataframe_largest$cols_schur, ")" ), paste0("n[c]==", dataframe_largest$coupled_nbrhs) ) }
Now, we define the plot object.
plot <- NA if(multi_node) { dataframe_largest$p_labels <- paste0( dataframe_largest$processes, " (", dataframe_largest$p_units, " threads)" ) plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = p_labels, label = label_content, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) } plot <- plot + geom_line(size = 0.8) + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the geom_label_repel
function from the ggrepel
R package to
obtain better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf ) +
The X-axis shows the RAM usage peaks and the Y-axis shows the factorization time. We use custom break and limit values for better readability.
scale_x_continuous( "Random access memory (RAM) usage peak [GiB]", breaks = breaks_extended(10), limits = c(NA, NA), trans = "log10" ) + scale_y_continuous( y_title, trans = "log10", labels = y_labeller, breaks = breaks_log(6) )
Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-solve implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
if(narrow) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-solve"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-solve"]), legend_rows = 2, legend_box = "horizontal", legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } else if(multi_node) { plot <- plot + facet_wrap( . ~ nbpts, labeller = labeller(nbpts = label_nbpts), scales = "free" ) + labs ( shape = "Solver coupling", linetype = "Solver coupling", color = "Parallel configuration" ) + generate_theme( color_breaks = as.character( sort(unique(na.omit(dataframe_largest$p_labels))) ), shape_labels = label_solver, linetype_labels = label_solver, legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( fill = FALSE ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-solve"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-solve"]), legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } return(plot) }
4.1.5.5. multifacto_memory_aware
This function returns a plot of factorization time relative to real memory
(RAM) usage peaks during the Schur complement computation when relying on the
two-stage multi-factorization implementation scheme. In this case, we show the
results for a fixed problem size, given by my_nbpts
. narrow
optimizes the
output for multi-column document layout. hreadable
makes the Y-axis values
more human readable.
The column names featured in this plotting function are:
tps_facto
represents factorization time in seconds,rm_peak
real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),solver
contains the names of solvers featured in the coupling.
We begin by restricting the input data frame according to my_nbpts
.
multifacto_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = "CMU Serif", fontsize = 16, hreadable = FALSE) { y_title <- "Factorization time [s]" y_labeller <- scientific if(hreadable) { y_title <- "Factorization time" y_labeller <- label_time_short } dataframe_largest = subset( dataframe, coupled_method == "multi-facto" & nbpts == my_nbpts )
Before plotting, we also need to:
- convert the
n_blocks
column from string to numeric type and use it to create a corresponding prettified label column,
dataframe_largest$n_blocks = as.numeric(dataframe_largest$n_blocks) dataframe_largest$label_content = paste0( "n[b]==", dataframe_largest$n_blocks )
- convert the values of
rm_peak
to gibibytes (GiB),
dataframe_largest$rm_peak = as.numeric(dataframe_largest$rm_peak / 1024.)
- order the rows of the data frame based on the
n_blocks
column for the data points to be correctly connected by line segments on the final plot.
dataframe_largest <- dataframe_largest[order(dataframe_largest$n_blocks),]
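The sort matters because geom_path connects points in row order rather than in sorted X order; a toy illustration on a hypothetical demo_df with made-up values:

```r
# Unsorted rows would be drawn as a zigzag path; sort by n_blocks first
demo_df <- data.frame(n_blocks = c(4, 1, 2), tps_facto = c(30, 10, 20))
demo_df <- demo_df[order(demo_df$n_blocks), ]
demo_df$n_blocks  # 1 2 4
```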
Then, we define the plot object. We also need to find the minimum and
maximum values for the axes (columns rm_peak
and tps_facto
).
rm_min = as.integer(min(dataframe_largest$rm_peak)) rm_max = as.integer(max(dataframe_largest$rm_peak)) tps_min = as.integer(min(dataframe_largest$tps_facto)) tps_max = as.integer(max(dataframe_largest$tps_facto)) plot = ggplot( dataframe_largest, aes( x = rm_peak, y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) + geom_path() + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the geom_label_repel
function from the ggrepel
R package to
obtain better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf ) +
The X-axis shows the RAM usage peaks and the Y-axis shows the factorization time. We use custom break and limit values for better readability.
    scale_x_continuous(
      "Random access memory (RAM) usage peak [GiB]",
      breaks = seq(rm_min - 5, rm_max + 5, by = 5),
      limits = c(NA, NA), trans = "log10"
    ) +
    scale_y_continuous(
      y_title, trans = "log10", labels = y_labeller, breaks = breaks_log(6)
    ) +
Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-factorization implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
    labs(shape = "Solver coupling", linetype = "Solver coupling")
  if(narrow) {
    plot <- plot + generate_theme(
      shape_labels = label_solver,
      shape_override_aes = list(color = colors["multi-facto"]),
      linetype_labels = label_solver,
      linetype_override_aes = list(color = colors["multi-facto"]),
      legend_rows = 2, legend_box = "horizontal",
      legend_title = element_text(
        family = font, size = fontsize - 2, face = "bold"
      ),
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  } else {
    plot <- plot + generate_theme(
      shape_labels = label_solver,
      shape_override_aes = list(color = colors["multi-facto"]),
      linetype_labels = label_solver,
      linetype_override_aes = list(color = colors["multi-facto"]),
      legend_title = element_text(
        family = font, size = fontsize - 2, face = "bold"
      ),
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  }
  plot <- plot + guides(color = FALSE, fill = FALSE)
  return(plot)
}
4.1.5.6. multifacto_times_by_nbpts_and_schur_size
This function returns a plot of factorization time relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data into a common column, size_schur.
multifacto_times_by_nbpts_and_schur_size <- function(dataframe, ooc = FALSE,
                                                     distributed = FALSE) {
  local <- dataframe
  local$size_schur[local$solver == "mumps/spido"] =
    local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- n_blocks: the count of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
  if(distributed) {
    local$p_labels <- paste0(local$processes, " (", local$p_units, " threads)")
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = tps_facto, color = p_labels,
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = tps_facto, color = as.character(n_blocks),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      "Factorization time [h]", breaks = breaks_log(n = 8, base = 10),
      trans = "log10", labels = label_time
    )
If the ooc switch is set to TRUE, we include a facet grid based on the dense_ooc column values.
  if(ooc) {
    plot <- plot + facet_grid(
      . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc)
    )
  }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the n_blocks column, or of the p_labels column in the distributed case.
  if(distributed) {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = "# Nodes"
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$p_labels)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  } else {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = expression(n[italic(b)])
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.7. multifacto_rss_by_nbpts_and_schur_size
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values, respectively.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data into a common column, size_schur.
multifacto_rss_by_nbpts_and_schur_size <- function(
  dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120),
  ooc = FALSE, distributed = FALSE
) {
  local <- dataframe
  local$size_schur[local$solver == "mumps/spido"] =
    local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- n_blocks: the count of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
  if(distributed) {
    local$p_labels <- paste0(local$processes, " (", local$p_units, " threads)")
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = rm_peak / 1024., color = as.character(p_labels),
        shape = solver, linetype = solver
      )
    )
  } else if(ooc) {
    local <- gather(
      local, key = "rss_kind", value = "rss_peak", c("rm_peak", "hdd_peak")
    )
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = rss_peak / 1024., color = as.character(n_blocks),
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      local,
      aes(
        x = nbpts, y = rm_peak / 1024., color = as.character(n_blocks),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      "Real memory (RAM) usage peak [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, limit_max), breaks = tick_values
    )
If the ooc switch is set to TRUE, we include a facet grid based on the rss_kind and dense_ooc column values.
  if(ooc) {
    plot <- plot + facet_grid(
      rss_kind ~ dense_ooc,
      labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc)
    )
  }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the n_blocks column, or of the p_labels column in the distributed case.
  if(distributed) {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = "# Nodes"
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$p_labels)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  } else {
    plot <- plot +
      labs(
        shape = "Solver coupling", linetype = "Solver coupling",
        color = expression(n[italic(b)])
      ) +
      generate_theme(
        color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.8. compare_coupled
This function returns a plot of the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
For example, if we want to plot only the results regarding two-stage schemes, we can use the legend_rows argument to adjust the number of rows in the legend. Then, the shorten argument allows removing legend titles and using shorter legend key labels in order to reduce the final canvas size. check_epsilon can be used to show the relative error instead of factorization times on the Y-axis. hreadable makes the Y-axis values more human-readable.
We begin by restricting the input data frame to the best results of the two-stage and single-stage schemes.
Note that as the latter imply the use of only one solver and not a coupling of solvers, we need to specify the coupling method name manually to enable the data frame treatment.
compare_coupled <- function(dataframe, legend_rows = 3, shorten = FALSE,
                            check_epsilon = FALSE, font = "CMU Serif",
                            fontsize = 16, hreadable = FALSE) {
  y_title <- "Factorization time [s]"
  y_labeller <- scientific
  if(hreadable) {
    y_title <- "Factorization time"
    y_labeller <- label_time_short
  }
  dataframe_best <- dataframe
  dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method)
  dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <-
    "full-hmat"
  dataframe_best <- merge(
    aggregate(
      tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best
    ),
    dataframe_best
  )
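The aggregate-then-merge idiom used here can be illustrated on a tiny, hypothetical data frame (the values below are illustrative, not thesis results): aggregate computes the minimal factorization time per group and the subsequent merge recovers the full rows achieving those minima.

```r
# Minimal sketch of the aggregate-then-merge idiom (hypothetical toy data).
df <- data.frame(
  coupled_method = c("multi-facto", "multi-facto", "multi-solve", "multi-solve"),
  nbpts          = c(1000, 1000, 1000, 1000),
  n_blocks       = c(2, 4, 2, 4),
  tps_facto      = c(50, 30, 80, 20)
)
# Keep, for every (coupled_method, nbpts) group, the minimal time...
best_times <- aggregate(tps_facto ~ coupled_method + nbpts, min, data = df)
# ...then recover the complete rows achieving those minima: merge joins on
# the common columns, including tps_facto.
best <- merge(best_times, df)
```

Here, best contains one row per implementation scheme: the multi-facto run with tps_facto 30 and the multi-solve run with tps_facto 20.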
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
  plot <- NA
  if(check_epsilon) {
    plot <- ggplot(
      dataframe_best,
      aes(
        x = nbpts, y = error, color = coupled_method,
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      dataframe_best,
      aes(
        x = nbpts, y = tps_facto, color = coupled_method,
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5)
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
  if(check_epsilon) {
    vjust <- -0.6
    size <- 5
    if(shorten) {
      vjust <- -0.3
      size <- 4
    }
    plot <- plot +
      geom_hline(yintercept = 1.0e-3) +
      geom_text(
        aes(
          1.0e+6, 1.0e-3,
          label = as.character(expression(epsilon == 10^{-3})),
          vjust = vjust, hjust = 0
        ),
        parse = TRUE, color = "black", family = font, size = size
      )
  }
  plot <- plot + scale_x_continuous(
    "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific
  )
  if(check_epsilon) {
    plot <- plot + scale_y_continuous(
      expression(paste("Relative error (", E[rel], ")")),
      limits = c(min(dataframe_best$error), 2.0e-3),
      labels = scientific, breaks = breaks_log(10), trans = "log10"
    )
  } else {
    plot <- plot + scale_y_continuous(
      y_title, labels = y_labeller, trans = "log10", limits = c(NA, NA)
    )
  }
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are automatically determined based on the values of the coupled_method and solver columns.
  plot <- plot + labs(
    shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme"
  )
  if(shorten) {
    plot <- plot + generate_theme(
      color_breaks = unique(na.omit(dataframe_best$coupled_method)),
      color_labels = label_coupling_short,
      color_override_aes = list(linetype = 0, shape = 15, size = 6),
      shape_breaks = unique(na.omit(dataframe_best$solver)),
      shape_labels = label_solver,
      linetype_breaks = unique(na.omit(dataframe_best$solver)),
      linetype_labels = label_solver,
      legend_rows = legend_rows, legend_box = "horizontal",
      legend_title = element_blank(),
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  } else {
    plot <- plot + generate_theme(
      color_breaks = unique(na.omit(dataframe_best$coupled_method)),
      color_labels = label_coupling,
      color_override_aes = list(linetype = 0, shape = 15, size = 8),
      shape_breaks = unique(na.omit(dataframe_best$solver)),
      shape_labels = label_solver,
      linetype_breaks = unique(na.omit(dataframe_best$solver)),
      linetype_labels = label_solver,
      legend_rows = legend_rows, legend_box = "horizontal",
      legend_text = element_text(family = font, size = fontsize - 2),
      theme_text = element_text(family = font, size = fontsize)
    )
  }
  return(plot)
}
4.1.5.9. compare_rss_coupled
This function returns a plot of the real memory (RAM) usage peaks for the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
We begin by restricting the input data frame to the best results of the two-stage schemes and the results of the single-stage scheme implemented using HMAT.
Note that as the latter implies the use of one single solver, we need to specify the coupling method name manually to enable the data frame treatment.
compare_rss_coupled <- function(dataframe) {
  dataframe_best <- dataframe
  dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method)
  dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <-
    "full-hmat"
  dataframe_best <- merge(
    aggregate(
      tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best
    ),
    dataframe_best
  )
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
  plot <- ggplot(
    dataframe_best,
    aes(
      x = nbpts, y = rm_peak / 1024., color = coupled_method,
      shape = solver, linetype = solver
    )
  ) +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks.
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific
    ) +
    scale_y_continuous(
      "Real memory (RAM) usage peak [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120)
    ) +
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are automatically determined based on the values of the coupled_method and solver columns.
    labs(
      shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme"
    ) +
    generate_theme(
      color_breaks = unique(na.omit(dataframe_best$coupled_method)),
      color_labels = label_coupling,
      shape_breaks = unique(na.omit(dataframe_best$solver)),
      shape_labels = label_solver,
      linetype_breaks = unique(na.omit(dataframe_best$solver)),
      linetype_labels = label_solver,
      legend_rows = 3, legend_box = "horizontal"
    )
  return(plot)
}
4.1.5.10. accuracy_by_nbpts
This function returns a plot of the relative solution error with respect to the linear system's unknown count.
The solver_config option allows one to distinguish between different solver configurations on the figure instead of different precision parameter \(\epsilon\) values.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- error: the relative forward error of the solution approximation.
accuracy_by_nbpts <- function(dataframe, solver_config = FALSE) {
  if(solver_config) {
    plot <- ggplot(
      dataframe,
      aes(
        x = nbpts, y = error, color = as.character(solver_config),
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      dataframe,
      aes(
        x = nbpts, y = error, color = as.character(desired_accuracy),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
We draw a horizontal line representing the maximal admitted value of \(E_{rel}\) which is \(10^{-3}\).
    geom_hline(yintercept = 1.0e-3) +
    geom_text(
      aes(
        min(dataframe$nbpts), 1.0e-3,
        label = as.character(expression(epsilon == 10^{-3})),
        vjust = -0.6, hjust = 0
      ),
      parse = TRUE, family = "CMU Serif", color = "black", size = 5
    ) +
For the Y-axis, we define the tick values manually for better comparison to the associated relative error values.
    scale_x_continuous(
      "# Unknowns (\U1D441)", trans = "log10",
      breaks = dataframe$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      expression(paste("Relative error (", E[rel], ")")),
      limits = c(min(dataframe$error), 1e-1),
      breaks = c(1e-16, 1e-13, 1e-10, 1e-6, 1e-3),
      labels = scientific, trans = "log10"
    )
Finally, we set the legend labels, apply the common theme parameters and return the plot object.
  if(solver_config) {
    plot <- plot +
      labs(color = "Configuration", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_solver_config,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver,
        legend_rows = length(unique(na.omit(dataframe$solver_config))) / 2
      )
  } else {
    plot <- plot +
      labs(color = "\U1D700", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_epsilon,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.11. rss_peaks_by_nbpts
This function returns a plot of real memory (RAM) and disk space usage peaks relative to the linear system's unknown count. The function can take four extra arguments, limit_max, tick_values, solver_config and trans_x. The first two of them can be used to redefine the default Y-axis maximum and tick values, respectively. The solver_config option allows one to distinguish between different solver configurations on the figure. By default, the trans_x argument sets the X-axis scale to logarithmic. To use a standard scale, the value identity should be used.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: disk space usage peaks, stored in MiB but converted to GiB.
At first, we convert the data frame from wide to long format.
rss_peaks_by_nbpts <- function(dataframe, limit_max = 126,
                               tick_values = c(0, 30, 60, 90, 120),
                               solver_config = FALSE, trans_x = "log10") {
  dataframe_long <- gather(
    dataframe, key = "memory_type", value = "memory_usage",
    c("rm_peak", "hdd_peak")
  )
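The effect of this wide-to-long conversion can be sketched in base R as well, on a hypothetical two-row data frame; stats::reshape produces the same pairing of a key column (memory_type) with a value column (memory_usage) as the gather call:

```r
# Base-R sketch of the wide-to-long conversion (hypothetical toy data).
wide <- data.frame(
  nbpts = c(1000, 2000), rm_peak = c(4096, 8192), hdd_peak = c(512, 1024)
)
long <- reshape(
  wide,
  direction = "long",
  varying   = c("rm_peak", "hdd_peak"),  # columns to stack
  v.names   = "memory_usage",            # value column
  timevar   = "memory_type",             # key column
  times     = c("rm_peak", "hdd_peak")   # key values
)
# Each input row yields one row per stacked column: 2 rows become 4.
```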
Also, we fix the order of the values in memory_type so that real memory (RAM) usage peaks come before disk usage peaks in the facet grid.
  dataframe_long$memory_type <- factor(
    dataframe_long$memory_type, c('rm_peak', 'hdd_peak')
  )
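The factor's explicit level order is what drives the facet order here: ggplot2 follows the levels, not the order in which values appear in the data. A minimal sketch:

```r
# Without explicit levels, factor() orders the levels alphabetically,
# which would put hdd_peak before rm_peak:
x <- factor(c("rm_peak", "hdd_peak", "rm_peak"))
# Passing the levels explicitly fixes the desired order instead:
y <- factor(c("rm_peak", "hdd_peak", "rm_peak"), c("rm_peak", "hdd_peak"))
```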
Then, we configure the plot object itself as described above and handle different levels of expected relative solution error, in the case of solvers using data compression, by adding the shape aesthetics. If solver_config is TRUE, we want to distinguish between different solver configurations instead.
  if(solver_config) {
    plot <- ggplot(
      dataframe_long,
      aes(
        x = nbpts, y = memory_usage / 1024.,
        color = as.character(solver_config),
        shape = solver, linetype = solver
      )
    )
  } else {
    plot <- ggplot(
      dataframe_long,
      aes(
        x = nbpts, y = memory_usage / 1024.,
        color = as.character(desired_accuracy),
        shape = solver, linetype = solver
      )
    )
  }
  plot <- plot +
    geom_line() +
    geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the storage usage peaks. On the Y-axis, we round the values to 0 decimal places.
    scale_x_continuous(
      "# Unknowns (\U1D441)", trans = trans_x,
      breaks = dataframe_long$nbpts, labels = label_nbpts_ticks
    ) +
    scale_y_continuous(
      "Storage usage peak [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, limit_max), breaks = tick_values
    ) +
Storage types are distinguished using a facet grid.
    facet_grid(
      . ~ memory_type, labeller = labeller(memory_type = label_storage)
    )
We end by setting legend titles, applying the common theme parameters and returning the plot object.
  if(solver_config) {
    plot <- plot +
      labs(color = "Configuration", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_solver_config,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver,
        legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2
      )
  } else {
    plot <- plot +
      labs(color = "\U1D700", shape = "Solver", linetype = "Solver") +
      generate_theme(
        color_labels = label_epsilon,
        color_override_aes = list(linetype = 0, shape = 15, size = 8),
        shape_labels = label_solver, linetype_labels = label_solver
      )
  }
  return(plot)
}
4.1.5.12. rss_by_time
This function returns a plot of real memory (RAM) usage relative to the execution time. The dataframe argument represents the data frame containing RAM consumption data read using the read_rss function (see Section 4.1.2.3.1). Moreover, using limit_max and tick_values one can redefine the default Y-axis maximum and tick values, respectively, to improve the readability of the plot. The timeline argument is useful when dataframe contains coupled FEM/BEM benchmark results. timeline represents an additional data frame that can be used to highlight selected computation phases and estimated reference RAM usage peaks of a sparse FEM factorization and Schur matrix size in the plot (see Section 4.1.2.3.4).
The column names featured in this plot function are:
- time: the timestamp in seconds giving the time elapsed since the beginning of the execution,
- rss: the global real memory usage in mebibytes (MiB) but converted to gibibytes (GiB).
We begin directly by defining the plot object.
rss_by_time <- function(dataframe, limit_max = 126,
                        tick_values = c(0, 30, 60, 90, 120),
                        estimated_peak_factorization = NA,
                        estimated_schur = NA, timeline = NA) {
  is_coupled <- FALSE
  fill_legend <- "Solver"
  color_labeller <- label_solver
  if("coupled_method" %in% colnames(dataframe)) {
    is_coupled <- TRUE
    fill_legend <- "Implementation scheme"
    color_labeller <- label_coupling
  }
  if(is_coupled) {
    plot <- ggplot(
      dataframe,
      aes(x = timestamp / 1000, y = consumption, fill = coupled_method)
    )
  } else {
    plot <- ggplot(
      dataframe,
      aes(x = timestamp / 1000, y = consumption, fill = solver)
    )
  }
We draw the RAM usage as an area chart.
plot <- plot + geom_area()
In the case the extra timeline argument is present, we draw additional labelled vertical lines corresponding to selected computation phases using the geom_vline and geom_text layers. Note that for the geom_text layer we also need to adjust the text position and rotation. To enable the processing of the labels as math expressions, we use the parse parameter.
  if(is.data.frame(timeline)) {
    plot <- plot +
      geom_vline(
        data = timeline, aes(xintercept = time),
        linetype = "dotted", size = 0.4
      ) +
      geom_text(
        data = timeline,
        aes(x = middle, y = 0.75 * limit_max, label = label),
        parse = T, angle = 90, vjust = "center"
      )
Then, we also draw the horizontal segments corresponding to the estimated RAM usage peaks of a sparse FEM factorization and the estimated Schur matrix size.
However, at first, we have to convert the timeline data frame to the long format (see Section 4.1.2). This allows us to use different line types to distinguish the segments based on the kind of value they represent.
    columns <- c(
      "assembly_estimation", "peak_symmetric_factorization",
      "peak_non_symmetric_factorization", "schur_estimation"
    )
    timeline_long <- gather(
      timeline, key = "peak_kind", value = "peak_value", columns
    )
    plot <- plot + geom_spoke(
      data = timeline_long,
      aes(
        x = middle - 0.5 * (time - middle), y = peak_value,
        linetype = peak_kind, radius = time - middle
      ),
      angle = 2 * pi
    )
  }
Next, we set up the axes and their scaling. We round the Y-axis values to zero decimal places.
  plot = plot +
    scale_x_continuous("Execution time", labels = label_time) +
    scale_y_continuous(
      "Random access memory (RAM) usage [GiB]",
      labels = function(label) sprintf("%.0f", label),
      limits = c(NA, limit_max), breaks = tick_values
    )
We end by configuring the visual aspect of the plot.
  if(is.data.frame(timeline)) {
    plot <- plot +
      labs(fill = fill_legend, linetype = "Estimated peak usage") +
      generate_theme(
        color_labels = color_labeller, linetype_labels = label_peak_kind,
        legend_rows_linetype = length(columns)
      )
  } else {
    plot <- plot +
      labs(fill = fill_legend) +
      generate_theme(color_labels = color_labeller)
  }
  return(plot)
}
4.1.5.13. eprofile
This function returns a plot of power consumption relative to the execution time. The first argument es_data of the function corresponds to the data frame containing power consumption data as well as tag information, if any (see Section 4.1.2.3.3).
eprofile <- function(es_data, zoom_on = c(NA, NA), TDP = NA, rss = TRUE,
                     flops = TRUE, bw = TRUE) {
  zoomed <- FALSE
  solvers <- unique(na.omit(es_data$solver))
  es <- es_data
  if(!is.na(zoom_on[1]) && !is.na(zoom_on[2])) {
    zoomed <- TRUE
    es <- subset(
      es, timestamp >= zoom_on[1] * 1000 & timestamp <= zoom_on[2] * 1000
    )
  }
  edata <- subset(es, kind_general == "e")
  power <- ggplot(data = subset(edata, is.na(type)))
  sdata <- NA
  storage <- NA
  if(rss) {
    sdata <- subset(es, kind_general == "s")
    sdata$kind <- factor(sdata$kind, levels = c("sram", "shdd"))
    storage <- ggplot(data = subset(sdata, is.na(type)))
  }
  fdata <- NA
  if(flops) {
    fdata <- subset(es, kind_general == "flops")
    fdata$kind <- factor(fdata$kind, levels = c("flops"))
  }
  bwdata <- NA
  if(bw) {
    bwdata <- subset(es, kind_general == "bw")
    bwdata$kind <- factor(bwdata$kind, levels = c("bw"))
  }
At first, we draw the power consumption curve. The tags are represented by colored points surrounded with a black stroke and annotated with labels. Note that the latter are stored as math expressions and need to be parsed by the geom_label_repel function.
  if(flops) {
    power <- power + geom_line(
      data = fdata, aes(x = timestamp, y = consumption / 1000, color = kind)
    )
  }
  if(bw) {
    power <- power + geom_line(
      data = bwdata, aes(x = timestamp, y = consumption / 1024, color = kind)
    )
  }
  power <- power +
    geom_line(aes(x = timestamp, y = consumption, color = kind)) +
    geom_point(
      data = subset(edata, !is.na(type) & kind == "ecpu"),
      aes(x = timestamp, y = consumption, shape = tag),
      color = "black", fill = "white", size = 3
    ) +
    geom_label_repel(
      data = subset(edata, !is.na(type) & kind == "ecpu"),
      aes(x = timestamp, y = consumption, label = type_label),
      force_pull = 0, segment.color = "black", min.segment.length = 0,
      max.overlaps = 20, family = "CMU Serif", parse = TRUE,
      show.legend = FALSE
    )
  if(rss) {
    storage <- storage +
      geom_line(aes(x = timestamp, y = consumption, color = kind)) +
      geom_point(
        data = subset(sdata, !is.na(type)),
        aes(x = timestamp, y = consumption, shape = tag),
        color = "black", fill = "white", size = 3
      ) +
      geom_label_repel(
        data = subset(sdata, !is.na(type)),
        aes(x = timestamp, y = consumption, label = type_label),
        force_pull = 0, segment.color = "black", min.segment.length = 0,
        max.overlaps = 20, family = "CMU Serif", parse = TRUE,
        show.legend = FALSE
      )
  }
If the TDP parameter is provided, we add a horizontal line matching its value.
  if(!is.na(TDP)) {
    power <- power +
      geom_hline(yintercept = TDP) +
      geom_text(
        aes(
          0, TDP, label = paste("TDP:", as.character(TDP), "W"),
          vjust = 2, hjust = 0
        ),
        color = "black", family = "CMU Serif", size = 5
      )
  }
Next, we set up the axes and their scaling. We round the Y-axis values to zero decimal places.
  power <- power + scale_x_continuous(
    "Execution time [s]",
    labels = function(label) sprintf("%d", as.integer(label / 1000)),
    limits = c(NA, NA), breaks = breaks_extended(n = 10)
  )
  if(flops) {
    coeff <- 0.2
    if(zoomed) {
      coeff <- 1.1
    }
    power <- power + scale_y_continuous(
      "Power consumption [W]",
      labels = function(label) sprintf("%.0f", label),
      breaks = breaks_width(as.integer(0.25 * max(edata$consumption))),
      sec.axis = sec_axis(
        ~ . * 1, name = "Flop rate [Gflop/s]",
        labels = function(label) sprintf("%.0f", label),
        breaks = breaks_width(
          as.integer(coeff * (max(fdata$consumption) / 1000))
        )
      )
    )
  } else {
    power <- power + scale_y_continuous(
      "Power consumption [W]",
      labels = function(label) sprintf("%.0f", label),
      breaks = breaks_width(as.integer(0.25 * max(edata$consumption)))
    )
  }
  if(rss) {
    y_axis_title <- ifelse(
      length(unique(na.omit(sdata$kind))) < 2,
      "RAM usage [GiB]", "Storage usage [GiB]"
    )
    storage <- storage +
      scale_x_continuous(
        "Execution time [s]",
        labels = function(label) sprintf("%d", as.integer(label / 1000)),
        limits = c(NA, NA), breaks = breaks_extended(n = 10)
      ) +
      scale_y_continuous(
        y_axis_title,
        labels = function(label) sprintf("%.0f", label),
        breaks = breaks_width(as.integer(0.2 * max(sdata$consumption)))
      )
  }
We end by configuring the visual aspect of the plot. We use a facet grid to draw a separate plot for each solver coupling.
  if("coupled_method" %in% colnames(es_data)) {
    power <- power + facet_grid(
      . ~ coupled_method + solver,
      labeller = labeller(
        coupled_method = label_coupling, solver = label_solver
      ),
      scales = "free_x", switch = "y"
    )
  } else {
    power <- power + facet_grid(
      . ~ solver, labeller = labeller(solver = label_solver),
      scales = "free_x", switch = "y"
    )
  }
  if(rss) {
    storage <- storage +
      facet_grid(
        . ~ solver, labeller = labeller(solver = label_solver),
        scales = "free_x", switch = "y"
      ) +
      labs(shape = "Computation phase")
  }
  power <- power + generate_theme(
    color_labels = label_kind, color_override_aes = list(size = 8),
    shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal"
  )
  if(rss) {
    power <- power +
      labs(color = "Device") +
      guides(fill = "none", shape = "none", linetype = "none") +
      theme(axis.title.x = element_blank())
    storage <- storage +
      generate_theme(
        shape_labels = label_tag, legend_rows_shape = 1,
        legend_box = "horizontal"
      ) +
      guides(color = "none", fill = "none") +
      theme(
        strip.text.x = element_blank(), strip.background.x = element_blank()
      )
    return(
      plot_grid(
        power, storage, nrow = 2, ncol = 1, align = "v", axis = "b",
        rel_heights = c(1.1, 1)
      )
    )
  }
  power <- power +
    labs(color = "Device", shape = "Computation phase") +
    guides(fill = "none")
  return(power)
}
4.1.5.14. es_comparison
This function returns a combined bar chart comparing the total energy consumption, the computation time and the real memory (RAM) usage peaks of the solvers relative to the linear system's unknown count. The total energy consumption of each benchmark run is read from the associated eprofile.txt file.

es_comparison <- function(dataframe) {
  extended <- dataframe
  extended$etotal <- as.numeric(0)
  for(i in 1:nrow(extended)) {
    gcvb_retrieve_files(extended[i, ])
    eprofile <- fromJSON(file = "./eprofile.txt")
    extended[i, "etotal"] <- as.numeric(
      eprofile$data$data$tags$es_total$`joule(J)`
    )
  }
  extended$solver <- ifelse(
    extended$solver == "mumps" & extended$error < 10e-12,
    "mumps-blr", extended$solver
  )
  postamble <- generate_theme(
    color_labels = label_solver, legend_rows = 2, legend_title = element_blank()
  )
  energy <- ggplot(data = extended, aes(x = nbpts, y = etotal, fill = solver)) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific
    ) +
    scale_y_continuous("Energy consumption [J]", labels = scientific) +
    postamble +
    guides(fill = "none") +
    theme(axis.title.x = element_blank())
  time <- ggplot(
    data = extended, aes(x = nbpts, y = tps_facto + tps_solve, fill = solver)
  ) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific
    ) +
    scale_y_continuous("Computation time [s]") +
    postamble
  rss <- ggplot(
    data = extended, aes(x = nbpts, y = rm_peak / 1024., fill = solver)
  ) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_x_continuous(
      "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific
    ) +
    scale_y_continuous("RAM usage peak [GiB]") +
    postamble +
    guides(fill = "none") +
    theme(axis.title.x = element_blank())
  return(
    plot_grid(
      energy, time, rss, nrow = 1, ncol = 3, align = "h", axis = "b",
      rel_widths = c(1, 1.15, 1)
    )
  )
}
4.1.5.15. scalability
This function returns a plot of factorization and solve computation times relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- tps_facto: the factorization time in seconds,
- tps_solve: the solve time in seconds,
- mapping: the mapping configuration of MPI processes.
Then, the data frame is converted to long format (see Section 4.1.2) so that the times of both phases can be combined using a facet grid. We also force the order of the facet labels for the solver column by converting the latter into a factor.
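The wide-to-long conversion can be illustrated on a toy data frame (the values below are hypothetical and for illustration only):

```r
library(tidyr)

# Toy data frame in wide format: one time column per computation phase.
wide <- data.frame(
  p_units   = c(4, 8),
  tps_facto = c(120.0, 70.0),
  tps_solve = c(10.0, 6.0)
)

# Gather the two time columns into a key/value pair, yielding one row
# per (core count, computation phase) combination. The key column then
# drives the facet grid.
long <- gather(
  wide,
  key = "computation_phase",
  value = "time",
  c("tps_facto", "tps_solve")
)
```

After the call, `long` holds four rows with the columns `p_units`, `computation_phase` and `time`.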
scalability <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "time", c("tps_facto", "tps_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = time, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between factorization and solve time values, we use a log10 scale on the Y-axis to be able to combine both metrics on the same axis without losing readability.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) + scale_y_continuous( "Computation time [s]", labels = scientific, trans = "log10" ) +
Finally, using a facet grid, we distinguish between factorization and solve times horizontally and between the solvers considered vertically.
For the sake of readability, we allow the Y-axis scales to differ between facets. This is useful when multiple solvers with considerably different time values are considered.
Also, before returning the plot, we apply the common theme parameters. For the scalability and parallel efficiency plots, we split the legend labels over two lines, as they are too long.
facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ), scales = "free" ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
4.1.5.16. scalability_ram
This function is a variation of scalability
(see Section 4.1.5.15) and
returns a plot of real memory (RAM) usage peaks relative to the available
processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- rm_peak: the real memory usage peak, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- mapping: the mapping configuration of MPI processes.
scalability_ram <- function(dataframe) { plot <- ggplot( dataframe, aes( x = p_units, y = rm_peak / 1024., group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
Note that we round the values on the Y-axis to the nearest integer and define custom axis limits and breaks adapted to this type of plot.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe$p_units) ) + scale_y_continuous( "Real memory (RAM) usage peaks [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +
In this case, we use the facet grid exclusively as a means to better annotate the resulting plot.
facet_grid( . ~ solver + nbpts, labeller = labeller( solver = label_solver, nbpts = label_nbpts ) ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
4.1.5.17. efficiency
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage relative to the available processor core count.
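The efficiency columns are computed upstream of this plotting function. As a sketch only, assuming the usual definition of parallel efficiency as speedup divided by the number of cores (the times below are hypothetical):

```r
# Hypothetical single-core and parallel factorization times [s].
t_serial   <- 400.0
t_parallel <- c(400.0, 210.0, 115.0, 65.0)
p_units    <- c(1, 2, 4, 8)

# Parallel efficiency: speedup relative to the single-core run divided
# by the core count. A value of 1 means ideal scaling; real runs
# typically fall below it, which is why the plot draws a horizontal
# line at y = 1 as the theoretical bound.
efficiency <- (t_serial / t_parallel) / p_units
```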
The column names featured in this plotting function are:
- p_units: the count of cores,
- efficiency_facto: the factorization phase efficiency,
- efficiency_solve: the solve phase efficiency,
- mapping: the mapping configuration of MPI processes.
The function itself follows the exact same logic as the scalability
function
(see Section 4.1.5.15).
efficiency <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "efficiency", c("efficiency_facto", "efficiency_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = efficiency, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) + geom_hline(yintercept = 1) + scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) +
There is only one additional step, i.e. converting the efficiency values \(x\), with \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 4.1.4.1). Also, we fix our own Y-axis tick values (breaks).
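As an illustration only (the actual label_percentage function is defined in Section 4.1.4.1), an equivalent label function could look like the following; since the axis title already carries the unit, the labels are bare numbers:

```r
# Hypothetical stand-in for label_percentage: maps a fraction in
# [0, 1] to a percentage string without the "%" sign, which is
# already part of the axis title.
label_percentage <- function(breaks) sprintf("%.0f", 100 * breaks)

label_percentage(c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))
```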
scale_y_continuous( "Parallel efficiency [%]", breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), labels = label_percentage ) + facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ) ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
See the complete R script file plot.R
18.
4.2. StarPU execution traces
To visualize StarPU execution traces in the FXT format, we also make use of R, the ggplot2 package and the dedicated StarVZ library starvz16,starvz18.
To configure the output plot produced by StarVZ, we use a YAML configuration file. We let the cosmetic options take their default values, except for the base font size (see the base_size value), and we override those specifying what kind of information shall be included in the resulting plots. In our case, we want to plot only the StarPU worker statistics (items st and starpu).
default:
  base_size: 14
  starpu:
    active: TRUE
    legend: TRUE
  ready:
    active: FALSE
  submitted:
    active: FALSE
  st:
    active: TRUE
Showing critical path time bound is not necessary.
cpb: FALSE
On the other hand, we want to display all the labels, the legend and the application time span.
labels: "ALL"
legend: TRUE
makespan: TRUE
To reduce the size of the resulting plot, we enable the aggregation of visualized tasks in a time step.
aggregation:
  active: TRUE
  method: "static"
  step: 500
See the complete configuration file starvz.yaml
19.
The StarVZ plotting code begins by determining the path to the latest gcvb benchmark session directory.
output = try(
  system("ls -t ~/benchmarks/results | head -n 1", intern = TRUE)
)
latest_session_path = paste(
  "~/benchmarks/results", output,
  sep = "/"
)
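Note that this relies on a Unix shell being available. A shell-free sketch with the same intent (assuming the same directory layout; latest_session is a hypothetical helper, not part of the original scripts) could rely on file.info() instead:

```r
# Hypothetical portable alternative to `ls -t | head -n 1`: pick the
# most recently modified entry of the results directory by comparing
# modification times reported by file.info().
latest_session <- function(results_dir) {
  entries <- list.files(results_dir, full.names = TRUE)
  entries[which.max(file.info(entries)$mtime)]
}
```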
Then, we import the required R libraries and set the starvz_config
variable
holding the path to the Yaml configuration file of the StarVZ package.
library(ggplot2)
library(starvz)
library(grid)

starvz_config = "starvz.yaml"
4.2.1. Generating plots
Finally, we generate the plots. To do this, we reuse the above code through the :noweb reference <<starvz-header>>:
execution traces of the HMAT solver on a BEM system,
<<starvz-header>> hmat_bem_node = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-node-1x4-25000", sep = "/" ), starvz_config ) ) hmat_bem_socket = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-socket-2x2-25000", sep = "/" ), starvz_config ) ) hmat_bem_core = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-core-4x1-25000", sep = "/" ), starvz_config ) ) hmat_bem_node = hmat_bem_node + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_bem_node$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_node$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_node$theme$legend.position = "none" hmat_bem_node$theme$axis.title.x = element_blank() hmat_bem_node$theme$axis.text.x = element_blank() hmat_bem_node$theme$axis.ticks.x = element_blank() hmat_bem_socket = hmat_bem_socket + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_bem_socket$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_socket$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_socket$theme$legend.position = "none" hmat_bem_socket$theme$axis.title.x = element_blank() hmat_bem_socket$theme$axis.text.x = element_blank() hmat_bem_socket$theme$axis.ticks.x = element_blank() hmat_bem_core = hmat_bem_core + theme_bw() + theme( text = element_text(family = "Arial", size = 16), legend.text = element_text(family = "Arial", size = 14) ) hmat_bem_core$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_core$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_core$theme$legend.title = element_blank() hmat_bem_core$theme$legend.position = "bottom" grid.newpage() grid.draw( rbind( ggplotGrob(hmat_bem_node), ggplotGrob(hmat_bem_socket), ggplotGrob(hmat_bem_core), size = "max" ) )
execution traces of the HMAT solver on a FEM system.
<<starvz-header>> hmat_fem_node = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-node-1x4-50000", sep = "/" ), starvz_config ) ) hmat_fem_socket = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-socket-2x2-50000", sep = "/" ), starvz_config ) ) hmat_fem_core = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-core-4x1-50000", sep = "/" ), starvz_config ) ) hmat_fem_node = hmat_fem_node + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_fem_node$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_node$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_node$theme$legend.position = "none" hmat_fem_node$theme$axis.title.x = element_blank() hmat_fem_node$theme$axis.text.x = element_blank() hmat_fem_node$theme$axis.ticks.x = element_blank() hmat_fem_socket = hmat_fem_socket + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_fem_socket$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_socket$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_socket$theme$legend.position = "none" hmat_fem_socket$theme$axis.title.x = element_blank() hmat_fem_socket$theme$axis.text.x = element_blank() hmat_fem_socket$theme$axis.ticks.x = element_blank() hmat_fem_core = hmat_fem_core + theme_bw() + theme( text = element_text(family = "Arial", size = 16), legend.text = element_text(family = "Arial", size = 14) ) hmat_fem_core$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_core$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_core$theme$legend.title = element_blank() hmat_fem_core$theme$legend.position = "bottom" grid.newpage() grid.draw( rbind( ggplotGrob(hmat_fem_node), ggplotGrob(hmat_fem_socket), ggplotGrob(hmat_fem_core), size = "max" ) )
Note that we do not tangle the source code in this section. Each figure can be generated by executing the associated code block from within the Emacs editor.
5. Appendix
5.1. Example standard output of a test_FEMBEM
execution
Testing : with MPF matrices. *** *** mpf version (commit ID ) *** Built on Jan 1 1970 at 00:00:01 *** [mpf] MPI thread level provided = MPI_THREAD_MULTIPLE [mpf] OpenMP thread number = 24 [mpf] MKL thread number = 24 [mpf] MPF_MAX_MEMORY = 64324 Mb 0 : [ test_FEMBEM@miriel060.plafrim.cluster is online... ] tmpdir=/tmp/felsoci/p-spido-25000 pid=4614 [mpf] The direct solver has been selected [mumps] Version : 5.2.1 *** 10 Command Line Parameters not read by MPFinit : *** -nbrhs 50 -z -radius 2 -height 4 --bem -nbpts 25000 *** *** scab version (commit ID ) *** Built on Jan 1 1970 at 00:00:01 *** Submit bug reports at https://imacs.polytechnique.fr/bugzilla/enter_bug.cgi *** [Scab] CPUID detection: SSE2=1 AVX=1 AVX2=1 [Scab] D01 structure stored out-of-core (forced by globalTmp=0, nodeTmp=1 or D01SingleReader=1) *** 10 Command Line Parameters not read by SCAB_Init : *** -nbrhs 50 -z -radius 2 -height 4 --bem -nbpts 25000 *** *** test_FEMBEM for scab version (commit ID ) *** Built on Jan 1 1970 at 00:00:01 *** Submit bug reports at https://imacs.polytechnique.fr/bugzilla/enter_bug.cgi *** Testing: BEM (dense matrix). Reading nbPts = 25000 Reading radius = 2.000000 Reading height = 4.000000 Reading nbRHS = 50 Setting lambda = 0.448399 (with 10.000000 points per wavelength) Double Complex Setting sparseRHS = 71 <PERFTESTS> MPF_Version = <PERFTESTS> TEST_FEMBEM_Version = <PERFTESTS> HMAT_Version = 2019.1.0 <PERFTESTS> ScalarType = DOUBLE COMPLEX Number of points for the test = 25000 Number of right hand sides for the test = 50 <PERFTESTS> StepMesh = 4.483993e-02 <PERFTESTS> NbPts = 25000 <PERFTESTS> NbPtsBEM = 25000 <PERFTESTS> NbRhs = 50 <PERFTESTS> nbPtLambda = 1.000000e+01 <PERFTESTS> Lambda = 4.483993e-01 Computing RHS... **** Computing Classical product... BEM: 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. 
<PERFTESTS> TpsCpuClassic = 1.653491 Size of thread block : [358 x 358] Size of proc block : [3572 x 3572] Size of disk block : [3572 x 3572] MatAssembly_SPIDO : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. Assembly done. Making the matrix symmetric... <PERFTESTS> TpsCpuMPFCreation = 6.426428 VecAssembly_EMCP2 : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. VecAssembly_EMCP2 : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. MatFactorLDLt_SPIDO : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done. MIN = 2.4028e+00 MAX = 3.5494e+00 COND = 1.4772e+00 <PERFTESTS> TpsCpuFactoMPF = 85.717232 <PERFTESTS> TpsCpuSolveMPF = 5.684338 **** Comparing results... <PERFTESTS> Error = 6.8930e-15 test_FEMBEM : end of computation Writing tracing report in traceCall.log Writing tracing report in traceCall.json