Ph.D. thesis of Marek Felšöci: Source files
Table of Contents
- 1. Dealing with proprietary source code
- 2. Storage resources monitoring
- 3. Energetic profiling
- 4. Result parsing
- 5. Injecting results into a database
- 6. Wrapper scripts
- 7. Job submission
- 8. Post-processing benchmark results
- 8.1. Prerequisites
- 8.2. Importing data
- 8.3. Compressing data
- 8.4. Formatting data
- 8.5. Style presets
- 8.5.1. Label functions
- 8.5.1.1. label_percentage
- 8.5.1.2. label_time
- 8.5.1.3. label_time_short
- 8.5.1.4. label_epsilon
- 8.5.1.5. label_storage
- 8.5.1.6. label_ooc
- 8.5.1.7. label_solver
- 8.5.1.8. label_solver_config
- 8.5.1.9. label_tag
- 8.5.1.10. label_tag_short
- 8.5.1.11. label_kind
- 8.5.1.12. label_ooc_storage
- 8.5.1.13. label_mapping
- 8.5.1.14. label_nbpts
- 8.5.1.15. label_nbpts_ticks
- 8.5.1.16. label_scientific
- 8.5.1.17. label_coupling and label_coupling_two_stage
- 8.5.1.18. label_peak_kind
- 8.5.1.19. label_execution
- 8.5.2. Label maps
- 8.5.3. Plot theme
- 8.6. Plot functions
- 8.6.1. times_by_nbpts
- 8.6.2. multisolve_times_by_nbrhs_and_nbpts
- 8.6.3. multisolve_rss_by_nbrhs_and_nbpts
- 8.6.4. coupled_memory_aware
- 8.6.5. multifacto_times_by_nbpts_and_schur_size
- 8.6.6. multifacto_rss_by_nbpts_and_schur_size
- 8.6.7. compare_coupled
- 8.6.8. compare_rss_coupled
- 8.6.9. accuracy_by_nbpts
- 8.6.10. rss_peaks_by_nbpts
- 8.6.11. rss_by_time
- 8.6.12. eprofile
- 8.6.13. es_comparison
- 8.6.14. scalability
- 8.6.15. scalability_ram
- 8.6.16. scalability_distributed
- 8.6.17. efficiency
- 8.6.18. efficiency_distributed
- 9. Post-processing StarPU execution traces
Reproducibility of numerical experiments has always been a complex matter. Building exactly the same software environment on multiple computing platforms may be long, tedious and sometimes virtually impossible to do manually.
Within our research work, we put strong emphasis on ensuring the reproducibility of the experimental environment. On the one hand, we use the GNU Guix guix transactional package manager, which allows for complete reproducibility of software environments across different machines. On the other hand, we rely on the principles of literate programming Knuth84 in an effort to maintain exhaustive, clear and accessible documentation of our experiments and the associated environment.
This document provides such documentation for the source files commonly required for constructing our experimental environments and running the experiments. It is written using Org mode for Emacs Dominik18,emacs which defines the Org markup language allowing one to combine formatted text, images and figures with traditional source code. The .org source this document is based on features the source code blocks of multiple scripts and configuration files that can be extracted OrgTangle into separate source files 1. Eventually, the document can be exported to various output formats OrgExport such as the HTML page you are currently viewing.
1. Dealing with proprietary source code
To perform experiments, we use the benchmark suite test_FEMBEM and the acoustic application actipole from Airbus. test_FEMBEM is part of their scab software package. scab, as well as its direct dependencies mpf and hmat, are proprietary packages with closed source code residing in private Git repositories. Before setting up an experimental software environment using Guix, we have to acquire local copies of them.
The getsrc.sh script we define in this section is meant for acquiring the source tarballs corresponding to user-specified versions of the Airbus hmat, mpf, scab and actipole packages.
It begins with a help message function accessible through the -h option.
function help() {
  echo "Retrieve the source tarballs corresponding to user-specified versions" \
    "of the Airbus 'hmat', 'mpf', 'scab' and 'actipole' packages." >&2
  echo "Usage: $0 [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -A                      Do not acquire sources for 'actipole'." >&2
  echo "  -h                      Show this help message and exit." >&2
  echo "  -o PATH                 Place the output source tarballs into the" \
    "directory located at PATH." >&2
  echo "  -v SCRIPT[,SCRIPT,...]  Consider the version information given" \
    "by the environment variables defined in SCRIPT. Multiple SCRIPTs can" \
    "be specified to retrieve multiple different versions at once." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
ACQUIRE_ACTIPOLE decides whether the sources for the actipole package should be acquired. As this package is rarely used, the -A option conveniently disables the preparation of its source tarball.
ACQUIRE_ACTIPOLE=1
OUTPUT_DIR specifies the location where to store the output source tarballs of the Airbus packages. This can be modified using the -o option. The default location is $HOME/src.
OUTPUT_DIR="$HOME/src"
VERSIONS shall store the version information of the packages to retrieve. The user must use the -v option to specify at least one shell script file providing the definition of the following environment variables: HMAT_PACKAGE, HMAT_VERSION, HMAT_COMMIT, HMAT_BRANCH, MPF_PACKAGE, MPF_VERSION, MPF_COMMIT, MPF_BRANCH, SCAB_PACKAGE, SCAB_VERSION, SCAB_COMMIT, SCAB_BRANCH, ACTIPOLE_PACKAGE, ACTIPOLE_VERSION, ACTIPOLE_COMMIT and ACTIPOLE_BRANCH. For each software package, they indicate the name and the version of the package in Guix, as well as the commit and the branch to consider in the associated Git repository. Note that if the -A option is specified, the ACTIPOLE_* variables are unnecessary.
VERSIONS=""
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
while getopts ":Aho:v:" option; do
  case $option in
    A)
      ACQUIRE_ACTIPOLE=0
      ;;
    o)
      OUTPUT_DIR=$OPTARG
      if test ! -d "$OUTPUT_DIR"; then
        error "'$OUTPUT_DIR' is not a valid directory!"
        exit 1
      fi
      ;;
    v)
      VERSIONS=$(echo $OPTARG | sed 's/,/\n/g')
      ;;
We must also take into account unknown options, missing option arguments or syntax mismatches, as well as the -h option.
    \?)
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :)
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
At first, we verify whether the -v option has been specified by the user.
if test "$VERSIONS" == ""; then
  error "No version information has been specified."
  exit 1
fi
Then, we iterate over the version information script files given through the -v option and:
for spec in $VERSIONS; do
check if the current version information script is a valid file and source it,
if test ! -f $spec; then
  error "'$spec' is not a valid version information script."
  exit 1
fi
source $spec
prepare some useful environment variables,
HMAT_BASENAME="$HMAT_PACKAGE-$HMAT_VERSION"
HMAT_TARBALL="$OUTPUT_DIR/$HMAT_BASENAME.tar.gz"
MPF_BASENAME="$MPF_PACKAGE-$MPF_VERSION"
MPF_TARBALL="$OUTPUT_DIR/$MPF_BASENAME.tar.gz"
SCAB_BASENAME="$SCAB_PACKAGE-$SCAB_VERSION"
SCAB_TARBALL="$OUTPUT_DIR/$SCAB_BASENAME.tar.gz"
ACTIPOLE_BASENAME="$ACTIPOLE_PACKAGE-$ACTIPOLE_VERSION"
ACTIPOLE_TARBALL="$OUTPUT_DIR/$ACTIPOLE_BASENAME.tar.gz"
remove any previous clones of the Airbus repositories in OUTPUT_DIR,
rm -rf $OUTPUT_DIR/$HMAT_BASENAME $OUTPUT_DIR/$MPF_BASENAME \
  $OUTPUT_DIR/$SCAB_BASENAME $OUTPUT_DIR/$ACTIPOLE_BASENAME $HMAT_TARBALL \
  $MPF_TARBALL $SCAB_TARBALL
if test $ACQUIRE_ACTIPOLE -eq 1; then
  rm -rf $ACTIPOLE_TARBALL
fi
make fresh clones and checkout the required revisions,
git clone --recurse-submodules --single-branch --branch $HMAT_BRANCH \
  https://aseris-lab.info/airbus/hmat $OUTPUT_DIR/$HMAT_BASENAME
cd $OUTPUT_DIR/$HMAT_BASENAME
git checkout $HMAT_COMMIT
git submodule update

git clone --single-branch --branch $MPF_BRANCH \
  https://aseris-lab.info/airbus/mpf $OUTPUT_DIR/$MPF_BASENAME
cd $OUTPUT_DIR/$MPF_BASENAME
git checkout $MPF_COMMIT

git clone --single-branch --branch $SCAB_BRANCH \
  https://aseris-lab.info/airbus/scab $OUTPUT_DIR/$SCAB_BASENAME
cd $OUTPUT_DIR/$SCAB_BASENAME
git checkout $SCAB_COMMIT

if test $ACQUIRE_ACTIPOLE -eq 1; then
  git clone --single-branch --branch $ACTIPOLE_BRANCH \
    https://aseris-lab.info/airbus/actipole $OUTPUT_DIR/$ACTIPOLE_BASENAME
  cd $OUTPUT_DIR/$ACTIPOLE_BASENAME
  git checkout $ACTIPOLE_COMMIT
fi
verify that the cloned repositories are valid directories,
if test ! -d $OUTPUT_DIR/$HMAT_BASENAME || \
   test ! -d $OUTPUT_DIR/$MPF_BASENAME || \
   test ! -d $OUTPUT_DIR/$SCAB_BASENAME || \
   (test $ACQUIRE_ACTIPOLE -eq 1 && test ! -d $OUTPUT_DIR/$ACTIPOLE_BASENAME);
then
  error "Failed to clone the Airbus repositories!"
  exit 1
fi
remove the .git folders from inside the clones to shrink the size of the final tarballs created using the tar command,
rm -rf $OUTPUT_DIR/$HMAT_BASENAME/.git \
  $OUTPUT_DIR/$HMAT_BASENAME/hmat-oss/.git $OUTPUT_DIR/$MPF_BASENAME/.git \
  $OUTPUT_DIR/$SCAB_BASENAME/.git
if test $ACQUIRE_ACTIPOLE -eq 1; then
  rm -rf $OUTPUT_DIR/$ACTIPOLE_BASENAME/.git
  tar -czf $ACTIPOLE_TARBALL -C $OUTPUT_DIR $ACTIPOLE_BASENAME
fi
tar -czf $HMAT_TARBALL -C $OUTPUT_DIR $HMAT_BASENAME
tar -czf $MPF_TARBALL -C $OUTPUT_DIR $MPF_BASENAME
tar -czf $SCAB_TARBALL -C $OUTPUT_DIR $SCAB_BASENAME
check if the tarballs were created and remove the clones.
if test ! -f $HMAT_TARBALL || test ! -f $MPF_TARBALL || \
   test ! -f $SCAB_TARBALL || \
   (test $ACQUIRE_ACTIPOLE -eq 1 && test ! -f $ACTIPOLE_TARBALL); then
  error "Failed to create tarballs!"
  exit 1
fi

rm -rf $OUTPUT_DIR/$HMAT_BASENAME $OUTPUT_DIR/$MPF_BASENAME \
  $OUTPUT_DIR/$SCAB_BASENAME
if test $ACQUIRE_ACTIPOLE -eq 1; then
  rm -rf $OUTPUT_DIR/$ACTIPOLE_BASENAME
fi
done
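The script can then be invoked as follows; the version script name refers to the hypothetical example given earlier in this section.

# Retrieve the tarballs described in versions-example.sh into $HOME/src.
./getsrc.sh -o $HOME/src -v versions-example.sh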
1.1. Defining, scheduling and computing benchmarks
To automate the generation and the computation of benchmarks, we use gcvb gcvb, an open-source tool developed at Airbus. gcvb allows us to define benchmarks, generate corresponding shell job scripts for every benchmark or a selected group of benchmarks, submit these job scripts for execution, then gather and optionally validate or post-process the results. To generate multiple variants of the same benchmark, gcvb supports templates.
The configuration of gcvb, the benchmark definitions as well as the scheduling parameters are specific to a given experimental study. We thus describe these in the study-specific documentations. Follow this link to see the list of all of our experimental studies.
2. Storage resources monitoring
To continuously measure the amount of storage (RAM, swap file and optionally hard drive) resources consumed during each benchmark run, we launch test_FEMBEM through the Python script rss.py which produces as many RAM, swap and disk usage logs as there are MPI processes created during benchmark execution. The peak values are included at the very end of these log files. Note that measures are taken regularly, once per second, and the output units are mebibytes [MiB].
import subprocess
import sys
import threading
import time
import os
import psutil
from datetime import datetime
After importing the necessary Python modules, we begin by defining a help message function that can be triggered with the -h option.
def help():
    print("Monitor memory usage of PROGRAM.\n",
          "Usage: ", sys.argv[0], " [options] [PROGRAM]\n\n",
          "Options:\n -h          Show this help message.\n",
          " --with-hdd  Extend monitoring to disk usage.\n",
          " --with-io   Extend monitoring to I/O volume and bandwidth.\n",
          file = sys.stderr, sep = "")
We must check that at least one argument has been passed to the script. The script takes as arguments the executable whose resource consumption shall be measured, together with the arguments of that executable. Otherwise, one can specify the -h option to make the script print a help message on how to use it. Also, the --with-hdd and --with-io switches allow one to extend the monitoring to disk usage and to I/O volume and bandwidth, respectively.
if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)

with_hdd = False
with_io = False

if "--with-hdd" in sys.argv:
    with_hdd = True

if "--with-io" in sys.argv:
    with_io = True
Some function definitions follow. At first, a function to determine the MPI rank of the currently monitored process.
def mpi_rank():
    rank = 0
    for env_name in ['MPI_RANKID', 'OMPI_COMM_WORLD_RANK', 'PMIX_RANK']:
        try:
            rank = int(os.environ[env_name])
            break
        except:
            continue
    return rank
We also need a function to determine and format the current date and time. Indeed, we include a timestamp in the output log files every time we record a measure. This helps us synchronize the records with those of other monitoring tools.
def timestamp():
    now = datetime.now()
    return now.strftime("%Y/%m/%dT%H:%M:%S")
The script monitors RAM and swap usage and, optionally, disk usage and bandwidth. On Linux, the values can be obtained from the /proc filesystem.
The memory usage as well as other statistics of a particular process are stored in a human-readable form in /proc/<pid>/status where <pid> is the process identifier (PID). In this file, the fields VmRSS and VmSwap hold the amount of RAM and the amount of swap memory used by the process at instant \(t\), respectively proc5. See the associated function below.
def rss(f, pid):
    f.seek(0, 0)
    lines = f.readlines()
    datetime = timestamp()
    VmRSS = -1024.
    VmSwap = -1024.
    try:
        for line in lines:
            if line.startswith("VmRSS"):
                VmRSS = int(line.split()[1])
            elif line.startswith("VmSwap"):
                VmSwap = int(line.split()[1])
    except ValueError:
        print(
            "Warning: Failed to decode '/proc/%d/status' at %s." % (pid, datetime),
            file = sys.stderr
        )
        pass
    return datetime, (VmRSS / 1024.), (VmSwap / 1024.)
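As a quick sanity check, the same fields can be read directly from a shell; the values in the comment below are made up for illustration only.

# Print the current RAM and swap usage of the shell itself, in kiB.
grep -E '^Vm(RSS|Swap)' /proc/self/status
# Typical output (illustrative values):
#   VmRSS:      3456 kB
#   VmSwap:        0 kB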
The input/output (I/O) statistics of a particular process are stored in a human-readable form in /proc/<pid>/io where <pid> is the process identifier (PID). In this file, the fields read_bytes and write_bytes hold the number of bytes read and written by the process at instant \(t\), respectively proc5. Based on these values, we then compute the I/O bandwidth. Finally, the cancelled_write_bytes field gives the number of bytes that, in the end, were not effectively written to the storage device. See the associated function below.
def io(f, pid, last_read, last_write):
    f.seek(0, 0)
    lines = f.readlines()
    datetime = timestamp()
    read_bytes = 0.
    read_rate = 0.
    write_bytes = 0.
    write_rate = 0.
    cancelled_write_bytes = 0.
    try:
        for line in lines:
            if line.startswith("read_bytes"):
                read_bytes = int(line.split()[1])
                read_rate = read_bytes - (last_read * 1024.)
            elif line.startswith("write_bytes"):
                write_bytes = int(line.split()[1])
                write_rate = write_bytes - (last_write * 1024.)
            # The kernel names this field 'cancelled_write_bytes' in
            # /proc/<pid>/io.
            elif line.startswith("cancelled_write_bytes"):
                cancelled_write_bytes = int(line.split()[1])
    except ValueError:
        print(
            "Warning: Failed to decode '/proc/%d/io' at %s." % (pid, datetime),
            file = sys.stderr
        )
        pass
    return datetime, (read_bytes / 1024.), (write_bytes / 1024.), \
        (cancelled_write_bytes / 1024.), (read_rate / 1024.), \
        (write_rate / 1024.)
To be able to monitor the disk space used during benchmarks, we assign to each run a dedicated temporary folder into which all the files used by test_FEMBEM are put. Finally, we monitor the evolution of the size of TMPDIR in our Python script using the du Linux command.
def hdd(tmpdir):
    du = subprocess.Popen(["du", "-s", tmpdir], stdout = subprocess.PIPE)
    datetime = timestamp()
    HddNow = -1024.
    try:
        HddNow = int(du.stdout.read().decode("utf-8").split("\t")[0])
    except ValueError:
        print(
            "Warning: Failed to decode the stdout of 'du' at %s." % datetime,
            file = sys.stderr
        )
        pass
    return datetime, (HddNow / 1024.)
Before entering into the monitoring loop, we need to:
initialize variables to store peak values and previously read values for I/O bandwidth computation (used only in presence of the --with-io option);
VmHWM = float(0)
SwapPeak = float(0)
HddPeak = float(0)
IOReadPeak = float(0)
IOWritePeak = float(0)
IOLastRead = float(0)
IOLastWrite = float(0)
get the path of the temporary folder from the TMPDIR environment variable. If the variable is not set and the --with-hdd option was specified, /tmp is returned, which allows the script to detect that no separate temporary folder has been created and to exit with failure, preventing potentially incorrect measures;
tmpdir = os.getenv('TMPDIR', '/tmp')
if with_hdd and tmpdir == '/tmp':
    print('Error: Temporary files were not redirected!', file = sys.stderr)
    sys.exit(1)
launch the program to monitor and get the rank of the currently monitored process;
myargs = sys.argv[1:]
if with_hdd == True and with_io == True:
    myargs = sys.argv[3:]
elif with_hdd == True or with_io == True:
    myargs = sys.argv[2:]
p = subprocess.Popen(myargs)
rank = mpi_rank()
open the /proc/<pid>/status and /proc/<pid>/io (if the --with-io option was used) files to be able to collect the requested data about the monitored process.
proc_status = open("/proc/%d/status" % p.pid, "r")
proc_io = None
if with_io == True:
    proc_io = open("/proc/%d/io" % p.pid, "r")
At this point, we begin the monitoring loop. The latter breaks when the monitored process exits. Note that the measured values are initially stored in memory. They are dumped into the corresponding log files at the end of the benchmark execution. This is to prevent frequent disk accesses that may impact the execution time of the monitored benchmark.
rss_log = []
hdd_log = []
io_log = []

while p.returncode == None:
The following tasks are performed to gather necessary results:
get current RAM and swap usage, update the peak values if necessary and add them to the log;
    RamSwapNow = rss(proc_status, p.pid)
    if RamSwapNow[1] > VmHWM:
        VmHWM = RamSwapNow[1]
    if RamSwapNow[2] > SwapPeak:
        SwapPeak = RamSwapNow[2]
    rss_log.append(RamSwapNow)
collect disk usage statistics, update the peak disk space usage if necessary and add the corresponding value, converted from kibibytes to mebibytes, to the log;
    if with_hdd == True:
        HddNow = hdd(tmpdir)
        if HddNow[1] > HddPeak:
            HddPeak = HddNow[1]
        hdd_log.append(HddNow)
collect I/O statistics, update the peak I/O bandwidth if necessary and add the corresponding values to the log;
    if with_io == True:
        IONow = io(proc_io, p.pid, IOLastRead, IOLastWrite)
        if IONow[4] > IOReadPeak:
            IOReadPeak = IONow[4]
        if IONow[5] > IOWritePeak:
            IOWritePeak = IONow[5]
        IOLastRead = IONow[1]
        IOLastWrite = IONow[2]
        io_log.append(IONow)
sleep for one second and repeat.
    time.sleep(1)
    p.poll()
Eventually, before exiting, we append usage peaks to the logs and dump the latter to disk.
print("Writing storage resource usage logs...")

m = open("rss-%d.log" % rank, "w")
for entry in rss_log:
    m.write("%s\t%g\t%g\n" % (entry[0], entry[1], entry[2]))
m.write("%g\t%g\n" % (VmHWM, SwapPeak))
m.flush()
m.close()
proc_status.close()

if with_hdd == True:
    d = open("hdd-%d.log" % rank, "w")
    for entry in hdd_log:
        d.write("%s\t%g\n" % (entry[0], entry[1]))
    d.write("%g\n" % HddPeak)
    d.flush()
    d.close()

if with_io == True:
    io = open("io-%d.log" % rank, "w")
    for entry in io_log:
        io.write("%s\t%g\t%g\t%g\t%g\t%g\n" % \
            (entry[0], entry[1], entry[2], entry[3], entry[4], entry[5]))
    io_log_last = io_log[-1]
    io.write("%g\t%g\t%g\t%g\t%g\n" % \
        (io_log_last[1], io_log_last[2], io_log_last[3], IOReadPeak, \
         IOWritePeak))
    io.flush()
    io.close()
    proc_io.close()

sys.exit(p.returncode)
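For illustration, rss.py could wrap a benchmark as follows; the MPI process count, the temporary folder and the benchmark executable are hypothetical, and in practice the wrapper scripts of Section 6 take care of setting a per-process TMPDIR.

export TMPDIR=/scratch/$USER/run-0   # hypothetical per-run temporary folder
mkdir -p $TMPDIR
# Each MPI process gets its own rss-*.log, hdd-*.log and io-*.log files.
mpirun -np 4 python3 rss.py --with-hdd --with-io ./test_FEMBEM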
3. Energetic profiling
energy_scope ESwebsite,ESjcad2021 is a program benchmarking tool dedicated primarily to the creation of energetic profiles of HPC applications. To configure what should be monitored and how, it uses a configuration file in JSON format. Among other parameters, it allows one to choose the acquisition frequency and the profile.
In this section, we define a gcvb template (see Section 1.1) of this configuration file. We keep the default values for most of the parameters. Then, the {es[interval]} and the {es[profile]} placeholders allow us to specify respectively the acquisition interval (in milliseconds) and the profile, e.g. et for energy and CPU temperature monitoring or e for energy-only monitoring.
Note that the double curly brackets allow this template to be processed by Python's string formatting in our scripts: doubled braces are emitted as literal braces in the resulting JSON.
{{
  "parameter": {{
    "comment":"configuration file template",
    "version":"2022-01",
    "owner":"energy_scope",
    "cpu_info":"/proc/cpuinfo",
    "nvidia_gpu_info":"/proc/driver/nvidia/gpus",
    "stop_file_prefix":"/tmp/es_read_rt_stop_flag_",
    "synchro_file_saved_prefix":"wrk/energy_scope_tmp_file_",
    "synchro_started_prefix":"wrk/energy_scope_tmp_started_",
    "msr_cmd":"rdmsr",
    "opt_enode":"node",
    "opt_estat":"shell",
    "opt_profile":"none",
    "opt_send":"none",
    "analyseafteracquisition":"no",
    "eprofile_period(ms)":{es[interval]},
    "default_read_rt_profile":"{es[profile]}"
  }},
  "data": {{
    "intel": {{
      "generic": {{
        "read_rt_idle_time":{es[idle_time]},
        "registers": {{
          "0x010": {{"when":"ae", "pct":"package"}},
          "0x0ce": {{"when":"b", "pct":"package"}},
          "0x0e7": {{"when":"N", "pct":"thread"}},
          "0x0e8": {{"when":"a", "pct":"thread"}},
          "0x198": {{"when":"N", "pct":"core"}},
          "0x19c": {{"when":"at", "pct":"core"}},
          "0x1A2": {{"when":"b", "pct":"package"}},
          "0x606": {{"when":"b", "pct":"package"}},
          "0x610": {{"when":"b", "pct":"package"}},
          "0x611": {{"when":"ae", "pct":"package"}},
          "0x613": {{"when":"a", "pct":"package"}},
          "0x614": {{"when":"b", "pct":"package"}},
          "0x619": {{"when":"ae", "pct":"package"}},
          "0x620": {{"when":"b", "pct":"package"}},
          "0x621": {{"when":"a", "pct":"package"}},
          "0x639": {{"when":"ae", "pct":"package"}},
          "0x641": {{"when":"ae", "pct":"package"}},
          "0x64e": {{"when":"a", "pct":"thread"}},
          "0x770": {{"when":"b", "pct":"package"}},
          "0x771": {{"when":"b", "pct":"thread"}},
          "0x774": {{"when":"b", "pct":"thread"}}
        }}
      }}
    }},
    "amd": {{
      "generic": {{
        "read_rt_idle_time":{es[idle_time]},
        "registers": {{
          "0x010": {{"when":"ae", "pct":"package"}},
          "0x0e7": {{"when":"a", "pct":"thread"}},
          "0x0e8": {{"when":"a", "pct":"thread"}},
          "0xC0000104": {{"when":"b", "pct":"thread"}},
          "0xC0010064": {{"when":"b", "pct":"core"}},
          "0xC0010299": {{"when":"b", "pct":"package"}},
          "0xC001029A": {{"when":"a", "pct":"core"}},
          "0xC001029B": {{"when":"ae", "pct":"package"}}
        }}
      }}
    }}
  }}
}}
4. Result parsing
The shell script parse.sh parses data from a test_FEMBEM benchmark or an actipole computation and stores the result in a comma-separated values (.csv) file.
We begin with a help message function that can be triggered using the -h option.
function help() {
  echo -n "Parse data from a test_FEMBEM run and store the result to a " >&2
  echo "comma-separated (*.csv) file." >&2
  echo "Usage: ./parse.sh [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -h                           Show this help message." >&2
  echo -n "  -s FILE                      Read the standard output of " >&2
  echo "test_FEMBEM from FILE rather than from the standard input." >&2
  echo -n "  -r PATH                      Read memory and disk usage " >&2
  echo "logs provided by rss.py from the directory at PATH." >&2
  echo -n "  -l FILE=FUNCTION[,FUNCTION]  Read the trace log of the " >&2
  echo -n "test_FEMBEM run from FILE and parse execution time and " >&2
  echo "iteration count of one or more FUNCTION(s)." >&2
  echo -n "  -K KEY=VALUE[,KEY=VALUE]     Parse one or more additional " >&2
  echo "KEY=VALUE pairs to be added to the output (*.csv) file." >&2
  echo "  -H                           Do not print header on output." >&2
  echo -n "  -o FILE                      Specify the file to store the " >&2
  echo "output of the parsing to." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
There should be at least one argument provided on the command line.
if test $# -lt 1; then
  help
  exit 1
fi
STDOUT holds the path to the input file containing the standard output to parse.
STDOUT=""
RSS contains the path and name pattern of the input files containing the logs of the resource monitoring script (see Section 2).
RSS=""
TRACE is the input file containing the trace call log produced by test_FEMBEM.
TRACE=""
OUTPUT is the path to the output file to store the parsed values to.
OUTPUT=""
FUNCTIONS contains the names of the functions to parse information about from trace call logs.
FUNCTIONS=""
CUSTOM_KV holds custom key-value pairs to be included in the output *.csv file.
CUSTOM_KV=""
DISABLE_HEADING toggles the heading in the output *.csv file.
DISABLE_HEADING=0
At this point, we parse provided arguments and check the validity of option values.
while getopts ":hs:r:l:K:o:H" option; do
  case $option in
The standard output to parse can be passed to the script either through its standard input or in a file specified using the -s option.
    s)
      STDOUT=$OPTARG
      if test ! -f $STDOUT; then
        error "'$STDOUT' is not a valid file!"
        exit 1
      fi
      ;;
When the -r option is provided, the script shall also parse resource monitoring logs.
    r)
      RSS=$OPTARG
      if test ! -d $RSS; then
        error "'$RSS' is not a valid directory!"
        exit 1
      fi
      ;;
When the -l option is provided, the script shall read the trace call log from the file passed as argument and parse the execution time, iteration count and floating-point operations (Flops) of one or more functions.
    l)
      TRACE=$(echo $OPTARG | cut -d '=' -f 1)
      FUNCTIONS=$(echo $OPTARG | cut -d '=' -f 2 | sed 's/,/\t/g')
      if test ! -f $TRACE; then
        error "'$TRACE' is not a valid file!"
        exit 1
      fi
      ;;
The -K option allows one to add one or more KEY=VALUE pairs to the output *.csv file.
    K)
      if test "$CUSTOM_KV" == ""; then
        CUSTOM_KV="$OPTARG"
      else
        CUSTOM_KV="$CUSTOM_KV,$OPTARG"
      fi
      ;;
The -H option toggles the printing of the header in the output *.csv file.
    H)
      DISABLE_HEADING=1
      ;;
Using the -o option, one can specify the name of the output *.csv file.
    o)
      OUTPUT=$OPTARG
      ;;
Eventually, we take care of unknown options or missing arguments and raise an error if any.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
If the standard output to parse is not provided in a file, we try to read its content from the standard input of the script.
if test "$STDOUT" == ""; then
  rm -rf .input
  while read line; do
    echo $line >> .input
  done
  STDOUT=.input
fi
If no standard output to parse is found, we abort the script and raise an error.
if test $(wc --bytes $STDOUT | cut -d' ' -f 1) -lt 1; then
  error "'$STDOUT' contains no 'test_FEMBEM' standard output to parse!"
  exit 1
fi
The treatment starts by separating the custom key-value pairs, if any, by a tabulation character, so that they can be looped over using a for loop.
CUSTOM_KV=$(echo $CUSTOM_KV | sed 's/,/\t/g')
We print the header line to the output file if it was not explicitly disabled using the -H option.
if test $DISABLE_HEADING -ne 1; then
  echo -n "processes,by,mapping,ranking,binding,omp_thread_num," > $OUTPUT
  echo -n "mkl_thread_num,hmat_ncpu,mpf_max_memory,nbpts,radius," >> $OUTPUT
  echo -n "height,nbrhs,step_mesh,nbem,nbpts_lambda,lambda," >> $OUTPUT
  echo -n "thread_block_size,proc_block_size,disk_block_size," >> $OUTPUT
  echo -n "tps_cpu_facto_mpf,tps_cpu_solve_mpf,error,symmetry," >> $OUTPUT
  echo -n "h_assembly_accuracy,h_recompression_accuracy," >> $OUTPUT
  echo -n "assembled_size_mb,tps_cpu_facto,factorized_size_mb," >> $OUTPUT
  echo -n "tps_cpu_solve,mumps_blr,mumps_blr_accuracy," >> $OUTPUT
  echo -n "mumps_blr_variant,coupled_method,coupled_nbrhs," >> $OUTPUT
  echo -n "size_schur,cols_schur,sparse_ooc,singularity_container," >> $OUTPUT
  echo -n "slurm_jobid,system_kind,csc_tot,csc_init,csc_assemb," >> $OUTPUT
  echo -n "csc_analyze,csc_facto,csc_bcast,csc_retrieval,csc_free," >> $OUTPUT
  echo -n "es_cpu,es_dram,es_duration,rm_peak,rm_values,swap_peak," >> $OUTPUT
  echo -n "swap_values,hdd_peak,hdd_values,io_read_peak," >> $OUTPUT
  echo -n "io_read_values,io_write_peak,io_write_values," >> $OUTPUT
  echo -n "io_cwrite_peak,io_cwrite_values" >> $OUTPUT
The common header entries are followed by custom key-value pairs and additional details about selected functions acquired from the trace call logs, if specified.
  for kv in $CUSTOM_KV; do
    KEY=$(echo $kv | cut -d '=' -f 1)
    echo -n ",$KEY" >> $OUTPUT
  done

  for f in $FUNCTIONS; do
    echo -n ","$f"_exec_time,"$f"_nb_execs,"$f"_flops" >> $OUTPUT
  done

  echo >> $OUTPUT
fi
Next, we parse all the interesting information from the provided standard output.
processes=$(cat $STDOUT | grep "<PERFTESTS> MPI process count" | uniq | \
  cut -d '=' -f 2 | sed 's/[^0-9]//g')
mapping=$(cat $STDOUT | grep "<PERFTESTS> MPI processes mapped by" | uniq | \
  cut -d '=' -f 2 | sed 's/[^a-z]//g')
ranking=$(cat $STDOUT | grep "<PERFTESTS> MPI processes ranked by" | uniq | \
  cut -d '=' -f 2 | sed 's/[^a-z]//g')
binding=$(cat $STDOUT | grep "<PERFTESTS> MPI processes bound to" | uniq | \
  cut -d '=' -f 2 | sed 's/[^a-z]//g')
omp_thread_num=$(cat $STDOUT | grep "OpenMP thread number" | \
  cut -d '=' -f 2 | sed 's/[^0-9]//g')
mkl_thread_num=$(cat $STDOUT | grep "MKL thread number" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
hmat_ncpu=$(cat $STDOUT | grep "HMAT_NCPU" | cut -d '=' -f 4 | \
  sed 's/[^0-9]//g')
mpf_max_memory=$(cat $STDOUT | grep "MPF_MAX_MEMORY" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
nbpts=$(cat $STDOUT | grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
radius=$(cat $STDOUT | grep "Reading radius" | cut -d '=' -f 2 | \
  sed 's/[^0-9.]//g')
height=$(cat $STDOUT | grep "Reading height" | cut -d '=' -f 2 | \
  sed 's/[^0-9.]//g')
nbrhs=$(cat $STDOUT | grep "<PERFTESTS> NbRhs" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
step_mesh=$(cat $STDOUT | grep "<PERFTESTS> StepMesh" | cut -d '=' -f 2 | \
  sed 's/[^-0-9e.+]//g')
nbem=$(cat $STDOUT | grep "<PERFTESTS> NbPtsBEM" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
nbpts_lambda=$(cat $STDOUT | grep "<PERFTESTS> nbPtLambda" | \
  cut -d '=' -f 2 | sed 's/[^-0-9e.+]//g')
lambda=$(cat $STDOUT | grep "<PERFTESTS> Lambda" | cut -d '=' -f 2 | \
  sed 's/[^-0-9e.+]//g')
thread_block_size=$(cat $STDOUT | grep "thread block" | cut -d ':' -f 2 | \
  cut -d 'x' -f 1 | sed 's/[^0-9]//g')
proc_block_size=$(cat $STDOUT | grep "proc block" | cut -d ':' -f 2 | \
  cut -d 'x' -f 1 | sed 's/[^0-9]//g')
disk_block_size=$(cat $STDOUT | grep "disk block" | cut -d ':' -f 2 | \
  cut -d 'x' -f 1 | sed 's/[^0-9]//g')
tps_cpu_facto_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFactoMPF =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
tps_cpu_solve_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolveMPF =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
error=$(cat $STDOUT | grep "<PERFTESTS> Error" | cut -d '=' -f 2 | \
  sed 's/[^-0-9e.+]//g')

symmetry=$(cat $STDOUT | grep "Testing: Non-Symmetric matrices and solvers.")
if test "$symmetry" != ""; then
  symmetry="non-symmetric"
else
  symmetry="symmetric"
fi

h_assembly_accuracy=$(cat $STDOUT | grep "\[HMat\] Compression epsilon" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')
h_recompression_accuracy=$(cat $STDOUT | \
  grep "\[HMat\] Recompression epsilon" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')
assembled_size_mb=$(cat $STDOUT | grep "<PERFTESTS> AssembledSizeMb" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
tps_cpu_facto=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFacto =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
factorized_size_mb=$(cat $STDOUT | grep "<PERFTESTS> FactorizedSizeMb" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
tps_cpu_solve=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolve =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')

mumps_blr="NA"
if test "$(cat $STDOUT | grep 'NOT Using Block-Low-Rank compression')" != ""; then
  mumps_blr=0
elif test "$(cat $STDOUT | grep 'Using Block-Low-Rank compression')" != ""; then
  mumps_blr=1
fi
mumps_blr_accuracy=$(cat $STDOUT | \
  grep "Accuracy parameter for Block-Low-Rank" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')
mumps_blr_variant=$(cat $STDOUT | grep "BLR Factorization variant" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')

coupled_method=""
if test "$(cat $STDOUT | grep 'Multi solve method.')" != ""; then
  coupled_method="multi-solve"
elif test "$(cat $STDOUT | grep 'Multi facto method.')" != ""; then
  coupled_method="multi-facto"
fi
coupled_nbrhs=$(cat $STDOUT | grep "Number of simultaneous RHS" | \
  cut -d ':' -f 2 | sed 's/[^0-9.]//g')
if test "$coupled_nbrhs" == ""; then
  coupled_nbrhs=32 # Default value
fi
size_schur=$(cat $STDOUT | grep "Size of the block Schur" | \
  cut -d ':' -f 2 | sed 's/[^0-9.]//g')
cols_schur=$(cat $STDOUT | \
  grep "Number of columns in the H-matrix Schur block" | \
  cut -d ':' -f 2 | sed 's/[^0-9.]//g')

sparse_ooc=$(cat $STDOUT | grep "[mumps] Out-of-core.")
if test "$sparse_ooc" != ""; then
  sparse_ooc=1
else
  sparse_ooc=0
fi

singularity_container=$(cat $STDOUT | grep "<PERFTESTS> Singularity image")
if test "$singularity_container" != ""; then
  singularity_container=$(echo $singularity_container | cut -d '=' -f 2 | \
    sed 's/[[:space:]]*//g')
else
  singularity_container="NA"
fi
slurm_jobid=$(cat $STDOUT | grep "<PERFTESTS> Slurm job identifier" | uniq | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')

if test "$(cat $STDOUT | grep 'Testing : FEM-BEM')" != ""; then
  system_kind="fem-bem"
elif test "$(cat $STDOUT | grep 'Testing : FEM')" != ""; then
  system_kind="fem"
elif test "$(cat $STDOUT | grep 'Testing : BEM')" != ""; then
  system_kind="bem"
else
  system_kind="unknown"
fi

csc_tot="0.0"
csc_init="0.0"
csc_assemb="0.0"
csc_analyze="0.0"
csc_facto="0.0"
csc_bcast="0.0"
csc_retrieval="0.0"
csc_free="0.0"
if test "$coupled_method" == "multi-facto"; then
  for line in $(cat $STDOUT | grep -o 'CreateSchurComplement.*A\[.*$' | \
    sed 's/ /;/g'); do
    short=$(echo $line | grep -o 'Timers.*$')
    csc_tot_value="0.0"
    if test "$(echo $line | grep 'QR_MUMPS')" != ""; then
      csc_tot_value=$(echo $short | cut -d ';' -f 2)
    else
      csc_tot_value=$(echo $short | cut -d ';' -f 1 | sed 's/[^0-9.]//g')
    fi
    short=$(echo $line | grep -o 'init.*$')
    csc_init_value=$(echo $short | cut -d ';' -f 1 | cut -d '=' -f 2)
    csc_assemb_value=$(echo $short | cut -d ';' -f 2 | cut -d '=' -f 2)
    csc_analyze_value=$(echo $short | cut -d ';' -f 3 | cut -d '=' -f 2)
    csc_facto_value=$(echo $short | cut -d ';' -f 4 | cut -d '=' -f 2)
    csc_retrieval_value="0.0"
    csc_bcast_value="0.0"
    if test "$(echo $line | grep 'QR_MUMPS')" != ""; then
      csc_retrieval_value=$(echo $short | cut -d ';' -f 5 | cut -d '=' -f 2)
    else
      csc_bcast_value=$(echo $short | cut -d ';' -f 5 | cut -d '=' -f 2)
    fi
    csc_free_value=$(echo $short | cut -d ';' -f 6 | cut -d '=' -f 2)
    csc_tot="$csc_tot + $csc_tot_value"
    csc_init="$csc_init + $csc_init_value"
    csc_assemb="$csc_assemb + $csc_assemb_value"
    csc_analyze="$csc_analyze + $csc_analyze_value"
    csc_facto="$csc_facto + $csc_facto_value"
    csc_bcast="$csc_bcast + $csc_bcast_value"
    csc_retrieval="$csc_retrieval + $csc_retrieval_value"
    csc_free="$csc_free + $csc_free_value"
  done
  csc_tot=$(echo $csc_tot | bc -l)
  csc_init=$(echo $csc_init | bc -l)
  csc_assemb=$(echo $csc_assemb | bc -l)
  csc_analyze=$(echo $csc_analyze | bc -l)
  csc_facto=$(echo $csc_facto | bc -l)
  csc_bcast=$(echo $csc_bcast | bc -l)
  csc_retrieval=$(echo $csc_retrieval | bc -l)
  csc_free=$(echo $csc_free | bc -l)
fi

es_cpu="NA"
es_dram="NA"
es_duration="NA"
if test -f energy_scope_estat*; then
  es_estat=$(cat energy_scope_estat*)
  es_cpu=$(echo $es_estat | jq '.arch.data."joule(J)"."ecpu(J)"')
  es_dram=$(echo $es_estat | jq '.arch.data."joule(J)"."edram(J)"')
  es_duration=$(echo $es_estat | jq '."duration(sec)"')
fi
If the -r option was provided, we also parse the storage resource monitoring logs. RAM and swap usage log files are named rss-x.log and disk usage log files are named hdd-x.log where \(x\) is the MPI process rank.
if test "$RSS" != ""; then
  rm_peak="0.0"
  rm_values=""
  swap_peak="0.0"
  swap_values=""
  hdd_peak="0.0"
  hdd_values=""
  io_read_peak="0.0"
  io_read_values=""
  io_write_peak="0.0"
  io_write_values=""
  io_cwrite_peak="0.0"
  io_cwrite_values=""

  for rss in $(ls $RSS | grep -e "^rss"); do
    rm_value=$(cat $RSS/$rss | tail -n 1 | cut -f 1)
    rm_peak="$rm_peak + $rm_value"
    rm_values="$rm_value $rm_values"
    swap_value=$(cat $RSS/$rss | tail -n 1 | cut -f 2)
    swap_peak="$swap_peak + $swap_value"
    swap_values="$swap_value $swap_values"
  done

  for hdd in $(ls $RSS | grep -e "^hdd"); do
    hdd_value=$(cat $RSS/$hdd | tail -n 1)
    hdd_peak="$hdd_peak + $hdd_value"
    hdd_values="$hdd_value $hdd_values"
  done

  for io in $(ls $RSS | grep -e "^io"); do
    io_read_value=$(cat $RSS/$io | tail -n 1 | cut -f 2)
    io_read_peak="$io_read_peak + $io_read_value"
    io_read_values="$io_read_value $io_read_values"
    io_write_value=$(cat $RSS/$io | tail -n 1 | cut -f 3)
    io_write_peak="$io_write_peak + $io_write_value"
    io_write_values="$io_write_value $io_write_values"
    io_cwrite_value=$(cat $RSS/$io | tail -n 1 | cut -f 4)
    io_cwrite_peak="$io_cwrite_peak + $io_cwrite_value"
    io_cwrite_values="$io_cwrite_value $io_cwrite_values"
  done

  rm_peak=$(echo $rm_peak | bc -l)
  swap_peak=$(echo $swap_peak | bc -l)
  hdd_peak=$(echo $hdd_peak | bc -l)
  io_read_peak=$(echo $io_read_peak | bc -l)
  io_write_peak=$(echo $io_write_peak | bc -l)
  io_cwrite_peak=$(echo $io_cwrite_peak | bc -l)
fi
Eventually, we print parsed values, values from custom key-value pairs, if any, and additional function information possibly parsed from trace call logs.
echo -n "$processes,$by,$mapping,$ranking,$binding,$omp_thread_num," >> $OUTPUT
echo -n "$mkl_thread_num,$hmat_ncpu,$mpf_max_memory,$nbpts," >> $OUTPUT
echo -n "$radius,$height,$nbrhs,$step_mesh,$nbem,$nbpts_lambda," >> $OUTPUT
echo -n "$lambda,$thread_block_size,$proc_block_size," >> $OUTPUT
echo -n "$disk_block_size,$tps_cpu_facto_mpf,$tps_cpu_solve_mpf," >> $OUTPUT
echo -n "$error,$symmetry,$h_assembly_accuracy," >> $OUTPUT
echo -n "$h_recompression_accuracy,$assembled_size_mb," >> $OUTPUT
echo -n "$tps_cpu_facto,$factorized_size_mb,$tps_cpu_solve," >> $OUTPUT
echo -n "$mumps_blr,$mumps_blr_accuracy,$mumps_blr_variant," >> $OUTPUT
echo -n "$coupled_method,$coupled_nbrhs,$size_schur,$cols_schur," >> $OUTPUT
echo -n "$sparse_ooc,$singularity_container,$slurm_jobid," >> $OUTPUT
echo -n "$system_kind,$csc_tot,$csc_init,$csc_assemb,$csc_analyze," >> $OUTPUT
echo -n "$csc_facto,$csc_bcast,$csc_retrieval,$csc_free,$es_cpu," >> $OUTPUT
echo -n "$es_dram,$es_duration,$rm_peak,$rm_values,$swap_peak," >> $OUTPUT
echo -n "$swap_values,$hdd_peak,$hdd_values,$io_read_peak," >> $OUTPUT
echo -n "$io_read_values,$io_write_peak,$io_write_values," >> $OUTPUT
echo -n "$io_cwrite_peak,$io_cwrite_values" >> $OUTPUT

for kv in $CUSTOM_KV; do
  VALUE=$(echo $kv | cut -d '=' -f 2)
  echo -n ",$VALUE" >> $OUTPUT
done

for f in $FUNCTIONS; do
  exec_time=$(cat $TRACE | grep $f | cut -d '|' -f 2 | sed 's/[^0-9.]//g')
  nb_execs=$(cat $TRACE | grep $f | cut -d '|' -f 3 | sed 's/[^0-9]//g')
  flops=$(cat $TRACE | grep $f | cut -d '|' -f 4 | sed 's/[^0-9]//g')
  echo -n ",$exec_time,$nb_execs,$flops" >> $OUTPUT
done
At the very end, we print the trailing new line character to the output file, perform cleaning and terminate.
echo >> $OUTPUT
rm -rf .input
exit 0
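For example, assuming a benchmark's standard output was captured in stdout.log and the monitoring logs of Section 2 reside in logs/ (both names, as well as the custom key-value pair, are hypothetical), one could run:

# Parse the captured output, the resource logs and add a custom column.
./parse.sh -s stdout.log -r logs/ -K solver=spido -o results.csv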
5. Injecting results into a database
The Python script inject.py allows one to call a custom parsing script producing a .csv file on output and gather the results exported into the latter, then inject the values into a gcvb SQLite database (see Section 1.1) for eventual post-processing.
At the beginning, we import the necessary Python modules as well as the gcvb module.
import gcvb
import subprocess
import csv
import sys
The script expects the -h option or at least two arguments:
DATASET which stands for the .csv file to gather the data from,
PARSER which represents the path to the parsing script to use.
DATASET should be a .csv file containing two lines, a heading line providing the captions and then the associated values. Next may follow the arguments to call PARSER with.
The script continues with a help message function, which can be triggered with the -h option, and an argument check.
def help():
    print("Launch PARSER (with ARGUMENTS if needed) instrumented to produce a ",
          "comma-separated values .csv output DATASET, deserialize it and ",
          "inject to the current gcvb database provided by the gcvb module.\n",
          "Usage: ", sys.argv[0], " DATASET PARSER [ARGUMENTS]\n\n",
          "Options:\n -h  Show this help message.",
          file = sys.stderr, sep = "")

if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    # Exit here, otherwise the argument accesses below would fail.
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)
In the main function, we parse the arguments
def main():
    args = sys.argv
    args.pop(0) # Drop the script name.
    dataset = args.pop(0)
    parser = args.pop(0)
and launch the parsing script.
    subprocess.run([parser] + args)
Once it has finished, we open the data set produced, gather the data into a dictionary and inject the key-value pairs one by one into the currently used gcvb database provided by the imported gcvb module.
    with open(dataset, mode = 'r') as data:
        reader = csv.DictReader(data)
        for row in reader:
            for item in row:
                gcvb.add_metric(item, row[item])

if __name__ == '__main__':
    main()
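For instance, from within a gcvb validation step, inject.py could drive the parsing script of Section 4 as follows; the file names are hypothetical.

# Run parse.sh, let it write results.csv, then inject its rows into gcvb.
python3 inject.py results.csv ./parse.sh -s stdout.log -o results.csv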
6. Wrapper scripts
Before launching a benchmark or a benchmark-related command, there are some preparation actions to perform. Similarly, there are some completion and cleaning actions to perform once the task finishes. For the sake of convenience, especially in case of distributed parallel execution, we use a wrapper shell script to perform these actions. In the case of benchmarks themselves, this script takes the form of a gcvb template file (see Section 1.1) allowing us to make the script fit the individual parameters of each benchmark. In the case of a benchmark-related command, we produce a directly usable shell script.
6.2. Benchmark task wrappers
Although the major part of the script template is identical for all kinds of benchmarks we define (see Section 1.1), some parts may differ to meet the specific needs of a given benchmark or set of benchmarks, e.g. the setting of environment variables related to out-of-core computation activation. In this section, we define multiple variations of the benchmark task wrapper script template.
6.2.1. In-core benchmarks
The wrapper script template wrapper-ic.sh is meant for benchmarks run in-core, i.e. with the out-of-core computation disabled.
At first, we try to determine the rank of the current process.
if test "$OMPI_COMM_WORLD_RANK" != ""; then
  MPI_RANK=$OMPI_COMM_WORLD_RANK
elif test "$MPI_RANKID" != ""; then
  MPI_RANK=$MPI_RANKID
else
  echo "Failed to get the MPI rank of the current process! Cannot proceed." >&2
  exit 1
fi
and if we are the 0th process, we print out:
the MPI configuration (see the parallel placeholder in Section 1.1),
if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {parallel[np]}"
  echo "<PERFTESTS> MPI processes per node = {parallel[nodewise]}"
  echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}"
  echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}"
  echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}"
fi
the path to the Singularity container, if the execution is taking place within one of them,
if test "$SINGULARITY_CONTAINER" != "" && test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> Singularity image = $SINGULARITY_CONTAINER."
fi
the Slurm job identifier.
if test "$SLURM_JOBID" != "" && test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> Slurm job identifier = $SLURM_JOBID"
fi
To measure disk space usage (see Section 2) during the execution of benchmarks, we need to set up a dedicated temporary folder. Depending on the script template, the target storage media may vary. We thus use an environment variable TMPBASE which can be set for each script template separately.
export TMPBASE=/tmp
The {@job[id]} is a gcvb placeholder exposed by default within template expansion and represents a unique benchmark identifier. Note that we remove the chosen temporary directory first to ensure there is no data left behind by any previous benchmark run. Setting MUMPS_OOC_TMPDIR is necessary for out-of-core in MUMPS to work properly in all cases. If the corresponding shell wrapper is present in the current working directory, we need to apply the workaround for Singularity containers from Section 6.1.1 too.
rm -rf $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
mkdir -p $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export TMPDIR=$TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export MUMPS_OOC_TMPDIR=$TMPDIR
if test -f "./wrapper-python-task.sh"; then
  <<code:sanitize-python-in-singularity>>
fi
Then, we set the environment variables specifying the number of threads to use, available through the parallel placeholder (see Section 1.1).
export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \
  HMAT_NCPU={parallel[nt]} QRM_NCPU={parallel[nt]}
Also, we need to disable the out-of-core computation feature. For MUMPS, it is disabled by default. For SPIDO and HMAT, we need to set a couple of environment variables to disable it explicitly. For qr_mumps, we need to set a StarPU environment variable, STARPU_LIMIT_CPU_MEM, representing the maximum amount of RAM StarPU can use. We set it to an arbitrarily large value (1 PiB). In the case of HMAT, the HMAT_LIMIT_MEM environment variable is used to achieve the same effect.
export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1 STARPU_LIMIT_CPU_MEM=1073741824 \
  HMAT_LIMIT_MEM=1073741824
At this point, we can launch the benchmark command passed as argument.
eval "$*"
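For illustration, once gcvb has expanded the template placeholders, the resulting wrapper could be launched as follows; the process count and the benchmark command are hypothetical.

# The wrapper runs once per MPI process and executes its arguments via eval.
mpirun -np 4 ./wrapper-ic.sh python3 rss.py ./test_FEMBEM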
After the execution, we have to clean up all the files produced during the benchmark run except for logs, results, FXT traces, configuration files, scheduling and launching scripts, i.e. files matching *.log, *.yaml, es_config.json, prof_file*, *.out, *batch, coupled, scalability or *.sh, as well as energy_scope outputs (*energy_scope*) and *.csv files.
if test $MPI_RANK -eq 0; then
  rm -f $(find . -type f ! -name "*.log" ! -name "*.yaml" ! -name "prof_file*" \
    ! -name "*.out" ! -name "*batch" ! -name "coupled" \
    ! -name "scalability" ! -name "*.sh" ! -name "*energy_scope*" \
    ! -name "es_config.json" ! -name "*.csv")
fi
6.2.1.1. TGCC in-core benchmarks
The template for in-core benchmarks wrapper-ic-tgcc.sh sets TMPBASE so as to allow in-core benchmarks to run on the TGCC platform.
export TMPBASE=$CCCSCRATCHDIR

if test "$PMIX_RANK" != ""; then
  MPI_RANK=$PMIX_RANK
else
  echo "Failed to get the MPI rank of the current process! Cannot proceed." >&2
  exit 1
fi

if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {parallel[np]}"
fi

rm -rf $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
mkdir -p $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export TMPDIR=$TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK

export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \
  HMAT_NCPU={parallel[nt]} QRM_NCPU={parallel[nt]}

<<code:incore-incore>>
<<code:incore-completion>>
6.2.2. Out-of-core benchmarks
By default, the template for out-of-core benchmarks wrapper-ooc.sh sets TMPBASE so as to use the /tmp storage.
export TMPBASE=/tmp
Compared to the template for in-core benchmarks (see Section 6.2.1), here we make sure the environment variables disabling purely in-core computation for qr_mumps, SPIDO and HMAT are unset if the placeholder {sparse[ooc]} or the placeholder {dense[ooc]} is set to 1. For MUMPS, the activation of out-of-core is handled using a dedicated switch of test_FEMBEM. Also, by default, HMAT starts to dump data on disk only when the solver's consumption reaches 55% of the available system memory. This leaves only 45% of the available memory for all the other data and computations. Therefore, we decrease this threshold to {dense[ooc]}% of the total system memory, e.g. 1%. We set the same threshold for qr_mumps using the dedicated StarPU environment variable.
<<code:incore-preparation>>

if test {sparse[ooc]} -gt 0 || test {dense[ooc]} -gt 0; then
  unset MPF_SPIDO_INCORE
  unset HMAT_DISABLE_OOC
  TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g')
  TOTAL_MEMORY_MiB=$(expr \( \( $TOTAL_MEMORY_KiB / 1024 \) / 100 \) \* {dense[ooc]})
  export HMAT_LIMIT_MEM=$TOTAL_MEMORY_MiB STARPU_LIMIT_CPU_MEM=$TOTAL_MEMORY_MiB
else
  export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
  unset HMAT_LIMIT_MEM
  unset STARPU_LIMIT_CPU_MEM
fi

<<code:incore-completion>>
6.2.2.1. Custom out-of-core storage
The template for out-of-core benchmarks wrapper-ooc-custom.sh allows for setting TMPBASE using a gcvb placeholder instead of setting it statically.
export TMPBASE={job[storage]}

<<code:ooc>>
6.2.2.2. TGCC out-of-core benchmarks
The template for out-of-core benchmarks wrapper-ooc-tgcc.sh sets TMPBASE so as to allow out-of-core benchmarks to run on the TGCC platform.
export TMPBASE=$CCCSCRATCHDIR

if test "$PMIX_RANK" != ""; then
  MPI_RANK=$PMIX_RANK
else
  echo "Failed to get the MPI rank of the current process! Cannot proceed." >&2
  exit 1
fi

if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {parallel[np]}"
fi

rm -rf $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
mkdir -p $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export TMPDIR=$TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export MUMPS_OOC_TMPDIR=$TMPDIR

export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \
  HMAT_NCPU={parallel[nt]} QRM_NCPU={parallel[nt]}

if test {dense[ooc]} -gt 0; then
  unset MPF_SPIDO_INCORE
  unset HMAT_DISABLE_OOC
  TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g')
  TOTAL_MEMORY_MiB=$(expr \( \( $TOTAL_MEMORY_KiB / 1024 \) / 100 \) \* {dense[ooc]})
  export HMAT_LIMIT_MEM=$TOTAL_MEMORY_MiB STARPU_LIMIT_CPU_MEM=$TOTAL_MEMORY_MiB
else
  export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
  unset HMAT_LIMIT_MEM
  unset STARPU_LIMIT_CPU_MEM
fi

<<code:incore-completion>>
6.2.3. FXT execution tracing template
In the template wrapper-fxt.sh for benchmarks ensuring FXT execution tracing, we need to set the StarPU environment variables related to this functionality and add the path to the StarVZ R library starvz-readme to PATH. Note that the out-of-core computation remains disabled.
export TMPBASE=/tmp

<<code:incore-preparation>>
<<code:incore-incore>>

export STARPU_FXT_PREFIX=$(pwd)/ STARPU_PROFILING=1 STARPU_WORKER_STATS=1
export PATH=$GUIX_ENVIRONMENT/site-library/starvz/tools/:$PATH

<<code:incore-completion>>
7. Job submission
To simplify the submission of the job scripts produced by gcvb, we use the shell script submit.sh.
We begin by defining a help message function that can be triggered by running the script with the -h option.
function help() {
  echo "Submit job scripts from benchmark SESSION." >&2
  echo "Usage: $(basename $0) [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -c          Use 'ccc_msub' to submit jobs instead of 'sbatch'." >&2
  echo "  -h          Print this help message." >&2
  echo "  -E REGEX    Submit all job scripts except those matching REGEX" \
    "(mutually exclusive with '-F')." >&2
  echo "  -F REGEX    Submit only job scripts matching REGEX." >&2
  echo -n "  -l          Only list the job scripts to submit instead of " >&2
  echo "actually submitting them." >&2
  echo "  -s SESSION  Session to submit the benchmarks from." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
The parsing of options follows.
SUBMIT_CMD="sbatch"
REGEX=""
EXCLUDE=0
INCLUDE=0
SESSION=""
LIST_ONLY=0

while getopts ":chE:F:ls:" option; do
  case $option in
The -c option allows for using the TGCC-specific job submission command ccc_msub instead of the usual Slurm command sbatch.
    c)
      SUBMIT_CMD="ccc_msub -m scratch -x"
      ;;
Using the -E option, one can exclude from submission the job scripts whose names match a given regular expression.
    E)
      REGEX=$OPTARG
      EXCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-E' option (empty regular expression)!"
        exit 1
      fi
      ;;
Using the -F option, one can submit only the job scripts whose names match a given regular expression.
    F)
      REGEX=$OPTARG
      INCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-F' option (empty regular expression)!"
        exit 1
      fi
      ;;
The -l option lists the job scripts selected for submission without actually submitting them. It may be particularly useful, for example, in combination with the -F option to verify whether the provided regular expression matches the intended job scripts.
    l)
      LIST_ONLY=1
      ;;
The -s option is used to specify the benchmark session in the results directory (see Section 1.1) to submit the job scripts from.
    s)
      SESSION=$OPTARG
      if test ! -d $SESSION; then
        error "'$SESSION' is not a valid directory!"
        exit 1
      fi
      ;;
Eventually, we have to take care of unknown options or missing arguments, if any, raise an error and terminate the script in that case.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
The options -E and -F are mutually exclusive.
if test $EXCLUDE -ne 0 && test $INCLUDE -ne 0; then
  error "Options '-E' and '-F' must not be used together!"
  exit 1
fi
After parsing and checking the arguments, we navigate to the root of the gcvb filesystem of the given benchmark session folder.
cd $SESSION/../../
We also check whether we are in a valid gcvb filesystem containing a config.yaml file and a results folder (see Section 1.1).
if test ! -f config.yaml || test ! -d results; then
  error "'$SESSION' is not a correct gcvb session directory!"
  exit 1
fi
Before submitting, we look for all the job scripts. If there is none, we do nothing.
cd $SESSION
RUNS=$(ls *.sh)
if test "$RUNS" == ""; then
  error "No job script found! Nothing to do."
  exit 0
fi
Then, if either the -E or the -F option was specified, we filter the job scripts accordingly. We also need to return to the root of the gcvb filesystem.
# Count all job scripts by default so that the progress prefix stays correct
# even when no filter is given.
RUNS_COUNT=$(ls *.sh | wc -l)
if test $INCLUDE -ne 0; then
  RUNS=$(ls *.sh | grep "$REGEX")
  RUNS_COUNT=$(ls *.sh | grep "$REGEX" | wc -l)
elif test $EXCLUDE -ne 0; then
  RUNS=$(ls *.sh | grep -v "$REGEX")
  RUNS_COUNT=$(ls *.sh | grep -v "$REGEX" | wc -l)
fi
cd ../../
Finally, we call the sbatch Slurm submission command (or ccc_msub if the -c option was given) for each job script. If the -l option was specified, the job scripts are not submitted; only their list is printed on the screen.
i=1
for job_script in $RUNS; do
  PREFIX="[ $i / $RUNS_COUNT ]"
  if test $LIST_ONLY -ne 0; then
    echo "$PREFIX Listing job script '$job_script'."
  else
    $SUBMIT_CMD $SESSION/$job_script
    if test $? -ne 0; then
      error "$PREFIX Failed to submit batch job '$job_script'!"
      exit 1
    else
      echo "$PREFIX Submitted batch job '$job_script'."
    fi
  fi
  i=$(expr $i + 1)
done
exit 0
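For instance, to preview the job scripts of a session matching a pattern without submitting them (the session path and the pattern are hypothetical):

# List, without submitting, the job scripts whose names contain 'bem'.
./submit.sh -s results/session-2022-01/ -F 'bem' -l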
8. Post-processing benchmark results
To visualize benchmark results, we use R and rely mainly on the plot drawing library ggplot2 ggplot2. The R script file plot.R we describe in this section is meant to be used from within plot drawing R source code blocks in the concerned Org documents, e.g. reports, articles and so on.
8.1. Prerequisites
At the beginning we need to import several libraries for:
manipulating data sources, i.e. R data frames, SQLite databases and JSON files,
library(plyr)
library(dplyr)
library(readr)
library(DBI)
library(rjson)
transforming data frames,
library(tidyr)
decompressing gzip files,
library(R.utils)
defining and exporting plots.
library(ggplot2)
library(scales)
library(grid)
library(gridExtra)
require(cowplot)
library(stringr)
library(ggrepel)
We also need to prevent using the cairo library to produce figures, as this does not work on Linux.
set_null_device(cairo_pdf)
One can specify the name of the font family used for typesetting text in plots in a variable named plot_font before sourcing this script. If this variable does not exist, the default value is used, i.e. CMU Serif, corresponding to the 'Computer Modern Serif' typeface.
plot_font <- get0("plot_font", ifnotfound = "CMU Serif")
8.2. Importing data
8.2.1. Benchmark results
The benchmark tool we rely on, gcvb, stores benchmark results in a local SQLite database (see Section 1.1). We define a global variable named gcvb_db_path to store the path to the gcvb database we are currently using.
gcvb_db_path <- "results.db"
To extract the results from the database, we define the function gcvb_import which takes up to three arguments:
session indicating the gcvb benchmark session to extract the results from (see Section 1.1),
like representing a list or a vector of benchmark name patterns,
path containing the path to the gcvb SQLite database file, set by default to gcvb_db_path (see above).
Using the like argument, one can extract only the results of the benchmarks whose identifiers match the given pattern or patterns (see Section 1.1). The latter may contain the wildcard characters supported by the SQL operator LIKE sqliteLike.
The function begins by opening a connection to the gcvb SQLite database and setting the LIKE keyword to be case-sensitive.
gcvb_import <- function(session, like = c(), path = gcvb_db_path) {
  conn <- dbConnect(RSQLite::SQLite(), path)
  dbExecute(conn, "PRAGMA case_sensitive_like = true;")
A gcvb
database contains 6 tables (see Figure 1).
Figure 1: EER diagram of a gcvb
database.
At first, we need to select all the benchmarks belonging to session
from the
test
table and optionally restrict the selection to benchmarks matching the
patterns of like
.
request <- "" for(p in 1:length(like)) { if(p > 1) { request <- paste(request, "OR ") } request <- paste0(request, "name LIKE '", like[p], "'") } if(length(like) > 0) { request <- paste( "SELECT id, name, start_date FROM test WHERE (", request, ") AND run_id IN", "(SELECT id FROM run WHERE gcvb_id =", session, ")" ) } else { request <- paste( "SELECT id, name, start_date FROM test WHERE run_id IN", "(SELECT id FROM run WHERE gcvb_id =", session, ")" ) } benchmarks <- dbGetQuery(conn, request)
Then, if a given benchmark was executed more than once, we want to keep only the data associated with the latest run. To do this, we sort the data by the execution start timestamp, i.e. the start_date column, and remove duplicates based on benchmark identifiers in the name column. Note that start_date is imported as a string, so we first need to convert it to a datetime format. At the end, we drop the start_date column, which is no longer needed for further treatment.
benchmarks$start_date <- as.POSIXct(benchmarks$start_date) benchmarks <- benchmarks[order(benchmarks$start_date, decreasing = TRUE), ] benchmarks <- benchmarks[!duplicated(benchmarks$name), ] benchmarks <- subset(benchmarks, select = -c(start_date))
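The following minimal sketch illustrates this deduplication logic on a hypothetical toy data frame where the same benchmark was run twice:

b <- data.frame(
  name = c("spido-100000", "spido-100000"), # illustrative benchmark name
  start_date = as.POSIXct(c("2022-01-01 10:00:00", "2022-01-02 10:00:00"))
)
b <- b[order(b$start_date, decreasing = TRUE), ]
b[!duplicated(b$name), ] # keeps only the run of January 2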
Benchmark results are stored in the separate valid
table. Each row references
a benchmark in the test
table and represents one metric and its value (see an
example in Table 1). This representation is referred to as
long format.
id | metric | value | test_id | task_step |
---|---|---|---|---|
39870 | nbrhs | 50 | 1260 | NULL |
39871 | step_mesh | 0.02241996 | 1260 | NULL |
… | … | … | … | … |
In addition to the valid table, the files table holds files potentially associated with a given benchmark, e.g. resource consumption logs (see an example in Table 2). Note that the files themselves are stored in a gzip-compressed binary form.
id | filename | file | test_id |
---|---|---|---|
9 | rss-0.log | BLOB | 3666 |
10 | hdd-0.log | BLOB | 3666 |
11 | …/energy_scope_eprofile_1473279.txt | BLOB | 3666 |
… | … | … | … |
So, for each of the selected benchmarks, we extract the associated rows from the
valid
and the files
tables based on the test_id
key. Note that the value
column is of mixed type, i.e. it contains both strings and numerical values. To
prevent the R dbGetQuery
function from doing inappropriate type conversions,
we cast the value
column to text in the corresponding SQL request. We do the
same for the file
column in the files
table. Note also that for the union
operation below to work, we have to rename the selected columns of the files
table to match those in the valid
table.
for(row in 1:nrow(benchmarks)) { test_id <- benchmarks[row, "id"] metrics <- dbGetQuery( conn, paste( "SELECT metric, CAST(value AS TEXT) AS 'value'", "FROM valid WHERE test_id = :id", "UNION", "SELECT filename AS 'metric', QUOTE(file) AS 'value'", "FROM files WHERE test_id = :id" ), params = list(id = test_id) )
At the same time, we convert the original long format representation (see Table 1) into wide format for easier data manipulation. In the latter, each metric has its own column (see an example in Table 3). This way, each row contains all the results of a given benchmark.
id | name | nbrhs | step_mesh | … |
---|---|---|---|---|
1260 | spido-100000 | 50 | 0.02241996 | … |
if(nrow(metrics) > 0) { for(metric in 1:nrow(metrics)) { metric_name <- metrics[metric, "metric"] if(!(metric_name %in% colnames(benchmarks))) { benchmarks[[metric_name]] <- NA } benchmarks[which(benchmarks$id == test_id), metric_name] <- metrics[metric, "value"] } } }
Finally, we convert columns containing numerical values to the appropriate data
type, close the database connection and return the resulting data frame. Note
that the names of some columns, e.g. rss-0.log
, hdd-1.log
or
tmp_energy_scope_1473279/energy_scope_eprofile_1473279.txt
may vary from one
benchmark to another so we have to determine their names dynamically using a
regular expression.
cols_to_numeric <- colnames(benchmarks) dynamic_columns <- cols_to_numeric[ grepl( paste( "^output\\.log$", "^rss.*\\.log$", "^hdd.*\\.log$", "^likwid.*\\.csv$", ".*energy_scope_eprofile.*\\.txt$", sep = "|" ), cols_to_numeric ) ] cols_to_numeric <- cols_to_numeric[! cols_to_numeric %in% c( c("name", "mapping", "ranking", "scheme", "binding", "coupled_method", "solver", "variation", "node", "symmetry", "config", "singularity_container", "system_kind", "energy_scope", "config_kind", "solver_config", "rm_values", "swap_values", "execution", "hdd_values", "output.log"), dynamic_columns )] benchmarks <- benchmarks %>% mutate_at(cols_to_numeric, as.numeric) dbDisconnect(conn) return(benchmarks) }
8.2.2. Trace files
We use multiple benchmarking tools to monitor the usage of storage resources (memory and hard drive), the power and energy consumption as well as the floating-point operation rate and memory bandwidth of our solver suite.
To monitor the memory and hard drive usage, we use a custom Python script named rss.py (see Section 2). A measure is taken every second during the execution. There is one consumption log file per MPI process. The names of the log files match rss-N.log for memory usage and hdd-N.log for hard drive usage, where N stands for the rank of the corresponding MPI process. In a log file, each line contains the current timestamp and usage in mebibytes (MiB). The very last line corresponds to the peak value.
To measure the power and the energy consumption of processing units and RAM, we
use the energy_scope
tool ESwebsite,ESjcad2021. In this case, the
acquisition interval can be defined by the user. As a result, energy_scope
produces an archive containing all the measures in a raw form. It can then
produce a JSON trace file based on this archive. The output file is stored under
./tmp_energy_scope_X/energy_scope_eprofile_X.txt
where X
represents the
identifier of the corresponding Slurm job (see Section
1.1).
Finally, to measure the floating-point operation rate and the memory bandwidth, we rely on the likwid tool likwid,treibig2010likwid. likwid benchmarks each MPI process of the application separately and stores all the measures into a comma-separated values file the name of which matches likwid-N.csv, where N stands for the rank of the corresponding MPI process.
At the end of a benchmark execution, the above log and trace files are compressed and stored into the SQLite database holding all the benchmark results (see Section 8.2.1). Therefore, in the first place, we need to define a function allowing us to extract the files from the database and decompress them.
8.2.3. Retrieving files
The function gcvb_retrieve_files
takes a data frame containing the results of
one single benchmark, through the benchmark
argument, retrieved by the
gcvb_import
function (see Section 8.2.1) and
formatted using the gcvb_format
function (see Section
8.4.1), extracts the corresponding trace files and
decompresses them into the current working directory. Note that the
energy_scope
trace file is renamed to eprofile.txt
to ensure a static file
name and simplify its later processing.
In the first place, we need to dynamically determine the names of the columns containing the log and trace files based on the patterns discussed above.
gcvb_retrieve_files <- function(benchmark) { filename_pattern <- paste( "^output\\.log$", "^rss.*\\.log$", "^hdd.*\\.log$", "^likwid.*\\.csv$", ".*energy_scope_eprofile.*\\.txt$", sep = "|" ) trace_file_columns <- colnames(benchmark) trace_file_columns <- trace_file_columns[ grepl(filename_pattern, trace_file_columns) ]
Then, we remove any previously retrieved files matching the same file name patterns to prevent data mismatch in plot functions (see Section 8.6)
filename_pattern <- paste( "^output\\.log$", "^rss.*\\.log$", "^hdd.*\\.log$", "^likwid.*\\.csv$", "^eprofile\\.txt$", sep = "|" ) for(file in list.files(pattern = filename_pattern)) { unlink(file) }
and iterate over the corresponding columns in benchmark
and:
for(trace_file_column in trace_file_columns) {
retrieve the compressed version of the file as string,
trace_file_name <- benchmark[[trace_file_column]]
if the file is not empty, strip the
X'
prefix and the'
suffix from this string,if(!is.na(trace_file_name)) { trace_file_name <- substring( trace_file_name, 3, nchar(trace_file_name) - 1 )
split the string into a vector of two-character strings representing the hexadecimal binary content of the file,
trace_file_name <- strsplit(trace_file_name, "")[[1]] trace_file_name <- paste0( trace_file_name[c(TRUE, FALSE)], trace_file_name[c(FALSE, TRUE)] )
change the output file name to
eprofile.txt
, if the file corresponds to anenergy_scope
trace file,if(grepl("energy_scope", trace_file_column, fixed = TRUE)) { filename <- "eprofile.txt.gz" } else { filename <- paste0(basename(trace_file_column), ".gz") }
convert the vector of two-character strings into a raw binary form and restore the corresponding gzip-compressed file,
writeBin(as.raw(as.hexmode(trace_file_name)), filename)
decompress the file into the current working directory and remove its compressed counterpart.
gunzip(filename, overwrite = TRUE) } } }
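A typical usage of the function, with an illustrative session identifier and benchmark name pattern, could then be:

results <- gcvb_format(gcvb_import(3, like = c("spido-%")))
# Extract the trace files associated with the first benchmark.
gcvb_retrieve_files(results[1, ])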
8.2.3.1. Reading rss.py
trace files
The function read_rss
constructs an R data frame based on memory usage and
optionally on hard drive usage trace files (if the include_hdd
switch is set
to TRUE
) located in logs_path
. If a named vector is provided through the
optional aesthetics
argument, the function adds extra constant columns into
the data frame using the names from the named vector and initialized with the
values of the named vector. Adding such columns may be useful later for defining plot aesthetics (see Section 8.5) when drawing plots. If the
output of the function is intended to be combined with an energy_scope
trace
(see Section 8.2.3.3), the optional override_beginning
argument can be
used to synchronize the output data frame according to the energy_scope
trace
by considering the datetime given by this argument as the alternative beginning
of the acquisition. Note that the energy_scope
trace always starts first as
energy_scope
is always launched before the other benchmarking tools.
The output
data frame shall contain four columns, the timestamp
column
giving the time of the measure in milliseconds since the beginning of the
acquisition, the corresponding consumption
value, the kind
of the measure
(sram
for memory usage or shdd
for hard drive usage) and the process
rank
column.
read_rss <- function(logs_path, include_hdd = FALSE, aesthetics = NA, override_beginning = NA) { columns <- c("timestamp", "consumption", "kind", "process") output <- data.frame(matrix(nrow = 0, ncol = length(columns))) colnames(output) <- columns
We begin by listing all the memory and, optionally, hard drive usage trace files
in logs_path
.
pattern <- ifelse(include_hdd, "rss|hdd.*\\.log", "rss.*\\.log") logs <- list.files( logs_path, pattern = pattern, full.names = TRUE, include.dirs = FALSE )
Then, we read each log file, strip its last line containing the peak value,
which we do not use here, and append the contents to the output data frame.
Note that in a previous version of rss.py
, the trace file format was
different. There was only one column in the trace file corresponding to the
amount of memory or hard drive used. For backward compatibility with old trace
files, we continue supporting the former trace file format too.
for(log in logs) { count <- length(readLines(log, warn = FALSE)) - 1 data <- read.table(log, nrow = count) if(ncol(data) < 2) { # Legacy format colnames(data) <- c("consumption") data$timestamp <- 0:(count - 1) } else { if(ncol(data) > 2) { # Newest format including swap monitoring colnames(data) <- c("timestamp", "residual", "swap") data$consumption <- data$residual + data$swap data <- data[, !(names(data) %in% c("residual", "swap"))] } else { # Older format without swap monitoring colnames(data) <- c("timestamp", "consumption") } data$timestamp <- as.POSIXct(data$timestamp, format = "%Y/%m/%dT%H:%M:%S") start <- min(data$timestamp)
If the override_beginning
argument was provided, we need to synchronize the
timestamp
column in such a way as to consider override_beginning
to be the
beginning of the trace.
if(!is.na(override_beginning)) { beginning <- as.integer(min(data$timestamp) - override_beginning) data$timestamp <- data$timestamp + beginning }
Then, we convert the timestamp to milliseconds and the measured value to gibibytes (GiB).
data$timestamp <- (as.integer(data$timestamp) - as.integer(start)) * 1000 } data$consumption <- as.numeric(data$consumption / 1024.)
Based on the name of the currently processed trace file, we determine the kind of the measure as well as the MPI process rank. We merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
base <- basename(log) kind <- ifelse(!include_hdd || startsWith(base, "rss"), "sram", "shdd") data$kind <- rep(kind, nrow(data)) process <- gsub("([a-z]+)-(\\d+)\\.log", "\\2", base) data$process <- rep(as.integer(process), nrow(data)) output <- rbind(output, data) }
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.
processes <- max(output$process) if(processes > 0) { output <- output %>% group_by(timestamp, kind) %>% summarise( consumption = sum(consumption) ) }
Finally, we process the aesthetics
parameter, if provided, and return the
output data frame.
if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] output[[name]] <- rep(as.character(aesthetic), nrow(output)) } } return(output) }
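A hypothetical usage of read_rss, assuming the trace files were previously retrieved into the current working directory and tagging all the rows with an illustrative solver name, could be:

storage <- read_rss(".", include_hdd = TRUE, aesthetics = c(solver = "spido"))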
8.2.3.2. Reading likwid
trace files
The function read_likwid
constructs a data frame based on likwid
trace files
located in logs_path
. If a named vector is provided through the optional
aesthetics
argument, the function adds extra constant columns into the data
frame using the names from the named vector and initialized with the values of
the named vector (see Section 8.2.3.1). If the output of the function is
intended to be combined with an energy_scope
trace (see Section 8.2.3.3),
the optional shift
argument can be used to synchronize the output data frame
according to the energy_scope
trace by shifting the output timestamps by the
value (in milliseconds) of this argument. metrics indicates the number of metrics to look for in the trace file. The default value is 2, i.e. flop rate and memory bandwidth (see below). As the measuring tool provides only timestamps relative to the beginning of the execution, we are unable to provide a more precise synchronization procedure. Therefore, we fall back to considering the
beginning of the rss.py
trace (see Section 8.2.3.1) corresponding to the
same test case as t0 for the likwid
trace file.
The output
data frame shall contain four columns, the timestamp
column
giving the time of the measure in milliseconds since the beginning of the
acquisition, the corresponding consumption
value, the kind
of the measure
(flops
for floating-point operation rate and bw
for memory bandwidth) and
the process
rank column.
read_likwid <- function(logs_path, aesthetics = NA, shift = 0, metrics = 2) { columns <- c("timestamp", "consumption", "kind", "process") output <- data.frame(matrix(nrow = 0, ncol = length(columns))) colnames(output) <- columns
We begin by listing all the trace files in logs_path
.
pattern <- "likwid.*\\.csv" logs <- list.files( logs_path, pattern = pattern, full.names = TRUE, include.dirs = FALSE )
A likwid
trace file has two or more heading lines at its beginning. The count
of heading lines depends on the count of monitored hardware performance counters
or groups of the latter. In our case, we use two groups of performance counters,
one for floating-point operation rate and one for memory bandwidth. We thus have
three heading lines in our likwid-N.csv
trace files. Then, the first four
columns (# GID
, MetricsCount
, CpuCount
and Total runtime [s]
) of the
trace file are common for all the performance counters. Each group of
performance counters may have a different number of associated metrics and thus
a different number of associated columns in the trace file. The MetricsCount
column gives the number of metrics and allows us to compute the number of columns
in the trace file associated to a given performance group. In our case, there
are 6 metrics for the floating-point operation rate group and 10 for the memory
bandwidth group. Moreover, there is a separate column for each processing unit
and for each metric. Note that the number of processing units used during the
execution is given by the CpuCount
column. In summary, the trace file contains
4 common columns followed by 6 × CpuCount
columns containing the values for the
metrics of the floating-point operation rate group followed by 10 × CpuCount
columns containing the values for the metrics of the memory bandwidth group.
We read each log file and:
for(log in logs) { input <- read.csv(log, header = FALSE, skip = metrics + 1)
determine the count of processing units used for the corresponding test case,
cpus <- as.integer(input[1, 3])
create an intermediate data frame with the timestamps, the floating-point operation rate, corresponding to the metric
Packed DP [MFLOP/s]
, and the memory bandwidth, corresponding to the metricMemory bandwidth [MBytes/s]
,data_flops <- data.frame( timestamp = input[which(input[, 2] == 6), c(4)], consumption = rowSums( input[which(input[, 2] == 6), c((4 + 5 * cpus + 1):(4 + 6 * cpus - 1))] ) ) if(metrics > 1) { data_bw <- data.frame( timestamp = input[which(input[, 2] == 10), c(4)], consumption = rowSums( input[which(input[, 2] == 10), c((4 + 8 * cpus + 1):(4 + 9 * cpus - 1))] ) ) }
initialize the
kind
columns in these separate data frames,data_flops$kind <- rep("flops", nrow(data_flops)) if(metrics > 1) { data_bw$kind <- rep("bw", nrow(data_bw)) }
merge the separate data frames again into a unique data frame,
if(metrics > 1) { data <- rbind(data_flops, data_bw) } else { data <- data_flops }
convert the timestamp to milliseconds and synchronize it if the
shift
argument was provided,data$timestamp <- as.integer(round(data$timestamp, 0) * 1000) if(shift > 0) { data$timestamp <- data$timestamp + shift }
determine the MPI process rank corresponding to the current trace file based on the name of the latter and initialize the
process
column,process <- gsub("([a-z]+)-(\\d+)\\.csv", "\\2", basename(log)) data$process <- rep(as.integer(process), nrow(data))
merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
output <- rbind(output, data) }
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.
processes <- max(output$process) if(processes > 0) { output <- output %>% group_by(timestamp, kind) %>% summarise( consumption = sum(consumption) ) }
Finally, we process the aesthetics
parameter, if provided, and return the
output data frame.
if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] output[[name]] <- rep(as.character(aesthetic), nrow(output)) } } return(output) }
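A hypothetical usage of read_likwid, assuming the trace files reside in the current working directory and shifting the timestamps by 2 seconds to align them with an rss.py trace, could be:

performance <- read_likwid(".", aesthetics = c(solver = "spido"), shift = 2000)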
8.2.3.3. Reading energy_scope
trace files
The function read_es
constructs an R data frame based on a given
energy_scope
JSON data file es_data
. If a named vector is provided through
the optional aesthetics
argument, the function adds extra constant columns
into the data frame using the names from the named vector and initialized with
the values of the named vector. Adding such columns may be useful later for defining plot aesthetics (see Section 8.5) when drawing plots.
The data frame also contains tags. The benchmark executable may print special energy_scope tags to delimit the beginning and the end of a particular portion of computation in the energy_scope trace. If the optional only argument is provided, only the tags it specifies are processed. The interval argument allows one to set the interval (in milliseconds) at which the measures were taken. By default, the function assumes an interval of 1000 milliseconds.
The resulting data frame is composed of the following columns:
- tag representing the tag label,
- type indicating whether a line corresponds to a start or a stop tag,
- type_label holding a prettified version of type for plot drawing,
- timestamp representing the time of the measure in milliseconds since the beginning of the acquisition,
- consumption representing the current power consumption of CPU or RAM (in Watts [W]),
- kind indicating whether consumption represents the power consumption of CPU or RAM,
as well as any constant columns described by the optional aesthetics named vector, added to the output data frame in order to simplify plot drawing based on the latter.
Note that the format of the output data frame is compatible, in terms of the columns it contains, with the format of the data frames produced by the functions read_rss (see Section 8.2.3.1) and read_likwid (see Section 8.2.3.2) for reading trace files produced by the other benchmarking tools we use. This is to allow for combining different kinds of benchmark data. Indeed, one can provide this function with a data frame obtained with read_rss through the rss_data argument and extend the output data frame with the measures done by the rss.py monitoring script (see Section 2). Similarly, we can extend the output data frame with the measures done by likwid and extracted using the read_likwid function through the likwid_data argument. likwid_metrics indicates the number of metrics to look for in the trace file. The default value is 2, i.e. flop rate and memory bandwidth (see Section 8.2.3.2). The timestamp column, present in rss_data as well as in likwid_data, allows us to synchronize the measures.
The first step is to read the data from the JSON data file at es_data
,
read_es <- function(es_data, aesthetics = c(), only = NA, interval = 1000, rss_data = NA, include_hdd = FALSE, likwid_data = NA, likwid_metrics = 2) { es_raw <- fromJSON(file = es_data)
extract the timestamp corresponding to the beginning of acquisition
es_start <- as.POSIXct( es_raw$data$data$tags$es_appli_total$start, format = "%Y/%m/%d %H:%M:%S" )
and the relative start and stop dates of acquisition.
es_start_sec <- as.numeric( es_raw$data$data$tags$es_appli_total$`start(sec)` ) es_stop_sec <- as.numeric( es_raw$data$data$tags$es_appli_total$`stop(sec)` )
Then, we collect CPU and RAM power consumption data into the respective data
frames es_cpu
and es_ram
. The timestamp
column is generated based on
es_start
. Note that consecutive timestamps are spaced by the acquisition interval, i.e. one second by default.
es_cpu <- data.frame( consumption = as.numeric( es_raw$data$data$`ecpu(W)`[(es_start_sec + 1):(es_stop_sec + 1)] ), timestamp = as.integer(0) ) for(i in 2:nrow(es_cpu)) { es_cpu$timestamp[i] <- es_cpu$timestamp[i - 1] + interval } es_cpu$kind <- rep("ecpu", nrow(es_cpu)) es_ram <- data.frame( power_total = as.numeric( es_raw$data$data$`etotal(W)`[(es_start_sec + 1):(es_stop_sec + 1)] ), power_cpu = as.numeric( es_raw$data$data$`ecpu(W)`[(es_start_sec + 1):(es_stop_sec + 1)] ), timestamp = as.integer(0) ) es_ram$consumption <- es_ram$power_total - es_ram$power_cpu es_ram <- subset(es_ram, select = c("consumption", "timestamp")) for(i in 2:nrow(es_ram)) { es_ram$timestamp[i] <- es_ram$timestamp[i - 1] + interval } es_ram$kind <- rep("eram", nrow(es_ram))
We also import the rss.py trace, if the rss_data argument was provided, and clip the energy_scope data to the actual acquisition window of the former. Similarly, we import and synchronize the likwid trace, if the likwid_data argument was provided.
rss_ram <- NA rss_hdd <- NA likwid <- NA if(!is.na(rss_data)) { rss <- read_rss(rss_data, include_hdd, aesthetics, es_start) rss_ram <- subset(rss, kind == "sram") if(include_hdd) { rss_hdd <- subset(rss, kind == "shdd") } actual_start <- min(rss_ram$timestamp) actual_end <- max(rss_ram$timestamp) es_cpu <- subset( es_cpu, timestamp >= actual_start & timestamp <= actual_end ) es_ram <- subset( es_ram, timestamp >= actual_start & timestamp <= actual_end ) } if(!is.na(likwid_data)) { shift <- 0 if(!is.na(rss_data)) { shift <- min(rss_ram$timestamp) } likwid <- read_likwid(likwid_data, aesthetics, shift, likwid_metrics) }
Now, we initialize additional columns to store tag information.
es_cpu$tag <- NA es_cpu$type <- NA es_cpu$type_label <- NA es_ram$tag <- NA es_ram$type <- NA es_ram$type_label <- NA if(!is.na(rss_data)) { rss_ram$tag <- NA rss_ram$type <- NA rss_ram$type_label <- NA if(include_hdd) { rss_hdd$tag <- NA rss_hdd$type <- NA rss_hdd$type_label <- NA } } if(!is.na(likwid_data)) { likwid$tag <- NA likwid$type <- NA likwid$type_label <- NA }
For each tag considered, we add four new rows, i.e. one row with the start time and CPU power consumption and one with the stop time and CPU power consumption in es_cpu, as well as one row with the start time and RAM power consumption and one with the stop time and RAM power consumption in es_ram.
If during benchmark program execution a loop generates more than one occurrence
of a given energy_scope
tag, a unique identifier is appended to the latter,
e.g. tag_0
, tag_1
, etc. Otherwise, energy_scope
would discard all the data
related to the tag. However, the presence of such identifiers complicates the
post-treatment. Therefore, we need to strip them off the tag name tag
but keep
them for later usage in instance
.
for(entire_tag in names(es_raw$data$data$tags)) { tag <- sub("_[^_]+$", "", entire_tag) instance <- tail(strsplit(entire_tag, "_")[[1]], n = 1) instance <- suppressWarnings(as.integer(instance)) instance <- ifelse(is.na(instance), "i", instance)
Here we use the imported JSON data file es_raw
to get the power consumption
values as well as the start and stop timestamps of tags.
if(is.na(only) || tag %in% only) { es_start <- as.integer(es_start) start_time <- as.integer( as.POSIXct( es_raw$data$data$tags[[entire_tag]]$start, format = "%Y/%m/%d %H:%M:%S" ) ) stop_time <- as.integer( as.POSIXct( es_raw$data$data$tags[[entire_tag]]$stop, format = "%Y/%m/%d %H:%M:%S" ) ) begin <- (start_time - es_start) * 1000 end <- (stop_time - es_start) * 1000 row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = es_cpu[ es_cpu$timestamp == begin & is.na(as.character(es_cpu$tag)), "consumption" ], kind = "ecpu" ) es_cpu <- rbind(es_cpu, row) row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = es_cpu[ es_cpu$timestamp == end & is.na(as.character(es_cpu$tag)), "consumption" ], kind = "ecpu" ) es_cpu <- rbind(es_cpu, row) row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = es_ram[ es_ram$timestamp == begin & is.na(as.character(es_ram$tag)), "consumption" ], kind = "eram" ) es_ram <- rbind(es_ram, row) row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = es_ram[ es_ram$timestamp == end & is.na(as.character(es_ram$tag)), "consumption" ], kind = "eram" ) es_ram <- rbind(es_ram, row) if(!is.na(rss_data)) { consumption <- rss_ram[ rss_ram$timestamp == begin & is.na(as.character(rss_ram$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "sram" ) rss_ram <- rbind.fill(rss_ram, row) } consumption <- rss_ram[ rss_ram$timestamp == end & is.na(as.character(rss_ram$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "sram" ) rss_ram <- rbind.fill(rss_ram, row) } if(include_hdd) { consumption <- rss_hdd[ rss_hdd$timestamp == begin & is.na(as.character(rss_hdd$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "shdd" ) rss_hdd <- rbind.fill(rss_hdd, row) } consumption <- rss_hdd[ rss_hdd$timestamp == end & is.na(as.character(rss_hdd$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "shdd" ) rss_hdd <- rbind.fill(rss_hdd, row) } } } if(!is.na(likwid_data)) { consumption <- likwid[ likwid$timestamp == begin & is.na(as.character(likwid$tag)), "consumption" ] kind <- likwid[ likwid$timestamp == begin & is.na(as.character(likwid$tag)), "kind" 
] consumption <- ifelse(length(consumption) > 0, consumption, NA) kind <- ifelse(length(kind) > 0, kind, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = kind ) likwid <- rbind.fill(likwid, row) } consumption <- likwid[ likwid$timestamp == end & is.na(as.character(likwid$tag)), "consumption" ] kind <- likwid[ likwid$timestamp == end & is.na(as.character(likwid$tag)), "kind" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) kind <- ifelse(length(kind) > 0, kind, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = kind ) likwid <- rbind.fill(likwid, row) } } } }
If the aesthetics argument is provided, we add the constant columns it describes to the resulting data frames. Finally, we return a single data frame combining es_cpu, es_ram and the rss.py (and possibly likwid) data if the corresponding arguments were provided. In this case, we also add an additional column to the combined data frames to distinguish between power and storage resources consumption. This is useful for plotting figures (see Section 8.6.12).
if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] es_cpu[[name]] <- rep(as.character(aesthetic), nrow(es_cpu)) es_ram[[name]] <- rep(as.character(aesthetic), nrow(es_ram)) if(!is.na(rss_data)) { rss_ram[[name]] <- rep(as.character(aesthetic), nrow(rss_ram)) if(include_hdd) { rss_hdd[[name]] <- rep(as.character(aesthetic), nrow(rss_hdd)) } } if(!is.na(likwid_data)) { likwid[[name]] <- rep(as.character(aesthetic), nrow(likwid)) } } } es <- rbind(es_cpu, es_ram) es$kind_general <- "e" if(!is.na(likwid_data)) { likwid$kind_general <- likwid$kind es <- rbind.fill(es, likwid) } if(!is.na(rss_data)) { rss_ram$kind_general <- "s" if(include_hdd) { rss_hdd$kind_general <- "s" es <- rbind.fill(es, rbind(rss_ram, rss_hdd)) } else { es <- rbind.fill(es, rss_ram) } es$timestamp <- es$timestamp - min(rss_ram$timestamp) } return(es) }
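A hypothetical usage of read_es combining all three kinds of traces, assuming they were previously retrieved into the current working directory and keeping only the factorization tag, could be:

profile <- read_es(
  "eprofile.txt",
  only = c("testMPF_factorisation"),
  rss_data = ".",
  include_hdd = TRUE,
  likwid_data = "."
)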
8.2.3.4. Reading timeline trace files
The function read_timeline allows us to read a test_FEMBEM timeline trace file trace, extract data about selected computation phases from the latter and return it in the form of a data frame suitable for drawing axis-intercepting lines with the associated labels using the rss_by_time plot function (see Section 8.6.11).
phases
represents a data frame of selected computation phases to extract
information about. Note that one computation phase label
can appear more than
once if the corresponding function was called multiple times. This is why the
data frame contains the occurrence
column (see Table 4)
indicating which occurrence of a given phase should be considered. The type
column indicates whether a given row in the data frame designates the beginning
(ENTER
) or the end (EXIT
) of the corresponding computation phase. In this
data frame, one can provide additional information, i.e. prettified computation
phase labels for better data readability (see Table 4).
The phases
data frame is used to construct the output data frame of this
function.
type | name | label | occurrence |
---|---|---|---|
EXIT | mpf_solvegemm() | Block-wise Schur complement computation | 1 |
… | … | … | … |
We begin by reading the timeline trace file trace.
read_timeline <- function(trace, phases) { complete_timeline <- read.csv(trace, header = FALSE)
For an easier column access, we add column names to the newly created data frame.
colnames(complete_timeline) <- c("rank", "time", "type", "duration", "path")
Then, we construct the resulting timeline
data frame containing only the data
we are interested in based on the input phases
data frame. The new data frame
contains three additional columns:
- time providing the corresponding computation phase delimiter timestamp,
- zero, a constant zero column useful for plotting vertical or horizontal axis intercepts,
- middle providing the values representing the middles of the intervals defined by the values of the time column, useful to center phase labels in the rss_by_time plot function (see Section 8.6.11).
timeline <- phases
timeline$time <- NA
timeline$middle <- NA
timeline$zero <- rep(0, nrow(timeline))
# Iterate over the rows of the 'phases' data frame.
for(i in 1:nrow(phases)) {
  time <- complete_timeline[
    complete_timeline$type == phases[i, "type"] &
      str_detect(complete_timeline$path, paste0(phases[i, "name"], "$")),
    "time"
  ]
  time <- time[[as.integer(phases[i, "occurrence"])]]
  middle <- ifelse(
    i > 1,
    time - ((time - timeline[i - 1, "time"]) / 2),
    time / 2
  )
  timeline[i, "time"] <- as.integer(time)
  timeline[i, "middle"] <- as.double(middle)
}
At the end, we return the timeline
data frame.
return(timeline)
}
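A hypothetical usage of read_timeline, based on the phase shown in Table 4 and an illustrative trace file name, could be:

phases <- data.frame(
  type = c("EXIT"),
  name = c("mpf_solvegemm()"),
  label = c("Block-wise Schur complement computation"),
  occurrence = c(1)
)
timeline <- read_timeline("timeline-0.csv", phases) # illustrative file name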
8.3. Compressing data
8.3.1. compress_consumption
The larger a trace file (see Section 8.2.2), the larger the data frame produced by the associated parsing function defined in Section 8.2.3 and, consequently, the bigger the final figures (in terms of file size) plotted based on these data frames. This phenomenon may then significantly increase the time needed to produce PDF documents featuring the aforementioned figures.
To prevent this, we define here a function named compress_consumption
allowing
us to reduce the number of entries in a given data frame produced by one of the
functions in Section 8.2.3 except for read_timeline
(see
Section 8.2.3.4). compress_consumption
takes two
arguments, the data
frame to compress and a threshold epsilon
(see below).
compress_consumption <- function(data, epsilon) {
data
may contain tag information when passed through the read_es
function
(see Section 8.2.3.3). At first, we need to identify the corresponding
entries in data
in order to be able to preserve them during compression.
keep <- subset(data, !is.na(tag)) keep <- keep$timestamp
The idea of this compression function is to start with the first entry base
in
data
, iterate over the data points in the latter and drop
all the points
until the difference between base
and the current data point exceeds the
threshold epsilon
. At that point, we choose the current point as base
and
continue. Note that a data point can be dropped only if it is not in keep
and
if the jump between the current and the very next point is not greater than
epsilon
. This is to prevent the degradation of peak curve shapes in figures.
drop <- c()
data_size <- nrow(data)
base <- data[1, "consumption"]
for(i in 2:(data_size - 1)) {
  c_timestamp <- data[i, "timestamp"]
  c_consumption <- data[i, "consumption"]
  if(abs(base - c_consumption) > epsilon) {
    base <- c_consumption
  } else if(!(c_timestamp %in% keep)) {
    if(abs(data[i + 1, "consumption"] - c_consumption) <= epsilon) {
      drop <- c(drop, i)
    }
  }
}
# Drop the selected points only if there are any. Subsetting with an empty
# index vector would otherwise discard all of the rows.
if(length(drop) > 0) {
  data <- data[-drop, ]
}
Before returning the compressed data frame, we print out some compression statistics.
print(paste0("Removed ", length(drop), " points out of ", data_size, ".")) return(data) }
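For instance, reusing the profile data frame from the hypothetical read_es example in Section 8.2.3.3, the CPU power consumption curve could be compressed with a threshold of 5 W:

profile_ecpu <- compress_consumption(subset(profile, kind == "ecpu"), 5)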
8.3.2. compress_global
The read_es
function allows one to combine data frames produced by all of the
functions in Section 8.2.3 except for read_timeline
(see
Section 8.2.3.4). To simplify compression of such
combined data frames, we define here a function named compress_global
allowing us
to apply compress_consumption
(see Section 8.3.1) on each
subpart of a combined data frame.
compress_global <- function(data, epsilon) { kinds <- unique(na.omit(data$kind)) compressed <- NA compressed_is_na <- TRUE for(c_kind in kinds) { c_compressed <- compress_consumption(subset(data, kind == c_kind), epsilon) if(compressed_is_na) { compressed <- c_compressed compressed_is_na <- FALSE } else { compressed <- rbind(compressed, c_compressed) } } return(compressed) }
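Continuing the previous example, the whole combined data frame can be compressed in a single call:

profile_small <- compress_global(profile, 5)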
8.4. Formatting data
8.4.1. Benchmark results
Before actually visualizing data, we have to format it. This is the role of the
function gcvb_format
which takes a single argument - data
, an R data frame
containing benchmark results imported by the gcvb_import
function (see Section
8.2).
gcvb_format <- function(data) {
Due to the way certain benchmarks are defined in gcvb
, the solver
column may
contain non-uniform values, i.e., in some cases, coupled solver names use dashes
instead of slashes, which are the expected separators. For example, we need to
replace the dash in mumps-spido
by a slash so it becomes mumps/spido
.
if("solver" %in% colnames(data)) { data$solver <- ifelse( data$solver == "mumps-spido", "mumps/spido", ifelse(data$solver == "mumps-hmat", "mumps/hmat", data$solver) ) }
Names of columns holding the same information, such as factorization and solve time, expected result accuracy and so on, vary depending on the solver used during a given benchmark. To ease further data treatment, we merge these into common columns.
data$tps_facto <- ifelse( is.na(data$tps_cpu_facto_mpf), data$tps_cpu_facto, data$tps_cpu_facto_mpf ) data$tps_solve <- ifelse( is.na(data$tps_cpu_solve_mpf), data$tps_cpu_solve, data$tps_cpu_solve_mpf ) data$desired_accuracy <- ifelse( !is.na(data$mumps_blr_accuracy), data$mumps_blr_accuracy, NA ) data$size_schur <- ifelse( is.na(data$size_schur) & data$coupled_method == "multi-facto", data$disk_block_size, data$size_schur ) if(all(c("cols_schur", "solver", "coupled_method", "coupled_nbrhs") %in% colnames(data))) { data$cols_schur <- ifelse( data$coupled_method == "multi-solve" & data$solver == "mumps/spido", data$coupled_nbrhs, data$cols_schur ) } if("solver" %in% colnames(data)) { data$desired_accuracy <- ifelse( (data$solver == "hmat-bem" | data$solver == "hmat-fem" | data$solver == "hmat-fem-nd") & !is.na(data$h_assembly_accuracy), data$h_assembly_accuracy, data$desired_accuracy ) }
Then, we must remove from the data frame the lines corresponding to unfinished jobs, i.e. lines without computation time information or where the computation time is zero.
data <- subset(data, !is.na(tps_facto)) data <- subset(data, !is.na(tps_solve)) data <- subset(data, tps_facto > 0.0) data <- subset(data, tps_solve > 0.0)
If the dense_ooc column is present, we have to take care of NA values so that we can make the column a factor. NA appears in the case of purely in-core benchmarks which do not set this variable at all. Finally, we replace NA by -1.
if("dense_ooc" %in% colnames(data)) { data$dense_ooc <- ifelse( is.na(data$dense_ooc), -1, data$dense_ooc ) data$dense_ooc[data$dense_ooc > 0] <- 1 data$dense_ooc <- as.character(data$dense_ooc) data$dense_ooc <- factor(data$dense_ooc, levels = c("-1", "0", "1")) }
The same goes for the energy_scope
column. In this case, NA
means that the
tool was not used.
if("energy_scope" %in% colnames(data)) { data$energy_scope <- ifelse( is.na(data$energy_scope), "off", data$energy_scope ) data$energy_scope <- as.character(data$energy_scope) data$energy_scope <- factor(data$energy_scope, levels = c("off", "on")) }
The memory usage peaks are held in two separate columns rm_peak
, corresponding
to the VmRSS
field in /proc/[PID]/status
, and swap_peak
, corresponding to
the VmSwap
field in /proc/[PID]/status
, which we need to sum up to obtain
the total peak value proc5.
if("swap_peak" %in% colnames(data)) { data$swap_peak <- ifelse( is.na(data$swap_peak), 0, data$swap_peak ) data$rm_peak <- data$rm_peak + data$swap_peak data <- data[, !(names(data) %in% c("swap_peak"))] }
Some required columns are not present in the data frame by default, so we have to compute them based on existing data.
The total of cores used for computation is the combination of MPI process count and thread count.
data$p_units <- data$omp_thread_num * data$processes
The value of n_blocks
represents the number of blocks the dense submatrix
\(A_{ss}\) is split into during the Schur complement computation in the two-stage
multi-factorization implementation. When this value is not explicitly set by the
user, n_blocks
equals auto
. Although, for the plot drawing functions to
operate correctly, we need to recompute the corresponding numerical values.
if("n_blocks" %in% colnames(data)) { data$n_blocks <- ifelse( data$n_blocks == "auto", ceiling(data$nbem / data$size_schur), as.numeric(data$n_blocks) ) }
Finally, the factorization and solve efficiency computation is split into several steps:
- gathering best sequential times for each solver,
if("solver" %in% colnames(data)) { sequentials <- subset( data, select = c("solver", "tps_facto", "tps_solve", "nbpts"), data$p_units == 1 ) by_node <- FALSE if(nrow(sequentials) < 1) { sequentials <- subset( data, select = c("solver", "tps_facto", "tps_solve", "nbpts"), data$processes == 1 ) by_node <- TRUE } if(nrow(sequentials) > 0) { sequentials <- merge( aggregate(tps_facto ~ solver + nbpts, min, data = sequentials), sequentials ) sequentials <- sequentials %>% dplyr::rename( tps_facto_seq = tps_facto, tps_solve_seq = tps_solve )
- adding a column containing sequential time for every run in the data frame,
data <- dplyr::left_join(data, sequentials, by = c("solver", "nbpts"))
- computing the efficiency itself.
if(by_node) { data$efficiency_facto <- data$tps_facto_seq / (data$processes * data$tps_facto) data$efficiency_solve <- data$tps_solve_seq / (data$processes * data$tps_solve) data$efficiency_total <- (data$tps_facto_seq + data$tps_solve_seq) / (data$processes * (data$tps_facto + data$tps_solve)) } else { data$efficiency_facto <- data$tps_facto_seq / (data$p_units * data$tps_facto) data$efficiency_solve <- data$tps_solve_seq / (data$p_units * data$tps_solve) data$efficiency_total <- (data$tps_facto_seq + data$tps_solve_seq) / (data$p_units * (data$tps_facto + data$tps_solve)) } } } return(data) }
8.5. Style presets
We also define a common set of data visualization elements to ensure a unified graphic output when drawing plots.
8.5.1. Label functions
Label functions allow us to format and prettify column names or values when used in plot legends.
8.5.1.1. label_percentage
This function multiplies value
by 100 and returns the result as a string with an
appended percentage sign.
label_percentage <- function(value) { return(paste0(value * 100, "%")) }
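For example:

label_percentage(c(0.25, 1)) # yields "25%" and "100%"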
8.5.1.2. label_time
This function converts time values in seconds into more human-readable equivalents.
label_time <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { days <- as.integer(labels[i] / 86400) hours <- as.integer((labels[i] - (86400 * days)) / 3600) minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60) seconds <- as.integer( labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes) ) out[i] <- "" if(days > 0) { out[i] <- paste0(out[i], days, "d") } if(hours > 0) { out[i] <- paste0(out[i], hours, "h") } if(minutes > 0 && days < 1) { out[i] <- paste0(out[i], minutes, "m") } if(seconds > 0 && days < 1 && hours < 1) { out[i] <- paste0(out[i], seconds, "s") } if(seconds < 1 && minutes < 1 && hours < 1 && days < 1) { out[i] <- "0s" } } else { out[i] <- "?" } } return(out) }
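For example:

label_time(c(3725, 90061)) # yields "1h2m" and "1d1h"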
8.5.1.3. label_time_short
This function converts time values in seconds into shorter, more human-readable equivalents.
label_time_short <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { days <- as.integer(labels[i] / 86400) hours <- as.integer((labels[i] - (86400 * days)) / 3600) minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60) seconds <- as.integer( labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes) ) if(days > 0) { out[i] <- paste0(round(labels[i] / 86400., 1), "d") } else if(hours > 0) { out[i] <- paste0(round(labels[i] / 3600., 1), "h") } else if(minutes > 0) { out[i] <- paste0(round(labels[i] / 60., 1), "m") } else if(seconds > 0) { out[i] <- paste0(round(labels[i], 1), "s") } else { out[i] <- "0s" } } else { out[i] <- "?" } } return(out) }
8.5.1.4. label_epsilon
This function converts solution accuracy threshold values from numeric values to strings having the form \(1 \times 10^{-n}\) or to ‘unset (no compression)’ if the value is not assigned like in the case of solvers not using data compression.
label_epsilon <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } else { out[i] <- "unset (no compression)" } } return(out) }
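For example:

label_epsilon(c(1e-3, NA)) # yields "1.0e-03" and "unset (no compression)"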
8.5.1.5. label_storage
This function prettifies storage support names.
label_storage <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "rm_peak" = "Memory (RAM)", "hdd_peak" = "Disk drive" ) } return(out) }
8.5.1.6. label_ooc
This function prettifies the names of facet grids distinguishing between
benchmarks with and without out-of-core computation (column dense_ooc
in the
data frame).
label_ooc <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( labels[i], "-1" = "All in-core", "0" = "Out-of-core except Schur complement", "1" = "All out-of-core" ) } return(out) }
8.5.1.7. label_solver
This function prettifies solver names.
label_solver <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "spido" = "SPIDO", "hmat-bem" = "HMAT", "hmat-fem" = "HMAT", "hmat-fem-nd" = "HMAT (proto-ND)", "hmat-oss" = "HMAT-OSS", "chameleon" = "Chameleon", "mumps" = "MUMPS", "mumps-blr" = "MUMPS (compressed)", "mumps-no-blr" = "MUMPS (uncompressed)", "qr_mumps" = "qr_mumps", "mumps/spido" = "MUMPS/SPIDO", "qr_mumps/spido" = "qr_mumps/SPIDO", "qr_mumps/hmat" = "qr_mumps/HMAT", "mumps/hmat" = "MUMPS/HMAT", "hmat/hmat" = "HMAT" ) } return(out) }
8.5.1.8. label_solver_config
This function prettifies solver configuration labels.
label_solver_config <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "vanilla" = "without advanced features", "sparse-rhs" = "sparse right-hand side", "blr-dense-rhs" = "compression", "blr-sparse-rhs" = "compression and sparse right-hand side" ) } return(out) }
8.5.1.9. label_tag
This function prettifies energy_scope
tag names.
label_tag <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = expression( italic(S[i] == A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = expression( italic(A[vv]^{-1}*A[sv[i]]^{T}) ~ "(sparse solve)" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = expression( italic(A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = expression( italic(compress * group("(", A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}, ")")) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(AXPY)" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(compressed AXPY)" ), "MPF_SolveGEMM_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Multi-factorization "CreateSchurComplement" = expression( italic(S[i] == A[ss[ij]] - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_factorization" = expression( italic(X[ij] == - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "MPF_multifact_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), "MPF_multifact_MUMPS_HMAT" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Common "testMPF_factorisation_Schur" = expression( italic(S^{-1}) ~ "(Schur factorization)" ), "testMPF_factorisation" = expression( italic(A^{-1}) ~ "(factorization of" ~ italic(A) * ")" ), "testMPF_solve" = expression( italic(A^{-1}*x) ~ "(system solution)" ) ) } return(out) }
8.5.1.10. label_tag_short
This function does the same thing as label_tag
but returns shorter labels than
the latter.
label_tag_short <- function(labels, instance = "i") { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = paste0( "italic(S[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = paste0( "italic(solve[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = paste0( "italic(SpMM[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = paste0( "italic(compress[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = paste0( "italic(CAXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = paste0( "italic(AXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS" = "italic(S)", "CreateSchurComplement" = paste0( "italic(S[", instance, "])" ), # Multi-factorization "CreateSchurComplement_factorization" = paste0( "italic(X[", instance, "])" ), "MPF_multifact_MUMPS" = "italic(S)", "MPF_multifact_MUMPS_HMAT" = "italic(S)", # Common "testMPF_factorisation_Schur" = "italic(S^{-1})", "testMPF_factorisation" = "italic(A^{-1})", "testMPF_solve" = "italic(A^{-1}*x)" ) } return(out) }
8.5.1.11. label_kind
This function prettifies power and storage resources consumption kinds (see Section 8.2.3.3).
label_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "ecpu" = "CPU", "eram" = "RAM", "sram" = "RAM", "shdd" = "Disk", "flops" = "Flop rate", "bw" = "RAM bandwidth" ) } return(out) }
8.5.1.12. label_ooc_storage
This function prettifies the names of the storage supports used for out-of-core computation.
label_ooc_storage <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "network" = "BeeGFS (10 Gbit/s link)", "hdd" = "Hard drive", "ssd" = "Solid state drive" ) } return(out) }
8.5.1.13. label_mapping
This function transforms MPI process mapping values into strings giving a more detailed information about the underlying parallel configuration.
label_mapping <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "node" = "1 unbound MPI process \U2A09 1 to 24 threads", "socket" = "2 MPI processes bound to sockets \U2A09 1 to 12 threads", "numa" = "4 MPI processes bound to NUMAs \U2A09 1 to 6 threads", "core" = "1 to 24 MPI processes bound to cores \U2A09 1 thread" ) } return(out) }
8.5.1.14. label_nbpts
This function converts the problem's unknown counts into the form N = <count> M, where <count> is the count expressed in millions and rounded to two decimals.
label_nbpts <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- paste( "\U1D441 =", as.character( round(as.numeric(labels[i]) / 1e+06, digits = 2) ), "M" ) } return(out) }
8.5.1.15. label_nbpts_ticks
This function converts problem's unknown counts into the form <count>M
or
<count>K
where <count>
is converted to millions and rounded to two decimals
or thousands and rounded to integer values, respectively.
label_nbpts_ticks <- function(labels) { out <- c() for (i in 1:length(labels)) { value <- as.numeric(labels[i]) if(value < 1e+06) { divide_by <- 1000 show_digits <- 0 unit <- "K" } else { divide_by <- 1e+06 show_digits <- 2 unit <- "M" } out[i] <- paste0( as.character( round(value / divide_by, digits = show_digits) ), unit ) } return(out) }
8.5.1.16. label_scientific
This function converts numeric label values to scientific notation.
label_scientific <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } return(out) }
8.5.1.17. label_coupling
and label_coupling_two_stage
These functions prettify names of the implementation schemes for the solution of coupled linear systems.
label_coupling <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve (two-stage)", "multi-facto" = "Multi-factorization (two-stage)", "full-hmat" = "Partially sparse-aware (single-stage)", "single-stage" = "Single-stage (prototype)" ) } return(out) } label_coupling_short <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve", "multi-facto" = "Multi-factorization", "full-hmat" = "Partially sparse-aware", "single-stage" = "Single-stage (prototype)" ) } return(out) }
8.5.1.18. label_peak_kind
This function prettifies peak memory usage value kinds.
label_peak_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "assembly_estimation" = expression("assembled system"), "schur_estimation" = expression("Schur complement matrix " * italic(S)), "peak_symmetric_factorization" = expression( "symmetric factorization of " * italic(A[vv])), "peak_non_symmetric_factorization" = expression( "non-symmetric factorization of " * italic(A[vv])) ) } return(out) }
8.5.1.19. label_execution
This function prettifies the names of execution modes.
label_execution <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "sync" = "synchronous", "async" = "asynchronous" ) } return(out) }
8.5.2. Label maps
A label map is a variation of a label function. It provides alternative names for given columns of a data frame.
8.5.2.1. phase.labs
This label map provides prettified names for factorization and solve time
columns tps_facto
and tps_solve
, the corresponding efficiency columns
efficiency_facto
and efficiency_solve
as well as the column holding the
associated execution time of the Schur block creation phase
CreateSchurComplement_exec_time
.
phase.labs <- c( "Factorization", "Solve", "Factorization", "Solve", "Schur complement computation" ) names(phase.labs) <- c( "tps_facto", "tps_solve", "efficiency_facto", "efficiency_solve", "CreateSchurComplement_exec_time" )
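A minimal sketch of using this label map as a ggplot2 facet labeller, on a hypothetical toy data frame and assuming plot.R has been sourced beforehand:

df <- data.frame(
  nbpts = c(1e+05, 2e+05, 1e+05, 2e+05),
  value = c(10, 25, 1, 2),
  phase = c("tps_facto", "tps_facto", "tps_solve", "tps_solve")
)
ggplot(df, aes(x = nbpts, y = value)) +
  geom_line() +
  facet_wrap(vars(phase), labeller = labeller(phase = phase.labs))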
8.5.3. Plot theme
To preserve a unique color palette as well as sets of point shapes and line types throughout all the plots, we provide each legend entry with the information on its:
- color,
colors <- c( "0.001" = "#F07E26", "1e-06" = "#9B004F", "node" = "#1488CA", "socket" = "#95C11F", "numa" = "#FFCD1C", "core" = "#6561A9", "1 (36 threads)" = "#E63312", "2 (72 threads)" = "#F07E26", "4 (144 threads)" = "#9B004F", "8 (288 threads)" = "#1488CA", "(32, 32)" = "#F07E26", "(48, 48)" = "#9B004F", "(64, 64)" = "#1488CA", "(128, 128)" = "#95C11F", "(256, 256)" = "#E63312", "(256, 512)" = "#FFCD1C", "(512, 512)" = "#800080", "(256, 1024)" = "#6561A9", "(256, 2048)" = "#89CCCA", "(256, 4096)" = "#384257", "1" = "#1488CA", "2" = "#95C11F", "3" = "#FFCD1C", "4" = "#6561A9", "5" = "#384257", "7" = "#89CCCA", "10" = "#1488CA", "11" = "#9B004F", "30" = "#FFCD1C", "250000" = "#384257", "5e+05" = "#1488CA", "1e+06" = "#E63312", "1200000" = "#95C11F", "1400000" = "#FFCD1C", "1500000" = "#6561A9", "2e+06" = "#F07E26", "2500000" = "#9B004F", "multi-facto" = "#6561A9", "multi-solve" = "#89CCCA", "full-hmat" = "#C7D64F", "single-stage" = "#C7D64F", "spido" = "#384257", "hmat" = "#9B004F", "mumps" = "#F07E26", "mumps-blr" = "#1488CA", "qr_mumps" = "#95C11F", "mumps/spido" = "#95C11F", "mumps/hmat" = "#FFCD1C", "ecpu" = "#384257", "eram" = "#E63312", "sram" = "#1488CA", "shdd" = "#95C11F", "network" = "#384257", "hdd" = "#1488CA", "ssd" = "#95C11F", "flops" = "#9B004F", "bw" = "#1488CA", "vanilla" = "#1488CA", "sparse-rhs" = "#95C11F", "blr-dense-rhs" = "#FFCD1C", "blr-sparse-rhs" = "#6561A9" )
- point shape style,
shapes <- c( "spido" = 16, "hmat-bem" = 15, "hmat-fem" = 15, "hmat-oss" = 15, "hmat-fem-nd" = 23, "chameleon" = 18, "mumps" = 16, "qr_mumps" = 17, "mumps/spido" = 16, "qr_mumps/spido" = 17, "qr_mumps/hmat" = 17, "mumps/hmat" = 15, "hmat/hmat" = 18, "rm_peak" = 16, "swap_peak" = 17, "hdd_peak" = 15, "sync" = 16, "async" = 15, "CreateSchurComplement_MultiSolve_MUMPS" = 22, "CreateSchurComplement_MultiSolve_MUMPS_solve" = 22, "CreateSchurComplement_MultiSolve_MUMPS_product" = 21, "MPF_SolveGEMM_MUMPS_HMAT_assembly" = 23, "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = 24, "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = 25, "MPF_SolveGEMM_MUMPS" = 23, "CreateSchurComplement" = 22, "CreateSchurComplement_factorization" = 22, "MPF_multifact_MUMPS" = 23, "MPF_multifact_MUMPS_HMAT" = 23, "testMPF_factorisation_Schur" = 24, "testMPF_factorisation" = 23, "testMPF_solve" = 24 )
- line type.
linetypes <- c( "spido" = "solid", "hmat-bem" = "dotted", "hmat-fem" = "dotted", "hmat-oss" = "dotted", "hmat-fem-nd" = "longdash", "chameleon" = "longdash", "mumps" = "solid", "qr_mumps" = "dashed", "mumps/spido" = "solid", "qr_mumps/spido" = "dashed", "qr_mumps/hmat" = "dashed", "mumps/hmat" = "dotted", "hmat/hmat" = "longdash", "rm_peak" = "solid", "swap_peak" = "dotted", "hdd_peak" = "longdash", "sync" = "solid", "async" = "dotted", "assembly_estimation" = "solid", "peak_symmetric_factorization" = "dotted", "peak_non_symmetric_factorization" = "longdash", "schur_estimation" = "dashed" )
Furthermore, each plot object uses a set of common theme layers provided by the function generate_theme. If needed, the latter allows us to provide custom breaks for the color, point shape and line type aesthetics, the legend title, the number of rows allowed in the legend as well as the legend box placement.
generate_theme <- function( color_breaks = waiver(), color_labels = waiver(), color_override_aes = list(), shape_breaks = waiver(), shape_labels = waiver(), shape_override_aes = list(), linetype_breaks = waiver(), linetype_labels = waiver(), linetype_override_aes = list(), legend_title = element_text(family = plot_font, size = 14, face = "bold"), legend_text = element_text(family = plot_font, size = 14), theme_text = element_text(family = plot_font, size = 16), legend_rows = 1, legend_rows_color = NA, legend_rows_fill = NA, legend_rows_shape = NA, legend_rows_linetype = NA, legend_box = "vertical", legend_position = "bottom" ) { return(list( scale_color_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_fill_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_shape_manual( values = shapes, labels = shape_labels, breaks = shape_breaks, na.translate = F ), scale_linetype_manual( values = linetypes, labels = linetype_labels, breaks = linetype_breaks ), guides( shape = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_shape), legend_rows, legend_rows_shape ), byrow = TRUE, override.aes = shape_override_aes ), linetype = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_linetype), legend_rows, legend_rows_linetype ), byrow = TRUE, override.aes = linetype_override_aes ), color = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_color), legend_rows, legend_rows_color ), byrow = TRUE, override.aes = color_override_aes ), fill = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_fill), legend_rows, legend_rows_fill ), byrow = TRUE, override.aes = color_override_aes ) ),
By default (see above), we set the font family to plot_font (see Section 8.1), the font size to 16 points and the legend text size to 14 points.
theme_bw(), theme( text = theme_text, legend.text = legend_text, legend.title = legend_title,
We place the legend at the desired position and surround it with a rectangle.
legend.position = legend_position,
legend.background = element_rect(color = "gray40", size = 0.5),
We place legend items either side by side or one per line.
legend.box = legend_box,
For better visibility, we make legend symbols longer.
legend.key.width = unit(3, "line"),
Finally, we rotate the X-axis text to prevent long labels from overlapping.
axis.text.x = element_text(angle = 60, hjust = 1) ) )) }
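As generate_theme returns a list of layers, it can be added to any ggplot object like a regular layer. A minimal sketch, where df is a hypothetical benchmark data frame:

plot <- ggplot(df, aes(x = nbpts, y = tps_facto, color = solver)) +
  geom_line() +
  geom_point(size = 2.5) +
  generate_theme(
    color_labels = label_solver,
    legend_rows = 2
  )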
8.6. Plot functions
In performance analysis, the metrics we study are generally the same. Therefore, we prepared a series of plot functions, one for each type of data visualization. This way, we can, for example, reuse the same function to plot disk usage peaks by problem unknown count for any combination of solvers.
8.6.1. times_by_nbpts
This function returns a plot of factorization and solve times relative to the linear system's unknown count for a given set of either dense or sparse solvers.
We combine both metrics in one plot in which they are distinguished thanks to a facet grid. However, to be able to use the latter, we need to convert the data frame from wide format into long format (see Section 8.2) RwideToLong.
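The conversion itself relies on the gather function from the tidyr package. A toy sketch of the transformation, with made-up values:

library(tidyr)

# Wide format: one row per run, one column per phase.
wide <- data.frame(
  nbpts = c(1000, 2000),
  tps_facto = c(10, 40),
  tps_solve = c(1, 3)
)

# Long format: one row per (run, phase) pair; 'computation_phase' holds the
# former column name and 'time' the associated value.
long <- gather(wide, key = "computation_phase", value = "time",
               c("tps_facto", "tps_solve"))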
The solver_config option allows one to distinguish between different solver configurations on the figure instead of different precision parameter \(\epsilon\) values.

By default, the trans_x and trans_y arguments set the axis scales to logarithmic using log10. To use a standard scale, the value identity should be used. The time_break_width argument allows one to set custom Y-axis break widths (in seconds).
times_by_nbpts <- function(dataframe, solver_config = FALSE, trans_x = "log10", trans_y = "log10", time_break_width = 3600) { dataframe_long <- gather( dataframe, key = "computation_phase", value = "time", c("tps_facto", "tps_solve") )
Now, we can begin to define the resulting ggplot2 plot object. The column names featured in this plotting function are:

- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- tps_solve: solve time in seconds,
- solver_config: the solver configuration used,
- desired_accuracy: the solution accuracy threshold \(\epsilon\),
- computation_phase: tells whether the row contains factorization or solve time (used in the long-format data frame dataframe_long),
- time: the factorization or solve time itself (used in the long-format data frame dataframe_long).
plot <- ggplot(dataframe_long, aes(x = nbpts, y = time))
If solver_config is TRUE, we want to distinguish between different solver configurations.
if(solver_config) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(solver_config), shape = solver, linetype = solver ) ) } else if(!all(is.na(dataframe_long$desired_accuracy))) {
Otherwise, we will try to distinguish between various data compression levels.
  plot <- ggplot(
    dataframe_long,
    aes(
      x = nbpts,
      y = time,
      color = as.character(desired_accuracy),
      shape = solver,
      linetype = solver
    )
  )
}
Then, we scale the axes.
plot <- plot + geom_line() + geom_point(size = 2.5) + scale_x_continuous( "# Unknowns (\U1D441)", trans = trans_x, breaks = dataframe_long$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Computation time", trans = trans_y, breaks = breaks_width(time_break_width), labels = label_time_short ) +
Finally, using a facet grid, we distinguish between factorization and solve times, placed side by side. The plot theme is provided by the generate_theme function (see Section 8.5.3). The color label function depends on the value of solver_config.
facet_grid( . ~ computation_phase, labeller = labeller(computation_phase = phase.labs) ) if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
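A hypothetical call, assuming results is a benchmark data frame imported as described in Section 8.2:

# Linear Y-axis with 30-minute breaks instead of the default log10 scale.
times_by_nbpts(results, trans_y = "identity", time_break_width = 1800)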
8.6.2. multisolve_times_by_nbrhs_and_nbpts
This function returns a plot of factorization time relative to the linear system's unknown count for different counts of right-hand sides used during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. Using the ooc switch, we can include a facet grid to distinguish between runs with and without out-of-core Schur complement computation. The distributed switch adapts the output figure for multi-node benchmarks if set to TRUE.
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- tps_solve: solve time in seconds,
- processes: the number of MPI processes used for the computation,
- p_units: the total number of threads used for the computation,
- cols_schur: the number of columns in a Schur complement block,
- solver: the names of the solvers featured in the coupling.
Note that we create an additional column p_labels. It contains the legend labels for the color aesthetics, combining the values of processes and p_units, e.g. 2 (72 threads) for 2 nodes and 72 threads in total.
multisolve_times_by_nbpts_and_nbrhs <- function(dataframe, ooc = FALSE, distributed = FALSE) { dataframe$cols_schur_labels <- paste0( "(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")" ) cbreaks <- unique(na.omit(dataframe$cols_schur_labels)) cbreaks <- cbreaks[order( as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)), as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] if(distributed) { dataframe$p_labels <- paste0( dataframe$processes, " (", dataframe$p_units, " threads)" ) plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Factorization time", breaks = breaks_log(n = 10, base = 10), trans = "log10", labels = label_time_short )
If the ooc switch is set to TRUE, we include a facet grid based on the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined from the cols_schur_labels column combining the coupled_nbrhs and cols_schur values (or from p_labels in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(dataframe$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression( bold(group("(", list(n[italic(c)], n[italic(S)]), ")")) ) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
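A hypothetical call, assuming results holds multi-solve benchmark results:

multisolve_times_by_nbpts_and_nbrhs(results, ooc = TRUE)
# or, for multi-node benchmarks:
multisolve_times_by_nbpts_and_nbrhs(results, distributed = TRUE)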
8.6.3. multisolve_rss_by_nbrhs_and_nbpts
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count for different counts of right-hand sides used during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values respectively. Using the ooc switch, we can include a facet grid to distinguish between runs with and without out-of-core Schur complement computation as well as between RAM and disk space consumption.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: hard drive usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- processes: the number of MPI processes used for the computation,
- p_units: the total number of threads used for the computation,
- cols_schur: the number of columns in a Schur complement block,
- solver: the names of the solvers featured in the coupling.
Note that we create an additional column p_labels. It contains the legend labels for the color aesthetics, combining the values of processes and p_units, e.g. 2 (48 threads) for 2 nodes and 48 threads in total.
We begin by defining the plot object directly.
multisolve_rss_by_nbpts_and_nbrhs <- function(dataframe, limit_max = 128, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE ) { df <- dataframe df$cols_schur_labels <- paste0( "(", df$coupled_nbrhs, ", ", df$cols_schur, ")" ) cbreaks <- unique(na.omit(df$cols_schur_labels)) cbreaks <- cbreaks[order(as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)),as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] y_axis_label <- "Real memory (RAM) usage peak [GiB]" if(distributed) { df$p_labels <- paste0( df$processes, " (", df$p_units, " threads)" ) plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = p_labels, shape = solver, linetype = solver ) ) } else if(ooc) { rss_kinds <- c("rm_peak", "hdd_peak") df <- gather( df, key = "rss_kind", value = "rss_peak", all_of(rss_kinds) ) df$rss_kind <- factor(df$rss_kind, levels = rss_kinds) y_axis_label <- "Storage usage peak [GiB]" plot <- ggplot( df, aes( x = nbpts, y = rss_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } else { plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) + geom_hline( data = subset(df, rss_kind == "rm_peak"), aes(yintercept = limit_max) ) + geom_text( data = subset(df, rss_kind == "rm_peak"), aes( min(df$nbpts), limit_max, label = "Physical memory limit", vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = df$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( y_axis_label, labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values )
If the ooc switch is set to TRUE, we include a facet grid based on the rss_kind and the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined from the cols_schur_labels column combining the coupled_nbrhs and cols_schur values (or from p_labels in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(df$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression( bold(group("(", list(n[italic(c)], n[italic(S)]), ")")) ) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
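For example, on a hypothetical node featuring 256 GiB of RAM, one could adjust the Y-axis accordingly:

multisolve_rss_by_nbpts_and_nbrhs(
  results, limit_max = 256, tick_values = c(0, 64, 128, 192, 256), ooc = TRUE
)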
8.6.4. coupled_memory_aware
This function returns a plot of factorization time relative to real memory (RAM) usage peaks during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. In this case, we show the results for a fixed problem size, i.e. my_nbpts. The narrow argument allows one to optimize the output for a multi-column document layout, the scheme argument selects the implementation scheme under study and multi_node adapts the figure for multi-node benchmarks.
The column names featured in this plotting function are:
- tps_facto: factorization time in seconds,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- solver: the names of the solvers featured in the coupling.
We begin by shrinking the input data frame according to my_nbpts.
coupled_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = plot_font, fontsize = 16, log = TRUE, scheme = "multi-solve", multi_node = FALSE) { dataframe_largest <- NA if(multi_node) { dataframe_largest <- subset( dataframe, coupled_method == scheme ) } else { dataframe_largest <- subset( dataframe, coupled_method == scheme & nbpts == my_nbpts ) }
Then, we create an additional column in the data frame to hold prettified labels providing the Schur complement computation phase parameter values.
if(multi_node) { if(scheme == "multi-solve") { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$cols_schur, "), \")\")" ), paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$coupled_nbrhs, "), \")\")" ) ) } else { dataframe_largest$label_content <- as.character( dataframe_largest$n_blocks ) } } else { if(scheme == "multi-solve") { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "atop(n[c]==", dataframe_largest$coupled_nbrhs, ", ", "n[S]==", dataframe_largest$cols_schur, ")" ), paste0("n[c]==", dataframe_largest$coupled_nbrhs) ) } else { dataframe_largest$label_content <- paste0( "n[b]==", dataframe_largest$n_blocks ) } }
Also, we need to reorder the entries in the data frame following the coupled_nbrhs, cols_schur and n_blocks columns.
if(scheme == "multi-solve") { dataframe_largest <- dataframe_largest %>% arrange(coupled_nbrhs, cols_schur) } else { dataframe_largest <- dataframe_largest %>% arrange(desc(n_blocks)) }
Finally, we have to determine the maximal memory usage peak among all the processes involved in the computation.
for(i in 1:nrow(dataframe_largest)) { dataframe_largest[i, "rm_peak_max"] <- max( as.numeric( strsplit(dataframe_largest[i, "rm_values"], split = " ")[[1]] ) ) }
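For instance, assuming rm_values holds the space-separated per-process peaks "1024 2048 512", the expression evaluates as follows:

max(as.numeric(strsplit("1024 2048 512", split = " ")[[1]]))  # yields 2048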
Now, we define the plot object.
plot <- NA if(multi_node) { dataframe_largest$p_labels <- paste0( dataframe_largest$processes, " (", dataframe_largest$p_units, " threads)" ) plot <- ggplot( dataframe_largest, aes( x = rm_peak_max / 1024., y = tps_facto, color = p_labels, label = label_content, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) } plot <- plot + geom_path(size = 0.8) + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the geom_label_repel function from the ggrepel R package to obtain better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf )
The X-axis shows the RAM usage peaks and the Y-axis shows the factorization time. We use custom break and limit values for better readability.
if(log) { plot <- plot + scale_x_continuous( "Maximal random access memory (RAM) usage peak [GiB]", breaks = breaks_log(10), limits = c(NA, NA), trans = "log10" ) + scale_y_continuous( "Factorization time", trans = "log10", labels = label_time_short, breaks = breaks_log(6) ) } else { plot <- plot + scale_x_continuous( "Maximal random access memory (RAM) usage peak [GiB]", breaks = breaks_extended(10), limits = c(NA, NA) ) + scale_y_continuous( "Factorization time", labels = label_time_short, breaks = breaks_extended(6) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-solve implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
if(narrow) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors[scheme]), linetype_labels = label_solver, linetype_override_aes = list(color = colors[scheme]), legend_rows = 2, legend_box = "horizontal", legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } else if(multi_node) { plot <- plot + facet_wrap( . ~ nbpts, labeller = labeller(nbpts = label_nbpts), scales = "free" ) + labs ( shape = "Solver coupling", linetype = "Solver coupling", color = "Parallel configuration" ) + generate_theme( color_breaks = as.character( sort(unique(na.omit(dataframe_largest$p_labels))) ), shape_labels = label_solver, linetype_labels = label_solver, legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( fill = FALSE ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors[scheme]), linetype_labels = label_solver, linetype_override_aes = list(color = colors[scheme]), legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } return(plot) }
8.6.5. multifacto_times_by_nbpts_and_schur_size
This function returns a plot of factorization time relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data to the same column, size_schur.
multifacto_times_by_nbpts_and_schur_size <- function(dataframe, ooc = FALSE, distributed = FALSE) { local <- dataframe local$size_schur[local$solver == "mumps/spido"] = local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- n_blocks: the number of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
if(distributed) { local$p_labels <- paste0( local$processes, " (", local$p_units, " threads)" ) plot <- ggplot( local, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( local, aes( x = nbpts, y = tps_facto, color = as.character(n_blocks), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Factorization time", breaks = breaks_log(n = 10, base = 10), trans = "log10", labels = label_time_short )
If the ooc switch is set to TRUE, we include a facet grid based on the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined based on the values of the n_blocks column (or the p_labels column in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(n[italic(b)]) ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.6. multifacto_rss_by_nbpts_and_schur_size
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values respectively.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data to the same column, size_schur.
multifacto_rss_by_nbpts_and_schur_size <- function(dataframe, limit_max = 128, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE) { local <- dataframe local$size_schur[local$solver == "mumps/spido"] = local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- n_blocks: the number of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
if(distributed) { local$p_labels <- paste0( local$processes, " (", local$p_units, " threads)" ) plot <- ggplot( local, aes( x = nbpts, y = rm_peak / 1024., color = as.character(p_labels), shape = solver, linetype = solver ) ) } else if(ooc) { local <- gather( local, key = "rss_kind", value = "rss_peak", c("rm_peak", "hdd_peak") ) local$rss_kind <- factor(local$rss_kind, levels = c("rm_peak", "hdd_peak")) plot <- ggplot( local, aes( x = nbpts, y = rss_peak / 1024., color = as.character(n_blocks), shape = solver, linetype = solver ) ) } else { plot <- ggplot( local, aes( x = nbpts, y = rm_peak / 1024., color = as.character(n_blocks), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) + geom_hline( data = subset(local, rss_kind == "rm_peak"), aes(yintercept = limit_max) ) + geom_text( data = subset(local, rss_kind == "rm_peak"), aes( min(local$nbpts), limit_max, label = "Physical memory limit", vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Storage usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values )
If the ooc switch is set to TRUE, we include a facet grid based on the rss_kind and the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined based on the values of the n_blocks column (or the p_labels column in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(n[italic(b)]) ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.7. compare_coupled
This function returns a plot of the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
For example, if we want to plot only the results regarding two-stage schemes, we can use the legend_rows argument to adjust the number of rows in the legend. Then, the shorten argument allows one to remove legend titles and use shorter legend key labels in order to reduce the final canvas size.
We begin by restraining the input data frame to the best results of the two-stage and single-stage schemes. Note that as the latter imply the use of only one solver and not a coupling of solvers, we need to specify the coupling method name manually to enable a uniform treatment of the data frame.
compare_coupled <- function(dataframe, legend_rows = 3, legend_box = "horizontal", shorten = FALSE, check_epsilon = FALSE, font = plot_font, fontsize = 16) { dataframe_best <- dataframe dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method) dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <- "full-hmat" dataframe_best$coupled_method[dataframe_best$solver == "qr_mumps/hmat"] <- "single-stage" dataframe_best <- merge( aggregate( tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best ), dataframe_best )
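The aggregate-merge idiom used above deserves a short illustration. With a made-up data frame of three runs of the same configuration, aggregate keeps the minimal factorization time per group and merge restores the remaining columns of the matching row:

runs <- data.frame(
  coupled_method = "multi-solve", nbpts = 1000,
  solver = "mumps/hmat", tps_facto = c(40, 35, 38)
)
best <- merge(
  aggregate(tps_facto ~ coupled_method + nbpts + solver, min, data = runs),
  runs
)
# 'best' now contains the single row with tps_facto == 35.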
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
plot <- NA if(check_epsilon) { plot <- ggplot( dataframe_best, aes( x = nbpts, y = error, color = coupled_method, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_best, aes( x = nbpts, y = tps_facto, color = coupled_method, shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5)
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
if(check_epsilon) { vjust <- -0.6 size <- 5 if(shorten) { vjust <- -0.3 size <- 4 } plot <- plot + geom_hline(yintercept = 1.0e-3) + geom_text( aes( 1.0e+6, 1.0e-3, label = as.character(expression(epsilon == 10^{-3})), vjust = vjust, hjust = 0 ), parse = TRUE, color = "black", family = font, size = size ) } plot <- plot + scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = label_nbpts_ticks ) if(check_epsilon) { plot <- plot + scale_y_continuous( expression(paste("Relative error (", E[rel], ")")), limits = c(min(dataframe_best$error), 2.0e-3), labels = scientific, breaks = c(1e-16, 1e-13, 1e-10, 1e-6, 1e-3), trans = "log10" ) } else { plot <- plot + scale_y_continuous( "Factorization time", labels = label_time_short, trans = "log10", limits = c(NA, NA) ) }
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are determined based on the values of the coupled_method and solver columns.
plot <- plot + labs( shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme" ) if(shorten) { plot <- plot + generate_theme( color_breaks = unique(na.omit(dataframe_best$coupled_method)), color_labels = label_coupling_short, color_override_aes = list(linetype = 0, shape = 15, size = 6), shape_breaks = unique(na.omit(dataframe_best$solver)), shape_labels = label_solver, linetype_breaks = unique(na.omit(dataframe_best$solver)), linetype_labels = label_solver, legend_rows = legend_rows, legend_box = legend_box, legend_title = element_blank(), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } else { plot <- plot + generate_theme( color_breaks = unique(na.omit(dataframe_best$coupled_method)), color_labels = label_coupling, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_breaks = unique(na.omit(dataframe_best$solver)), shape_labels = label_solver, linetype_breaks = unique(na.omit(dataframe_best$solver)), linetype_labels = label_solver, legend_rows = legend_rows, legend_box = legend_box, legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } return(plot) }
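A hypothetical call producing a compact figure for a two-column document layout:

compare_coupled(results, legend_rows = 2, shorten = TRUE)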
8.6.8. compare_rss_coupled
This function returns a plot of the real memory (RAM) usage peaks associated with the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
We begin by restraining the input data frame to the best results of the two-stage schemes and the results of the single-stage scheme implemented using HMAT. Note that as the latter only implies the use of one single solver, we need to specify the coupling method name manually to enable a uniform treatment of the data frame.
compare_rss_coupled <- function(dataframe) { dataframe_best <- dataframe dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method) dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <- "full-hmat" dataframe_best <- merge( aggregate( tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best ), dataframe_best )
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
plot <- ggplot(
  dataframe_best,
  aes(
    x = nbpts,
    y = rm_peak / 1024.,
    color = coupled_method,
    shape = solver,
    linetype = solver
  )
) +
  geom_line() +
  geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific ) + scale_y_continuous( "Real memory (RAM) usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are determined based on the values of the coupled_method and solver columns.
labs(
  shape = "Solvers",
  linetype = "Solvers",
  color = "Implementation\nscheme"
) +
# The breaks are the level values themselves, not their count.
generate_theme(
  color_breaks = unique(na.omit(dataframe_best$coupled_method)),
  color_labels = label_coupling,
  shape_breaks = unique(na.omit(dataframe_best$solver)),
  shape_labels = label_solver,
  linetype_breaks = unique(na.omit(dataframe_best$solver)),
  linetype_labels = label_solver,
  legend_rows = 3,
  legend_box = "horizontal"
)
return(plot)
}
8.6.9. accuracy_by_nbpts
This function returns a plot of the relative solution error with respect to the linear system's unknown count.
The solver_config option allows one to distinguish between different solver configurations on the figure instead of different precision parameter \(\epsilon\) values.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- error: the relative forward error of the solution approximation.
accuracy_by_nbpts <- function(dataframe, solver_config = FALSE) { if(solver_config) { plot <- ggplot( dataframe, aes( x = nbpts, y = error, color = as.character(solver_config), shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = error, color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
We draw a horizontal line representing the maximal admitted value of \(E_{rel}\), which is \(10^{-3}\).
geom_hline(yintercept = 1.0e-3) + geom_text( aes( min(dataframe$nbpts), 1.0e-3, label = as.character(expression(epsilon == 10^{-3})), vjust = -0.6, hjust = 0 ), parse = TRUE, family = plot_font, color = "black", size = 5 ) +
For the Y-axis, we define the tick values manually for easier comparison with the associated relative error values.
scale_x_continuous( "# Unknowns (\U1D441)", trans = "log10", breaks = dataframe$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( expression(paste("Relative error (", E[rel], ")")), limits = c(min(dataframe$error), 1e-1), breaks = c(1e-16, 1e-13, 1e-10, 1e-6, 1e-3), labels = scientific, trans = "log10" )
Finally, we set the legend labels, apply the common theme parameters and return the plot object.
if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.10. rss_peaks_by_nbpts
This function returns a plot of real memory (RAM) and disk space usage peaks relative to the linear system's unknown count. The function can take four extra arguments, limit_max, tick_values, solver_config and trans_x. The first two of them can be used to redefine the default Y-axis maximum and tick values respectively. The solver_config option allows one to distinguish between different solver configurations on the figure. By default, the trans_x argument sets the X-axis scale to logarithmic. To use a standard scale, the value identity should be used.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: disk space usage peaks, stored in MiB but converted to GiB.
At first, we convert the data frame from wide to long format.
rss_peaks_by_nbpts <- function(dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120), solver_config = FALSE, trans_x = "log10") { dataframe_long <- gather( dataframe, key = "memory_type", value = "memory_usage", c("rm_peak", "hdd_peak") )
Also, we fix the order of the values in memory_type so that real memory (RAM) usage peaks come before disk usage peaks in the facet grid.
dataframe_long$memory_type <- factor( dataframe_long$memory_type, c('rm_peak', 'hdd_peak') )
Then, we configure the plot object itself as described above and handle different levels of expected relative solution error in the case of solvers using data compression by adding the shape aesthetics. If solver_config is TRUE, we want to distinguish between different solver configurations instead.
if(solver_config) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = memory_usage / 1024., color = as.character(solver_config), shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_long, aes( x = nbpts, y = memory_usage / 1024., color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The limit_max argument is used to plot a horizontal line indicating the physical memory limit.
geom_hline( data = subset(dataframe_long, memory_type == "rm_peak"), aes(yintercept = limit_max) ) + geom_text( data = subset(dataframe_long, memory_type == "rm_peak"), aes( min(dataframe_long$nbpts), limit_max, label = "Physical memory limit", vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the storage usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", trans = trans_x, breaks = dataframe_long$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Storage usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values ) +
Storage types are distinguished using a facet grid.
facet_grid( . ~ memory_type, labeller = labeller(memory_type = label_storage) )
We end by setting legend titles, applying the common theme parameters and returning the plot object.
if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.11. rss_by_time
This function returns a plot of real memory (RAM) usage relative to the execution time. The dataframe argument represents the data frame containing RAM consumption data read using the read_rss function (see Section 8.2.3.1). Moreover, using limit_max and tick_values one can redefine the default Y-axis maximum and tick values respectively to improve the readability of the plot. The timeline argument is useful when dataframe contains coupled FEM/BEM benchmark results. timeline represents an additional data frame that can be used to highlight selected computation phases and estimated reference RAM usage peaks of a sparse FEM factorization and Schur matrix size in the plot (see Section 8.2.3.4).
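A hypothetical call, assuming rss holds the output of the read_rss function and phases a timeline data frame prepared as described in Section 8.2.3.4:

rss_by_time(rss, limit_max = 192, tick_values = seq(0, 180, by = 45),
            timeline = phases, additional = TRUE)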
The column names featured in this plotting function are:

- timestamp: the time elapsed since the beginning of the execution, in milliseconds (converted to seconds on the plot),
- consumption: the global real memory usage, plotted in gibibytes (GiB).
We begin directly by defining the plot object.
rss_by_time <- function(dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120), timeline = NA, additional = FALSE) { is_coupled <- FALSE fill_legend <- "Solver" color_labeller <- label_solver if("coupled_method" %in% colnames(dataframe)) { is_coupled <- TRUE fill_legend <- "Implementation scheme" color_labeller <- label_coupling } if(is_coupled) { plot <- ggplot( dataframe, aes( x = timestamp / 1000, y = consumption, fill = coupled_method ) ) } else { plot <- ggplot( dataframe, aes( x = timestamp / 1000, y = consumption, fill = solver ) ) }
We draw the RAM usage as an area chart.
plot <- plot +
geom_area() +
The limit_max argument is used to plot a horizontal line indicating the physical memory limit.
geom_hline( aes(yintercept = limit_max) ) + geom_text( aes(0, limit_max, label = "Physical memory limit", vjust = 2, hjust = 0), color = "black", family = plot_font, size = 5 )
In case the extra timeline argument is present, we draw additional labelled vertical lines corresponding to selected computation phases using the geom_vline and geom_text layers. Note that for the geom_text layer we also need to adjust the text position and rotation. To enable the processing of the labels as math expressions, we use the parse parameter.
if(is.data.frame(timeline)) { timeline$solver <- rep(unique(na.omit(dataframe$solver)), nrow(timeline)) plot <- plot + geom_vline( data = timeline, aes(xintercept = time), linetype = "dotted", size = 0.4 ) + geom_text( data = timeline, aes( x = middle, y = 0.5 * limit_max, label = label ), parse = T, angle = 90, vjust = "center", family = plot_font, size = 5 )
Then, if additional is TRUE, we also draw the horizontal segments corresponding to the estimated RAM usage peaks of a sparse FEM factorization and the estimated Schur matrix size.

However, at first, we have to convert the timeline data frame to the long format (see Section 8.2). This allows us to use different line types to distinguish the segments based on the kind of value they represent.
if(additional) { columns <- c("assembly_estimation", "peak_symmetric_factorization", "peak_non_symmetric_factorization", "schur_estimation") timeline_long <- gather( timeline, key = "peak_kind", value = "peak_value", columns ) plot <- plot + geom_spoke( data = timeline_long, aes( x = middle - 0.5 * (time - middle), y = peak_value, linetype = peak_kind, radius = time - middle ), angle = 2 * pi ) } }
Next, we set up the axes and their scaling. We round the Y-axis values to 0 decimal places.
if(is.data.frame(timeline)) { plot <- plot + scale_x_continuous( "Execution time", labels = label_time, breaks = c(0, na.omit(timeline$time), max(dataframe$timestamp) / 1000) ) } else { plot <- plot + scale_x_continuous("Execution time", labels = label_time) } plot <- plot + scale_y_continuous( "Memory (RAM) usage [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values )
We end by configuring the visual aspect of the plot.
if(is.data.frame(timeline) && additional) { plot <- plot + labs( fill = fill_legend, linetype = "Estimated peak usage" ) + generate_theme( color_labels = color_labeller, linetype_labels = label_peak_kind, legend_rows_linetype = length(columns) ) } else { plot <- plot + labs(fill = fill_legend) + generate_theme(color_labels = color_labeller) } return(plot) }
8.6.12. eprofile
This function returns a plot of power consumption relative to execution time. The first argument es_data of the function corresponds to the data frame containing power consumption data as well as tag information, if any (see Section 8.2.3.3). The zoom_on argument allows one to restrict the plot to a given time window (in seconds), TDP draws a horizontal line at the processor's thermal design power (see below) and the rss, flops and bw switches enable additional storage usage, flop rate and bandwidth curves respectively.
eprofile <- function(es_data, zoom_on = c(NA, NA), TDP = NA, rss = TRUE, flops = TRUE, bw = FALSE) { zoomed <- FALSE solvers <- unique(na.omit(es_data$solver)) es <- es_data if(!is.na(zoom_on[1]) && !is.na(zoom_on[2])) { zoomed <- TRUE es <- subset( es, timestamp >= zoom_on[1] * 1000 & timestamp <= zoom_on[2] * 1000 ) } edata <- subset(es, kind_general == "e") power <- ggplot( data = subset(edata, is.na(type)) ) sdata <- NA storage <- NA if(rss) { sdata <- subset(es, kind_general == "s") sdata$kind <- factor(sdata$kind, levels = c("sram", "shdd")) storage <- ggplot( data = subset(sdata, is.na(type)) ) } fdata <- NA if(flops) { fdata <- subset(es, kind_general == "flops") fdata$kind <- factor(fdata$kind, levels = c("flops")) } bwdata <- NA if(bw) { bwdata <- subset(es, kind_general == "bw") bwdata$kind <- factor(bwdata$kind, levels = c("bw")) }
At first, we draw the power consumption curve. The tags are represented by colored points surrounded with a black stroke and annotated with labels. Note that the latter are stored as math expressions and need to be parsed by the geom_label_repel function.
if(flops) { power <- power + geom_line( data = fdata, aes(x = timestamp, y = consumption / 1000, color = kind) ) } if(bw) { power <- power + geom_line( data = bwdata, aes(x = timestamp, y = consumption / 1024, color = kind) ) } power <- power + geom_line(aes(x = timestamp, y = consumption, color = kind)) + geom_point( data = subset(edata, !is.na(type) & kind == "ecpu"), aes(x = timestamp, y = consumption, shape = tag), color = "black", fill = "white", size = 3 ) + geom_label_repel( data = subset(edata, !is.na(type) & kind == "ecpu"), aes(x = timestamp, y = consumption, label = type_label), force_pull = 0, segment.color = "black", min.segment.length = 0, max.overlaps = 20, family = plot_font, parse = TRUE, show.legend = FALSE ) if(rss) { storage <- storage + geom_line(aes(x = timestamp, y = consumption, color = kind)) + geom_point( data = subset(sdata, !is.na(type)), aes(x = timestamp, y = consumption, shape = tag), color = "black", fill = "white", size = 3 ) + geom_label_repel( data = subset(sdata, !is.na(type)), aes(x = timestamp, y = consumption, label = type_label), force_pull = 0, segment.color = "black", min.segment.length = 0, max.overlaps = 20, family = plot_font, parse = TRUE, show.legend = FALSE ) }
If the TDP parameter is provided, we add a horizontal line matching its value.
if(!is.na(TDP)) { power <- power + geom_hline(yintercept = TDP) + geom_text( aes( 0, TDP, label = paste("TDP:", as.character(TDP), "W"), vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) }
Next, we set up the axes and their scaling. We round the Y-axis values to 0 decimal places.
power <- power + scale_x_continuous( "Execution time [s]", labels = function(label) sprintf("%d", as.integer(label / 1000)), limits = c(NA, NA), breaks = breaks_extended(n = 10) ) if(flops) { coeff <- 0.2 if(zoomed) { coeff <- 1.1 } power <- power + scale_y_continuous( "Power consumption [W]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.25 * max(edata$consumption)) ), sec.axis = sec_axis( ~ . * 1, name = "Flop rate [Gflop/s]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(coeff * (max(fdata$consumption) / 1000)) ) ) ) } else { power <- power + scale_y_continuous( "Power consumption [W]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.25 * max(edata$consumption)) ) ) } if(rss) { y_axis_title <- ifelse( length(unique(na.omit(sdata$kind))) < 2, "RAM usage [GiB]", "Storage usage [GiB]" ) storage <- storage + scale_x_continuous( "Execution time [s]", labels = function(label) sprintf("%d", as.integer(label / 1000)), limits = c(NA, NA), breaks = breaks_extended(n = 10) ) + scale_y_continuous( y_axis_title, labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.2 * max(sdata$consumption)) ) ) }
We end by configuring the visual aspect of the plot. We use a facet grid to draw a separate plot for each solver coupling.
if("coupled_method" %in% colnames(es_data)) { power <- power + facet_grid( . ~ coupled_method + solver, labeller = labeller( coupled_method = label_coupling, solver = label_solver ), scales = "free_x", switch = "y" ) } else { power <- power + facet_grid( . ~ solver, labeller = labeller( solver = label_solver ), scales = "free_x", switch = "y" ) } if(rss) { storage <- storage + facet_grid( . ~ solver, labeller = labeller( solver = label_solver ), scales = "free_x", switch = "y" ) + labs( shape = "Computation phase" ) } power <- power + generate_theme( color_labels = label_kind, color_override_aes = list(size = 8), shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal" ) if(rss) { power <- power + labs( color = "Device" ) + guides( fill = "none", shape = "none", linetype = "none" ) + theme( axis.title.x = element_blank() ) storage <- storage + generate_theme( shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal" ) + guides(color = "none", fill = "none") + theme( strip.text.x = element_blank(), strip.background.x = element_blank() ) return( plot_grid( power, storage, nrow = 2, ncol = 1, align = "v", axis = "b", rel_heights = c(1.1, 1) ) ) } power <- power + labs( color = "Device", shape = "Computation phase" ) + guides( fill = "none" ) return(power) }
8.6.13. es_comparison
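This function returns a figure comparing, as bar charts, the total energy consumption, the computation time and the real memory (RAM) usage peaks of the solvers considered relative to the linear system's unknown count. For each benchmark, the energy profile is retrieved using gcvb_retrieve_files and the total energy and duration are parsed from the eprofile.txt file.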
es_comparison <- function(dataframe) { extended <- dataframe extended$etotal <- as.numeric(0) for(i in 1:nrow(extended)) { gcvb_retrieve_files(extended[i, ]) eprofile <- fromJSON(file = "./eprofile.txt") extended[i, "etotal"] <- as.numeric( eprofile$data$data$tags$es_appli_total$`joule(J)` ) extended[i, "es_appli_total"] <- as.numeric( eprofile$data$data$tags$es_appli_total$`stop(sec)` ) - as.numeric( eprofile$data$data$tags$es_appli_total$`start(sec)` ) } extended$solver <- ifelse( extended$solver == "mumps" & extended$error < 10e-12, "mumps-blr", extended$solver ) postamble <- generate_theme( color_labels = label_solver, legend_rows = 2, legend_title = element_blank() ) energy <- ggplot( data = extended, aes( x = nbpts, y = etotal, fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "Energy consumption [J]", labels = scientific ) + postamble + guides(fill = "none") + theme(axis.title.x = element_blank()) time <- ggplot( data = extended, aes( x = nbpts, y = es_appli_total, fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "Computation time [s]" ) + postamble rss <- ggplot( data = extended, aes( x = nbpts, y = rm_peak / 1024., fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "RAM usage peak [GiB]" ) + postamble + guides(fill = "none") + theme(axis.title.x = element_blank()) return( plot_grid( energy, time, rss, nrow = 1, ncol = 3, align = "h", axis = "b", rel_widths = c(1, 1.15, 1) ) ) }
8.6.14. scalability
This function returns a plot of factorization and solve computation times relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- tps_facto: factorization time in seconds,
- tps_solve: solve time in seconds,
- mapping: the mapping configuration of MPI processes.
Then, the data frame is converted to long format (see Section 8.2) so we are able to combine the times of both phases using a facet grid. Also, we force the order of facet labels for the solver column by refactoring the latter.
scalability <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "time", c("tps_facto", "tps_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = time, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between factorization and solve time values, we use a log10 scale on the Y-axis to be able to combine both metrics on the same axis without losing readability.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) + scale_y_continuous( "Computation time", labels = label_time_short, trans = "log10" ) +
Finally, using a facet grid, we distinguish between factorization and solve times on rows and between each solver and problem size considered on columns.

For the sake of readability, we specify that the scales on the Y-axis can differ. This is useful when the solvers considered have considerably different time values.

Also, before returning the plot, we apply the common theme parameters. For scalability and parallel efficiency plots, we place each legend item on its own row as the labels are too long.
facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ), scales = "free" ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
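A hypothetical usage, rendering the figure and exporting it with ggsave:

plot <- scalability(results)
ggsave("scalability.pdf", plot, width = 12, height = 6)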
8.6.15. scalability_ram
This function is a variation of scalability (see Section 8.6.14) and returns a plot of real memory (RAM) usage peaks relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- mapping: the mapping configuration of MPI processes.
scalability_ram <- function(dataframe) { plot <- ggplot( dataframe, aes( x = p_units, y = rm_peak / 1024., group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
Note that we round the values on the Y-axis to 0 decimal places and define custom axis limits and breaks adapted for the current type of plot.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe$p_units) ) + scale_y_continuous( "Real memory (RAM) usage peaks [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +
In this case, we use the facet grid exclusively as a means to better annotate the resulting plot.
    facet_grid(
      . ~ solver + nbpts,
      labeller = labeller(
        solver = label_solver,
        nbpts = label_nbpts
      )
    ) +
    labs(color = "Parallel\nconfiguration") +
    generate_theme(
      color_breaks = c("node", "socket", "numa", "core"),
      color_labels = label_mapping,
      legend_rows = 4
    )

  return(plot)
}
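For illustration only, the anonymous label function passed to scale_y_continuous above formats each axis break with sprintf; a standalone sketch under a hypothetical name:

# Format axis break values without decimal places.
label_whole <- function(label) sprintf("%.0f", label)

label_whole(c(0, 30.4, 59.6, 90, 120))
# yields "0" "30" "60" "90" "120"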
8.6.16. scalability_distributed
This function returns a plot of factorization and solve computation times relative to the number of MPI processes.
The column names featured in this plotting function are:

- processes giving the count of MPI processes,
- tps_facto representing the factorization time in seconds,
- tps_solve representing the solve time in seconds,
- coupled_method, coupled_nbrhs, cols_schur and n_blocks providing the coupling variant and the associated block size parameters.
Here, instead of converting the data frame to long format, we plot the sum of the factorization and solve times. We force the order of the facet labels for the solver column by converting the latter to a factor. Moreover, the curves are grouped by block size: for the multi-solve coupling variant, the legend entries are pairs \((n_c, n_S)\) built from the coupled_nbrhs and cols_schur columns, whereas for the other variants they show the number of blocks \(n_b\) from the n_blocks column. The toy example below illustrates the construction of the pair labels.
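For illustration only, a minimal sketch of the pair label construction on made-up block size values:

# Hypothetical numbers of right-hand sides and of Schur complement columns.
coupled_nbrhs <- c(256, 512)
cols_schur <- c(1024, 2048)

# paste0() vectorizes over both inputs and yields one label per row.
block_size <- paste0("(", coupled_nbrhs, ", ", cols_schur, ")")
print(block_size)
# yields "(256, 1024)" "(512, 2048)"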
scalability_distributed <- function(dataframe) {
  dataframe$solver <- factor(
    dataframe$solver,
    levels = unique(na.omit(dataframe$solver))
  )

  color_lab <- NA
  if(unique(na.omit(dataframe$coupled_method)) == "multi-solve") {
    dataframe$block_size <- ifelse(
      dataframe$solver != "mumps/spido",
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")"),
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$coupled_nbrhs, ")")
    )
    color_lab <- expression(
      bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))
    )
  } else {
    dataframe$block_size <- factor(
      dataframe$n_blocks,
      levels = sort(unique(na.omit(dataframe$n_blocks)))
    )
    color_lab <- expression(bold(n[italic(b)]))
  }

  plot <- ggplot(
    dataframe,
    aes(
      x = processes,
      y = tps_facto + tps_solve,
      group = block_size,
      color = block_size
    )
  ) +
    geom_line() +
    geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between the computation time values, we use a log10 scale on the Y-axis so as not to lose readability.
    scale_x_continuous(
      "# MPI processes",
      breaks = dataframe$processes
    ) +
    scale_y_continuous(
      "Computation time",
      labels = label_time_short,
      trans = "log10"
    ) +
Finally, using a facet grid, we separate the solvers considered into distinct rows of facets and the problem sizes into distinct columns. For the sake of readability, we allow the Y-axis scales to differ between facets, which is useful when the solvers considered have considerably different time values. Also, before returning the plot, we apply the common theme parameters with the block sizes as the legend breaks.
    facet_grid(
      solver ~ nbpts,
      labeller = labeller(
        solver = label_solver,
        nbpts = label_nbpts
      ),
      scales = "free"
    ) +
    labs(color = color_lab) +
    generate_theme(
      color_breaks = as.character(sort(unique(na.omit(dataframe$block_size))))
    )

  return(plot)
}
8.6.17. efficiency
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage, relative to the available processor core count.
The column names featured in this plotting function are:

- p_units giving the count of cores,
- efficiency_facto representing the factorization phase efficiency,
- efficiency_solve representing the solve phase efficiency,
- mapping providing the mapping configuration of MPI processes.
The function itself follows the exact same logic as the scalability function (see Section 8.6.14).
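The efficiency columns themselves are computed prior to plotting. For illustration only, parallel efficiency is classically defined as \(E(c) = T(1) / (c \times T(c))\), where \(T(c)\) is the computation time on \(c\) cores; a minimal sketch on made-up timings:

# Classical parallel efficiency: E(c) = T(1) / (c * T(c)).
parallel_efficiency <- function(t_seq, t_par, cores) {
  t_seq / (cores * t_par)
}

# Made-up factorization times on 1 core and on 24 cores.
parallel_efficiency(t_seq = 480, t_par = 25, cores = 24)
# yields 0.8, i.e. 80 % of parallel efficiency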
efficiency <- function(dataframe) {
  dataframe_long <- gather(
    subset(dataframe, !is.na(dataframe$nbpts)),
    key = "computation_phase",
    value = "efficiency",
    c("efficiency_facto", "efficiency_solve")
  )
  dataframe_long$solver <- factor(
    dataframe_long$solver,
    levels = unique(na.omit(dataframe_long$solver))
  )

  plot <- ggplot(
    dataframe_long,
    aes(x = p_units, y = efficiency, group = mapping, color = mapping)
  ) +
    geom_line() +
    geom_point(size = 2.5, shape = 16) +
    geom_hline(yintercept = 1) +
    scale_x_continuous(
      "# Cores (\U1D450)",
      breaks = (dataframe_long$p_units)
    ) +
There is only one additional step, i.e. converting the efficiency values \(x\), such that \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 8.5.1). Also, we fix our own Y-axis tick values (breaks). A sketch of such a label function follows the code.
    scale_y_continuous(
      "Parallel efficiency [%]",
      breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
      labels = label_percentage
    ) +
    facet_grid(
      computation_phase ~ solver + nbpts,
      labeller = labeller(
        solver = label_solver,
        computation_phase = phase.labs,
        nbpts = label_nbpts
      )
    ) +
    labs(color = "Parallel\nconfiguration") +
    generate_theme(
      color_breaks = c("node", "socket", "numa", "core"),
      color_labels = label_mapping,
      legend_rows = 4
    )

  return(plot)
}
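label_percentage is defined in Section 8.5.1; for illustration only, a label function of this kind could look as follows (a sketch under a hypothetical name, not the actual definition):

# Turn fractions from [0, 1] into percentage labels for axis breaks.
label_percentage_sketch <- function(value) {
  paste0(round(100 * value), " %")
}

label_percentage_sketch(c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))
# yields "0 %" "20 %" "40 %" "60 %" "80 %" "100 %"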
8.6.18. efficiency_distributed
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage, relative to the number of MPI processes.
The column names featured in this plotting function are:

- processes giving the count of MPI processes,
- efficiency_total representing the combined efficiency of the factorization and solve phases,
- coupled_method, coupled_nbrhs, cols_schur and n_blocks providing the coupling variant and the associated block size parameters.
The function itself follows the exact same logic as the scalability_distributed function (see Section 8.6.16).
efficiency_distributed <- function(dataframe) {
  dataframe$solver <- factor(
    dataframe$solver,
    levels = unique(na.omit(dataframe$solver))
  )

  color_lab <- NA
  if(unique(na.omit(dataframe$coupled_method)) == "multi-solve") {
    dataframe$block_size <- ifelse(
      dataframe$solver != "mumps/spido",
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")"),
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$coupled_nbrhs, ")")
    )
    # Legend title for the multi-solve variant: the block size pairs.
    color_lab <- expression(
      bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))
    )
  } else {
    dataframe$block_size <- factor(
      dataframe$n_blocks,
      levels = sort(unique(na.omit(dataframe$n_blocks)))
    )
    # Legend title for the other variants: the number of blocks.
    color_lab <- expression(bold(n[italic(b)]))
  }

  plot <- ggplot(
    dataframe,
    aes(
      x = processes,
      y = efficiency_total,
      group = block_size,
      color = block_size
    )
  ) +
    geom_line() +
    geom_point(size = 2.5, shape = 16) +
    geom_hline(yintercept = 1) +
    scale_x_continuous(
      "# MPI processes",
      breaks = dataframe$processes
    )
There is only one additional step, i.e. converting the efficiency values \(x\), such that \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 8.5.1). Also, we fix our own Y-axis tick values (breaks), this time computed from the maximum efficiency value present in the data frame.
  y_ticks <- seq(
    from = 0.0,
    to = max(na.omit(dataframe$efficiency_total)),
    by = 0.2
  )

  plot <- plot +
    scale_y_continuous(
      "Parallel efficiency [%]",
      breaks = y_ticks,
      labels = label_percentage
    ) +
Finally, using a facet grid, we separate the solvers considered into distinct rows of facets and the problem sizes into distinct columns. For the sake of readability, we allow the Y-axis scales to differ between facets. Also, before returning the plot, we apply the common theme parameters with the block sizes as the legend breaks.
    facet_grid(
      solver ~ nbpts,
      labeller = labeller(
        solver = label_solver,
        nbpts = label_nbpts
      ),
      scales = "free"
    ) +
    labs(color = color_lab) +
    generate_theme(
      color_breaks = as.character(sort(unique(na.omit(dataframe$block_size))))
    )

  return(plot)
}
9. Post-processing StarPU execution traces
To visualize StarPU execution traces in the FXT format, we also make use of R, the ggplot2 package and the dedicated StarVZ library starvz16,starvz18.
9.1. StarVZ configuration
To configure the output plot produced by StarVZ, we use a YAML configuration file. We let the cosmetic options take their default values, except for the base font size (see the base_size value), and we override those specifying what kind of information shall be included in the resulting plots. In our case, we want to plot only the StarPU worker statistics (items st and starpu).
default:
  base_size: 14
  starpu:
    active: TRUE
    legend: TRUE
  ready:
    active: FALSE
  submitted:
    active: FALSE
  st:
    active: TRUE
Showing the critical path time bound is not necessary.
    cpb: FALSE
On the other hand, we want to display all the labels, the legend and the application time span.
    labels: "ALL"
    legend: TRUE
    makespan: TRUE
To reduce the size of the resulting plot, we enable the aggregation of visualized tasks within a given time step.
    aggregation:
      active: TRUE
      method: "static"
      step: 500
9.2. Plotting FXT traces
First, we have to determine the path to the latest gcvb benchmark session directory in order to be able to read the FXT trace files.
output <- try(
  system("ls -t ~/benchmarks/results | head -n 1", intern = TRUE)
)
latest_session_path <- paste(
  "~/benchmarks/results", output,
  sep = "/"
)
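Note that the same result could be obtained in pure R, without shelling out. A minimal sketch, assuming the session directories reside under ~/benchmarks/results:

# List the session directories and pick the most recently modified one.
sessions <- list.files("~/benchmarks/results", full.names = TRUE)
latest_session_path <- sessions[which.max(file.mtime(sessions))]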
Then, we import the required R libraries and set the starvz_config variable holding the path to the YAML configuration file for the StarVZ package (see Section 9.1).
library(ggplot2)
library(starvz)
library(grid)

starvz_config <- "starvz.yaml"
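For completeness, here is a hedged sketch of how a trace could then be read and plotted using the starvz_read and starvz_plot functions of the StarVZ package; consult the package documentation starvz-readme for the exact workflow:

# Read the trace data from the latest session directory, applying our
# YAML configuration, then render the worker statistics plot.
trace_data <- starvz_read(
  directory = latest_session_path,
  config_file = starvz_config
)
starvz_plot(trace_data)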
Bibliography
- [guix] GNU Guix software distribution and transactional package manager. https://guix.gnu.org
- [Knuth84] Knuth, Literate Programming, Comput. J., 27(2), 97–111 (1984). link. doi.
- [Dominik18] Dominik, The Org Mode 9.1 Reference Manual, 12th Media Services (2018).
- [emacs] GNU Emacs: An extensible, customizable, free/libre text editor — and more. https://www.gnu.org/software/emacs/
- [OrgTangle] Org mode documentation (Extracting source code). https://orgmode.org/manual/Extracting-Source-Code.html
- [OrgExport] Org mode documentation (Exporting). https://orgmode.org/manual/Exporting.html
- [gcvb] gcvb: (g)enerate (c)ompute (v)alidate (b)enchmark, a Python 3 module aiming at facilitating non-regression, validation and benchmarking of simulation codes. https://github.com/felsocim/gcvb
- [proc5] proc(5) - Linux manual page. https://man7.org/linux/man-pages/man5/proc.5.html
- [ESwebsite] Energy Scope website. https://sed-bso.gitlabpages.inria.fr/datacenter/energy_scope.html
- [ESjcad2021] Mathieu, Energy Scope: a tool for measuring the energy profile of HPC and AI applications (2021). https://jcad2021.sciencesconf.org/data/Herve_Mathieu_energy_scope.pdf
- [starvz-readme] Schnorr, Garcia Pinto, Nesi & Miletto, StarVZ: R-based visualization techniques for the StarPU runtime (consulted on April 8, 2022). https://github.com/schnorr/starvz/blob/master/README.org
- [ggplot2] ggplot2, a system for declaratively creating graphics. https://ggplot2.tidyverse.org/
- [sqliteLike] The LIKE, GLOB, REGEXP, and MATCH operators, SQL Language Expressions, SQLite. https://www.sqlite.org/lang_expr.html#like
- [likwid] Gruber, Eitzinger, Hager & Wellein, LIKWID (2021). This research has been partially funded by grants BMBF 01IH13009 and BMBF 01IH16012C. https://doi.org/10.5281/zenodo.5752537
- [treibig2010likwid] Treibig, Hager & Wellein, LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments, in: 2010 39th International Conference on Parallel Processing Workshops, 207–216 (2010).
- [RwideToLong] Cookbook for R: Converting data between wide and long format. http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
- [starvz16] Garcia Pinto, Stanisic, Legrand, Schnorr, Thibault & Danjean, Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach, in: 2016 Third Workshop on Visual Performance Analysis (VPA), 17–24 (2016).
- [starvz18] Garcia Pinto, Mello Schnorr, Stanisic, Legrand, Thibault & Danjean, A visual performance analysis framework for task-based parallel applications running on hybrid clusters, Concurrency and Computation: Practice and Experience, 30(18), e4472 (2018). link. doi.
Footnotes:
1. The source of this document, as well as the source code files it describes, is available for download at: https://thesis-mfelsoci.gitlabpages.inria.fr/sources/files.tar.gz