Ph.D. thesis of Marek Felšöci: Source files
Table of Contents
- 1. Dealing with proprietary source code
- 2. Storage resources monitoring
- 3. Energetic profiling
- 4. Result parsing
- 5. Injecting results into a database
- 6. Wrapper scripts
- 7. Job submission
- 8. Post-processing benchmark results
- 8.1. Prerequisites
- 8.2. Importing data
- 8.3. Compressing data
- 8.4. Formatting data
- 8.5. Style presets
- 8.5.1. Label functions
- 8.5.1.1. label_percentage
- 8.5.1.2. label_time
- 8.5.1.3. label_time_short
- 8.5.1.4. label_epsilon
- 8.5.1.5. label_storage
- 8.5.1.6. label_ooc
- 8.5.1.7. label_solver
- 8.5.1.8. label_solver_config
- 8.5.1.9. label_tag
- 8.5.1.10. label_tag_short
- 8.5.1.11. label_kind
- 8.5.1.12. label_ooc_storage
- 8.5.1.13. label_mapping
- 8.5.1.14. label_nbpts
- 8.5.1.15. label_nbpts_ticks
- 8.5.1.16. label_scientific
- 8.5.1.17. label_coupling and label_coupling_two_stage
- 8.5.1.18. label_peak_kind
- 8.5.1.19. label_execution
- 8.5.2. Label maps
- 8.5.3. Plot theme
- 8.6. Plot functions
- 8.6.1. times_by_nbpts
- 8.6.2. multisolve_times_by_nbrhs_and_nbpts
- 8.6.3. multisolve_rss_by_nbrhs_and_nbpts
- 8.6.4. coupled_memory_aware
- 8.6.5. multifacto_times_by_nbpts_and_schur_size
- 8.6.6. multifacto_rss_by_nbpts_and_schur_size
- 8.6.7. compare_coupled
- 8.6.8. compare_rss_coupled
- 8.6.9. accuracy_by_nbpts
- 8.6.10. rss_peaks_by_nbpts
- 8.6.11. rss_by_time
- 8.6.12. eprofile
- 8.6.13. es_comparison
- 8.6.14. scalability
- 8.6.15. scalability_ram
- 8.6.16. scalability_distributed
- 8.6.17. efficiency
- 8.6.18. efficiency_distributed
- 9. Post-processing StarPU execution traces
Reproducibility of numerical experiments has always been a complex matter. Building exactly the same software environment on multiple computing platforms may be long, tedious and sometimes virtually impossible to do manually.
Within our research work, we put strong emphasis on ensuring the reproducibility of the experimental environment. On the one hand, we use the GNU Guix guix transactional package manager, which allows for complete reproducibility of software environments across different machines. On the other hand, we rely on the principles of literate programming Knuth84 in an effort to maintain exhaustive, clear and accessible documentation of our experiments and the associated environment.
This document provides such documentation for the source files commonly required for constructing our experimental environments and running the experiments. It is written using Org mode for Emacs Dominik18,emacs which defines the Org markup language allowing one to combine formatted text, images and figures with traditional source code. The .org source this document is based on features the source code blocks of multiple scripts and configuration files that can be extracted OrgTangle into separate source files 1. Eventually, the document can be exported to various output formats OrgExport such as the HTML page you are currently viewing.
1. Dealing with proprietary source code
To perform experiments, we use the benchmark suite test_FEMBEM and the acoustic application actipole from Airbus. test_FEMBEM is part of their scab software package. scab, as well as its direct dependencies mpf and hmat, are proprietary packages with closed source code residing in private Git repositories. Before setting up an experimental software environment using Guix, we have to acquire local copies of them.
The getsrc.sh script we define in this section is meant for acquiring the source tarballs corresponding to user-specified versions of the Airbus hmat, mpf, scab and actipole packages.
It begins with a help message function accessible through the -h option.
function help() {
  echo "Retrieve the source tarballs corresponding to user-specified versions" \
    "of the Airbus 'hmat', 'mpf', 'scab' and 'actipole' packages." >&2
  echo "Usage: $0 [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -A                      Do not acquire sources for 'actipole'." >&2
  echo "  -h                      Show this help message and exit." >&2
  echo "  -o PATH                 Place the output source tarballs into the" \
    "directory located at PATH." >&2
  echo "  -v SCRIPT[,SCRIPT,...]  Consider the version information given" \
    "by the environment variables defined in SCRIPT. Multiple SCRIPTs can" \
    "be specified to retrieve multiple different versions at once." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
ACQUIRE_ACTIPOLE decides whether the sources for the actipole package should be acquired. As this package is rarely used, the -A option conveniently disables the preparation of its source tarball.
ACQUIRE_ACTIPOLE=1
OUTPUT_DIR specifies the location where to store the output source tarballs of the Airbus packages. This can be modified using the -o option. The default location is $HOME/src.
OUTPUT_DIR="$HOME/src"
VERSIONS shall store the version information of the packages to retrieve. The user must use the -v option to specify at least one shell script file providing the definition of the following environment variables: HMAT_PACKAGE, HMAT_VERSION, HMAT_COMMIT, HMAT_BRANCH, MPF_PACKAGE, MPF_VERSION, MPF_COMMIT, MPF_BRANCH, SCAB_PACKAGE, SCAB_VERSION, SCAB_COMMIT, SCAB_BRANCH, ACTIPOLE_PACKAGE, ACTIPOLE_VERSION, ACTIPOLE_COMMIT and ACTIPOLE_BRANCH. For each software package, they indicate the name and the version of the package in Guix, as well as the commit and the branch to consider in the associated Git repository. Note that if the -A option is specified, the ACTIPOLE_* variables are unnecessary.
VERSIONS=""
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
while getopts ":Aho:v:" option; do
  case $option in
    A)
      ACQUIRE_ACTIPOLE=0
      ;;
    o)
      OUTPUT_DIR=$OPTARG
      if test ! -d "$OUTPUT_DIR"; then
        error "'$OUTPUT_DIR' is not a valid directory!"
        exit 1
      fi
      ;;
    v)
      VERSIONS=$(echo $OPTARG | sed 's/,/\n/g')
      ;;
We must also take into account unknown options, missing option arguments or syntax mismatches, as well as the -h option.
    \?)
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :)
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
At first, we verify whether the -v option has been specified by the user.
if test "$VERSIONS" == ""; then
  error "No version information has been specified."
  exit 1
fi
Then, we iterate over the version information script files given through the -v option and:
for spec in $VERSIONS; do
check if the current version information script is a valid file and source it,
if test ! -f $spec; then
  error "'$spec' is not a valid version information script."
  exit 1
fi
source $spec
prepare some useful environment variables,
HMAT_BASENAME="$HMAT_PACKAGE-$HMAT_VERSION"
HMAT_TARBALL="$OUTPUT_DIR/$HMAT_BASENAME.tar.gz"
MPF_BASENAME="$MPF_PACKAGE-$MPF_VERSION"
MPF_TARBALL="$OUTPUT_DIR/$MPF_BASENAME.tar.gz"
SCAB_BASENAME="$SCAB_PACKAGE-$SCAB_VERSION"
SCAB_TARBALL="$OUTPUT_DIR/$SCAB_BASENAME.tar.gz"
ACTIPOLE_BASENAME="$ACTIPOLE_PACKAGE-$ACTIPOLE_VERSION"
ACTIPOLE_TARBALL="$OUTPUT_DIR/$ACTIPOLE_BASENAME.tar.gz"
remove any previous clones of the Airbus repositories in OUTPUT_DIR,
rm -rf $OUTPUT_DIR/$HMAT_BASENAME $OUTPUT_DIR/$MPF_BASENAME \
  $OUTPUT_DIR/$SCAB_BASENAME $OUTPUT_DIR/$ACTIPOLE_BASENAME $HMAT_TARBALL \
  $MPF_TARBALL $SCAB_TARBALL
if test $ACQUIRE_ACTIPOLE -eq 1; then
  rm -rf $ACTIPOLE_TARBALL
fi
make fresh clones and checkout the required revisions,
git clone --recurse-submodules --single-branch --branch $HMAT_BRANCH \
  https://aseris-lab.info/airbus/hmat $OUTPUT_DIR/$HMAT_BASENAME
cd $OUTPUT_DIR/$HMAT_BASENAME
git checkout $HMAT_COMMIT
git submodule update

git clone --single-branch --branch $MPF_BRANCH \
  https://aseris-lab.info/airbus/mpf $OUTPUT_DIR/$MPF_BASENAME
cd $OUTPUT_DIR/$MPF_BASENAME
git checkout $MPF_COMMIT

git clone --single-branch --branch $SCAB_BRANCH \
  https://aseris-lab.info/airbus/scab $OUTPUT_DIR/$SCAB_BASENAME
cd $OUTPUT_DIR/$SCAB_BASENAME
git checkout $SCAB_COMMIT

if test $ACQUIRE_ACTIPOLE -eq 1; then
  git clone --single-branch --branch $ACTIPOLE_BRANCH \
    https://aseris-lab.info/airbus/actipole $OUTPUT_DIR/$ACTIPOLE_BASENAME
  cd $OUTPUT_DIR/$ACTIPOLE_BASENAME
  git checkout $ACTIPOLE_COMMIT
fi
verify that the cloned repositories are valid directories,
if test ! -d $OUTPUT_DIR/$HMAT_BASENAME || \
   test ! -d $OUTPUT_DIR/$MPF_BASENAME || \
   test ! -d $OUTPUT_DIR/$SCAB_BASENAME || \
   (test $ACQUIRE_ACTIPOLE -eq 1 && test ! -d $OUTPUT_DIR/$ACTIPOLE_BASENAME);
then
  error "Failed to clone the Airbus repositories!"
  exit 1
fi
remove the .git folders from inside the clones to shrink the size of the final tarballs created using the tar command,
rm -rf $OUTPUT_DIR/$HMAT_BASENAME/.git \
  $OUTPUT_DIR/$HMAT_BASENAME/hmat-oss/.git $OUTPUT_DIR/$MPF_BASENAME/.git \
  $OUTPUT_DIR/$SCAB_BASENAME/.git
if test $ACQUIRE_ACTIPOLE -eq 1; then
  rm -rf $OUTPUT_DIR/$ACTIPOLE_BASENAME/.git
  tar -czf $ACTIPOLE_TARBALL -C $OUTPUT_DIR $ACTIPOLE_BASENAME
fi
tar -czf $HMAT_TARBALL -C $OUTPUT_DIR $HMAT_BASENAME
tar -czf $MPF_TARBALL -C $OUTPUT_DIR $MPF_BASENAME
tar -czf $SCAB_TARBALL -C $OUTPUT_DIR $SCAB_BASENAME
check if the tarballs were created and remove the clones.
if test ! -f $HMAT_TARBALL || test ! -f $MPF_TARBALL || \
   test ! -f $SCAB_TARBALL || \
   (test $ACQUIRE_ACTIPOLE -eq 1 && test ! -f $ACTIPOLE_TARBALL); then
  error "Failed to create tarballs!"
  exit 1
fi

rm -rf $OUTPUT_DIR/$HMAT_BASENAME $OUTPUT_DIR/$MPF_BASENAME \
  $OUTPUT_DIR/$SCAB_BASENAME
if test $ACQUIRE_ACTIPOLE -eq 1; then
  rm -rf $OUTPUT_DIR/$ACTIPOLE_BASENAME
fi
done
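The script can then be invoked as follows; the version script name refers to the hypothetical example given earlier in this section.

# Retrieve the tarballs described in versions-example.sh into $HOME/src.
./getsrc.sh -o $HOME/src -v versions-example.sh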
1.1. Defining, scheduling and computing benchmarks
To automate the generation and the computation of benchmarks, we use gcvb gcvb, an open-source tool developed at Airbus. gcvb allows us to define benchmarks, generate corresponding shell job scripts for every benchmark or a selected group of benchmarks, submit these job scripts for execution, then gather and optionally validate or post-process the results. To generate multiple variants of the same benchmark, gcvb supports templates.
The configuration of gcvb, the benchmark definitions as well as the scheduling parameters are specific to a given experimental study. We thus describe these in the study-specific documentations. Follow this link to see the list of all of our experimental studies.
2. Storage resources monitoring
To continuously measure the amount of storage (RAM, swap file and optionally hard drive) resources consumed during each benchmark run, we launch test_FEMBEM through the Python script rss.py which produces as many RAM, swap and disk usage logs as there are MPI processes created during benchmark execution. The peak values are included at the very end of these log files. Note that measures are taken regularly, once per second, and the output units are mebibytes [MiB].
import subprocess
import sys
import threading
import time
import os
import psutil
from datetime import datetime
After importing the necessary Python modules, we begin by defining a help message function that can be triggered with the -h option.
def help():
    print("Monitor memory usage of PROGRAM.\n",
          "Usage: ", sys.argv[0], " [options] [PROGRAM]\n\n",
          "Options:\n -h          Show this help message.\n",
          " --with-hdd  Extend monitoring to disk usage.\n",
          " --with-io   Extend monitoring to I/O volume and bandwidth.\n",
          file = sys.stderr, sep = "")
We must check that at least one argument has been passed to the script. The script takes as arguments the executable whose resource consumption shall be measured, together with the arguments of that executable. Otherwise, one can specify the -h option to make the script print a help message on how to use it. Also, the --with-hdd and --with-io switches allow one to extend the monitoring to disk usage and to I/O volume and bandwidth, respectively.
if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)

with_hdd = False
with_io = False

if "--with-hdd" in sys.argv:
    with_hdd = True

if "--with-io" in sys.argv:
    with_io = True
Some function definitions follow. At first, a function to determine the MPI rank of the currently monitored process.
def mpi_rank():
    rank = 0
    for env_name in ['MPI_RANKID', 'OMPI_COMM_WORLD_RANK', 'PMIX_RANK']:
        try:
            rank = int(os.environ[env_name])
            break
        except:
            continue
    return rank
We also need a function to determine and format the current date and time. Indeed, we include a timestamp in the output log files every time we record a measure. This helps us synchronize the records with those of other monitoring tools.
def timestamp():
    now = datetime.now()
    return now.strftime("%Y/%m/%dT%H:%M:%S")
The script monitors RAM and swap usage and, optionally, disk usage and bandwidth. On Linux, the values can be obtained from the /proc filesystem.
The memory usage as well as other statistics of a particular process are stored in a human-readable form in /proc/<pid>/status where <pid> is the process identifier (PID). In this file, the fields VmRSS and VmSwap hold the amount of RAM and the amount of swap memory used by the process at instant \(t\), respectively proc5. See the associated function below.
def rss(f, pid):
    f.seek(0, 0)
    lines = f.readlines()
    datetime = timestamp()
    VmRSS = -1024.
    VmSwap = -1024.
    try:
        for line in lines:
            if line.startswith("VmRSS"):
                VmRSS = int(line.split()[1])
            elif line.startswith("VmSwap"):
                VmSwap = int(line.split()[1])
    except ValueError:
        print(
            "Warning: Failed to decode '/proc/%d/status' at %s." % (pid, datetime),
            file = sys.stderr
        )
        pass
    return datetime, (VmRSS / 1024.), (VmSwap / 1024.)
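As a quick sanity check, the same fields can be read directly from a shell; the values in the comment below are made up for illustration only.

# Print the current RAM and swap usage of the shell itself, in kiB.
grep -E '^Vm(RSS|Swap)' /proc/self/status
# Typical output (illustrative values):
#   VmRSS:      3456 kB
#   VmSwap:        0 kB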
The input/output (I/O) statistics of a particular process are stored in a human-readable form in /proc/<pid>/io where <pid> is the process identifier (PID). In this file, the fields read_bytes and write_bytes hold the number of bytes read and written by the process at instant \(t\), respectively proc5. Based on these values, we then compute the I/O bandwidth. Finally, the cancelled_write_bytes field gives the number of bytes that, in the end, were not effectively written to the storage device. See the associated function below.
def io(f, pid, last_read, last_write):
    f.seek(0, 0)
    lines = f.readlines()
    datetime = timestamp()
    read_bytes = 0.
    read_rate = 0.
    write_bytes = 0.
    write_rate = 0.
    cancelled_write_bytes = 0.
    try:
        for line in lines:
            if line.startswith("read_bytes"):
                read_bytes = int(line.split()[1])
                read_rate = read_bytes - (last_read * 1024.)
            elif line.startswith("write_bytes"):
                write_bytes = int(line.split()[1])
                write_rate = write_bytes - (last_write * 1024.)
            # The kernel names this field 'cancelled_write_bytes' in
            # /proc/<pid>/io.
            elif line.startswith("cancelled_write_bytes"):
                cancelled_write_bytes = int(line.split()[1])
    except ValueError:
        print(
            "Warning: Failed to decode '/proc/%d/io' at %s." % (pid, datetime),
            file = sys.stderr
        )
        pass
    return datetime, (read_bytes / 1024.), (write_bytes / 1024.), \
        (cancelled_write_bytes / 1024.), (read_rate / 1024.), \
        (write_rate / 1024.)
To be able to monitor the disk space used during benchmarks, we assign to each run a dedicated temporary folder into which all the files used by test_FEMBEM are put. Finally, we monitor the evolution of the size of TMPDIR in our Python script using the du Linux command.
def hdd(tmpdir):
    du = subprocess.Popen(["du", "-s", tmpdir], stdout = subprocess.PIPE)
    datetime = timestamp()
    HddNow = -1024.
    try:
        HddNow = int(du.stdout.read().decode("utf-8").split("\t")[0])
    except ValueError:
        print(
            "Warning: Failed to decode the stdout of 'du' at %s." % datetime,
            file = sys.stderr
        )
        pass
    return datetime, (HddNow / 1024.)
Before entering into the monitoring loop, we need to:
initialize variables to store peak values and previously read values for I/O bandwidth computation (used only in presence of the --with-io option);
VmHWM = float(0)
SwapPeak = float(0)
HddPeak = float(0)
IOReadPeak = float(0)
IOWritePeak = float(0)
IOLastRead = float(0)
IOLastWrite = float(0)
get the path of the temporary folder from the TMPDIR environment variable. If the variable is not set and the --with-hdd option was specified, /tmp is returned, which allows the script to detect that no separate temporary folder has been created and to exit with failure, preventing potentially incorrect measures;
tmpdir = os.getenv('TMPDIR', '/tmp')
if with_hdd and tmpdir == '/tmp':
    print('Error: Temporary files were not redirected!', file = sys.stderr)
    sys.exit(1)
launch the program to monitor and get the rank of the currently monitored process;
myargs = sys.argv[1:]
if with_hdd == True and with_io == True:
    myargs = sys.argv[3:]
elif with_hdd == True or with_io == True:
    myargs = sys.argv[2:]
p = subprocess.Popen(myargs)
rank = mpi_rank()
open the /proc/<pid>/status and /proc/<pid>/io (if the --with-io option was used) files to be able to collect the requested data about the monitored process.
proc_status = open("/proc/%d/status" % p.pid, "r")
proc_io = None
if with_io == True:
    proc_io = open("/proc/%d/io" % p.pid, "r")
At this point, we begin the monitoring loop. The latter breaks when the monitored process exits. Note that the measured values are initially stored in memory. They are dumped into the corresponding log files at the end of the benchmark execution. This is to prevent frequent disk accesses that may impact the execution time of the monitored benchmark.
rss_log = []
hdd_log = []
io_log = []

while p.returncode == None:
The following tasks are performed to gather necessary results:
get current RAM and swap usage, update the peak values if necessary and add them to the log;
    RamSwapNow = rss(proc_status, p.pid)
    if RamSwapNow[1] > VmHWM:
        VmHWM = RamSwapNow[1]
    if RamSwapNow[2] > SwapPeak:
        SwapPeak = RamSwapNow[2]
    rss_log.append(RamSwapNow)
collect disk usage statistics, update the peak disk space usage if necessary and add the corresponding value, converted from kibibytes to mebibytes, to the log;
    if with_hdd == True:
        HddNow = hdd(tmpdir)
        if HddNow[1] > HddPeak:
            HddPeak = HddNow[1]
        hdd_log.append(HddNow)
collect I/O statistics, update the peak I/O bandwidth if necessary and add the corresponding values to the log;
    if with_io == True:
        IONow = io(proc_io, p.pid, IOLastRead, IOLastWrite)
        if IONow[4] > IOReadPeak:
            IOReadPeak = IONow[4]
        if IONow[5] > IOWritePeak:
            IOWritePeak = IONow[5]
        IOLastRead = IONow[1]
        IOLastWrite = IONow[2]
        io_log.append(IONow)
sleep for one second and repeat.
    time.sleep(1)
    p.poll()
Eventually, before exiting, we append usage peaks to the logs and dump the latter to disk.
print("Writing storage resource usage logs...")

m = open("rss-%d.log" % rank, "w")
for entry in rss_log:
    m.write("%s\t%g\t%g\n" % (entry[0], entry[1], entry[2]))
m.write("%g\t%g\n" % (VmHWM, SwapPeak))
m.flush()
m.close()
proc_status.close()

if with_hdd == True:
    d = open("hdd-%d.log" % rank, "w")
    for entry in hdd_log:
        d.write("%s\t%g\n" % (entry[0], entry[1]))
    d.write("%g\n" % HddPeak)
    d.flush()
    d.close()

if with_io == True:
    io = open("io-%d.log" % rank, "w")
    for entry in io_log:
        io.write("%s\t%g\t%g\t%g\t%g\t%g\n" % \
            (entry[0], entry[1], entry[2], entry[3], entry[4], entry[5]))
    io_log_last = io_log[-1]
    io.write("%g\t%g\t%g\t%g\t%g\n" % \
        (io_log_last[1], io_log_last[2], io_log_last[3], IOReadPeak, \
         IOWritePeak))
    io.flush()
    io.close()
    proc_io.close()

sys.exit(p.returncode)
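For illustration, rss.py could wrap a benchmark as follows; the MPI process count, the temporary folder and the benchmark executable are hypothetical, and in practice the wrapper scripts of Section 6 take care of setting a per-process TMPDIR.

export TMPDIR=/scratch/$USER/run-0   # hypothetical per-run temporary folder
mkdir -p $TMPDIR
# Each MPI process gets its own rss-*.log, hdd-*.log and io-*.log files.
mpirun -np 4 python3 rss.py --with-hdd --with-io ./test_FEMBEM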
3. Energetic profiling
energy_scope ESwebsite,ESjcad2021 is a program benchmarking tool dedicated primarily to the creation of energetic profiles of HPC applications. To configure what should be monitored and how, it uses a configuration file in JSON format. Among other parameters, it allows one to choose the acquisition frequency and the profile.
In this section, we define a gcvb template (see Section 1.1) of this configuration file. We keep the default values for most of the parameters. Then, the {es[interval]} and the {es[profile]} placeholders allow us to specify respectively the acquisition interval (in milliseconds) and the profile, e.g. et for energy and CPU temperature monitoring or e for energy-only monitoring.
Note that the double curly brackets allow this template to be processed by Python's string formatting in our scripts: doubled braces are emitted as literal braces in the resulting JSON.
{{
  "parameter": {{
    "comment":"configuration file template",
    "version":"2022-01",
    "owner":"energy_scope",
    "cpu_info":"/proc/cpuinfo",
    "nvidia_gpu_info":"/proc/driver/nvidia/gpus",
    "stop_file_prefix":"/tmp/es_read_rt_stop_flag_",
    "synchro_file_saved_prefix":"wrk/energy_scope_tmp_file_",
    "synchro_started_prefix":"wrk/energy_scope_tmp_started_",
    "msr_cmd":"rdmsr",
    "opt_enode":"node",
    "opt_estat":"shell",
    "opt_profile":"none",
    "opt_send":"none",
    "analyseafteracquisition":"no",
    "eprofile_period(ms)":{es[interval]},
    "default_read_rt_profile":"{es[profile]}"
  }},
  "data": {{
    "intel": {{
      "generic": {{
        "read_rt_idle_time":{es[idle_time]},
        "registers": {{
          "0x010": {{"when":"ae", "pct":"package"}},
          "0x0ce": {{"when":"b", "pct":"package"}},
          "0x0e7": {{"when":"N", "pct":"thread"}},
          "0x0e8": {{"when":"a", "pct":"thread"}},
          "0x198": {{"when":"N", "pct":"core"}},
          "0x19c": {{"when":"at", "pct":"core"}},
          "0x1A2": {{"when":"b", "pct":"package"}},
          "0x606": {{"when":"b", "pct":"package"}},
          "0x610": {{"when":"b", "pct":"package"}},
          "0x611": {{"when":"ae", "pct":"package"}},
          "0x613": {{"when":"a", "pct":"package"}},
          "0x614": {{"when":"b", "pct":"package"}},
          "0x619": {{"when":"ae", "pct":"package"}},
          "0x620": {{"when":"b", "pct":"package"}},
          "0x621": {{"when":"a", "pct":"package"}},
          "0x639": {{"when":"ae", "pct":"package"}},
          "0x641": {{"when":"ae", "pct":"package"}},
          "0x64e": {{"when":"a", "pct":"thread"}},
          "0x770": {{"when":"b", "pct":"package"}},
          "0x771": {{"when":"b", "pct":"thread"}},
          "0x774": {{"when":"b", "pct":"thread"}}
        }}
      }}
    }},
    "amd": {{
      "generic": {{
        "read_rt_idle_time":{es[idle_time]},
        "registers": {{
          "0x010": {{"when":"ae", "pct":"package"}},
          "0x0e7": {{"when":"a", "pct":"thread"}},
          "0x0e8": {{"when":"a", "pct":"thread"}},
          "0xC0000104": {{"when":"b", "pct":"thread"}},
          "0xC0010064": {{"when":"b", "pct":"core"}},
          "0xC0010299": {{"when":"b", "pct":"package"}},
          "0xC001029A": {{"when":"a", "pct":"core"}},
          "0xC001029B": {{"when":"ae", "pct":"package"}}
        }}
      }}
    }}
  }}
}}
4. Result parsing
The shell script parse.sh parses data from a test_FEMBEM benchmark or an actipole computation and stores the result in a comma-separated values (.csv) file.
We begin with a help message function that can be triggered using the -h option.
function help() {
  echo -n "Parse data from a test_FEMBEM run and store the result to a " >&2
  echo "comma-separated (*.csv) file." >&2
  echo "Usage: ./parse.sh [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -h                           Show this help message." >&2
  echo -n "  -s FILE                      Read the standard output of " >&2
  echo "test_FEMBEM from FILE rather than from the standard input." >&2
  echo -n "  -r PATH                      Read memory and disk usage " >&2
  echo "logs provided by rss.py from the directory at PATH." >&2
  echo -n "  -l FILE=FUNCTION[,FUNCTION]  Read the trace log of the " >&2
  echo -n "test_FEMBEM run from FILE and parse execution time and " >&2
  echo "iteration count of one or more FUNCTION(s)." >&2
  echo -n "  -K KEY=VALUE[,KEY=VALUE]     Parse one or more additional " >&2
  echo "KEY=VALUE pairs to be added to the output (*.csv) file." >&2
  echo "  -H                           Do not print header on output." >&2
  echo -n "  -o FILE                      Specify the file to store the " >&2
  echo "output of the parsing to." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
There should be at least one argument provided on the command line.
if test $# -lt 1; then
  help
  exit 1
fi
STDOUT holds the path to the input file containing the standard output to parse.
STDOUT=""
RSS contains the path and name pattern of the input files containing the logs of the resource monitoring script (see Section 2).
RSS=""
TRACE is the input file containing the trace call log produced by test_FEMBEM.
TRACE=""
OUTPUT is the path to the output file to store the parsed values to.
OUTPUT=""
FUNCTIONS contains the names of the functions to parse information about from trace call logs.
FUNCTIONS=""
CUSTOM_KV holds custom key-value pairs to be included in the output *.csv file.
CUSTOM_KV=""
DISABLE_HEADING toggles the heading in the output *.csv file.
DISABLE_HEADING=0
At this point, we parse provided arguments and check the validity of option values.
while getopts ":hs:r:l:K:o:H" option; do
  case $option in
The standard output to parse can be passed to the script either through its standard input or in a file specified using the -s option.
    s)
      STDOUT=$OPTARG
      if test ! -f $STDOUT; then
        error "'$STDOUT' is not a valid file!"
        exit 1
      fi
      ;;
When the -r option is provided, the script shall also parse resource monitoring logs.
    r)
      RSS=$OPTARG
      if test ! -d $RSS; then
        error "'$RSS' is not a valid directory!"
        exit 1
      fi
      ;;
When the -l option is provided, the script shall read the trace call log from the file passed as argument and parse the execution time, iteration count and floating-point operations (Flops) of one or more functions.
    l)
      TRACE=$(echo $OPTARG | cut -d '=' -f 1)
      FUNCTIONS=$(echo $OPTARG | cut -d '=' -f 2 | sed 's/,/\t/g')
      if test ! -f $TRACE; then
        error "'$TRACE' is not a valid file!"
        exit 1
      fi
      ;;
The -K option allows one to add one or more KEY=VALUE pairs to the output *.csv file.
    K)
      if test "$CUSTOM_KV" == ""; then
        CUSTOM_KV="$OPTARG"
      else
        CUSTOM_KV="$CUSTOM_KV,$OPTARG"
      fi
      ;;
The -H option toggles the printing of the header in the output *.csv file.
    H)
      DISABLE_HEADING=1
      ;;
Using the -o option, one can specify the name of the output *.csv file.
    o)
      OUTPUT=$OPTARG
      ;;
Eventually, we take care of unknown options or missing arguments and raise an error if any.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
If the standard output to parse is not provided in a file, we try to read its content from the standard input of the script.
if test "$STDOUT" == ""; then
  rm -rf .input
  while read line; do
    echo $line >> .input
  done
  STDOUT=.input
fi
If no standard output to parse is found, we abort the script and raise an error.
if test $(wc --bytes $STDOUT | cut -d' ' -f 1) -lt 1; then
  error "'$STDOUT' contains no 'test_FEMBEM' standard output to parse!"
  exit 1
fi
The treatment starts by separating the custom key-value pairs, if any, by a tabulation character, so that they can be looped over using a for loop.
CUSTOM_KV=$(echo $CUSTOM_KV | sed 's/,/\t/g')
We print the header line to the output file if it was not explicitly disabled using the -H option.
if test $DISABLE_HEADING -ne 1; then
  echo -n "processes,by,mapping,ranking,binding,omp_thread_num," > $OUTPUT
  echo -n "mkl_thread_num,hmat_ncpu,mpf_max_memory,nbpts,radius," >> $OUTPUT
  echo -n "height,nbrhs,step_mesh,nbem,nbpts_lambda,lambda," >> $OUTPUT
  echo -n "thread_block_size,proc_block_size,disk_block_size," >> $OUTPUT
  echo -n "tps_cpu_facto_mpf,tps_cpu_solve_mpf,error,symmetry," >> $OUTPUT
  echo -n "h_assembly_accuracy,h_recompression_accuracy," >> $OUTPUT
  echo -n "assembled_size_mb,tps_cpu_facto,factorized_size_mb," >> $OUTPUT
  echo -n "tps_cpu_solve,mumps_blr,mumps_blr_accuracy," >> $OUTPUT
  echo -n "mumps_blr_variant,coupled_method,coupled_nbrhs," >> $OUTPUT
  echo -n "size_schur,cols_schur,sparse_ooc,singularity_container," >> $OUTPUT
  echo -n "slurm_jobid,system_kind,csc_tot,csc_init,csc_assemb," >> $OUTPUT
  echo -n "csc_analyze,csc_facto,csc_bcast,csc_retrieval,csc_free," >> $OUTPUT
  echo -n "es_cpu,es_dram,es_duration,rm_peak,rm_values,swap_peak," >> $OUTPUT
  echo -n "swap_values,hdd_peak,hdd_values,io_read_peak," >> $OUTPUT
  echo -n "io_read_values,io_write_peak,io_write_values," >> $OUTPUT
  echo -n "io_cwrite_peak,io_cwrite_values" >> $OUTPUT
The common header entries are followed by custom key-value pairs and additional details about selected functions acquired from the trace call logs, if specified.
  for kv in $CUSTOM_KV; do
    KEY=$(echo $kv | cut -d '=' -f 1)
    echo -n ",$KEY" >> $OUTPUT
  done

  for f in $FUNCTIONS; do
    echo -n ","$f"_exec_time,"$f"_nb_execs,"$f"_flops" >> $OUTPUT
  done

  echo >> $OUTPUT
fi
Next, we parse all the interesting information from the provided standard output.
processes=$(cat $STDOUT | grep "<PERFTESTS> MPI process count" | uniq | \
  cut -d '=' -f 2 | sed 's/[^0-9]//g')
mapping=$(cat $STDOUT | grep "<PERFTESTS> MPI processes mapped by" | uniq | \
  cut -d '=' -f 2 | sed 's/[^a-z]//g')
ranking=$(cat $STDOUT | grep "<PERFTESTS> MPI processes ranked by" | uniq | \
  cut -d '=' -f 2 | sed 's/[^a-z]//g')
binding=$(cat $STDOUT | grep "<PERFTESTS> MPI processes bound to" | uniq | \
  cut -d '=' -f 2 | sed 's/[^a-z]//g')
omp_thread_num=$(cat $STDOUT | grep "OpenMP thread number" | \
  cut -d '=' -f 2 | sed 's/[^0-9]//g')
mkl_thread_num=$(cat $STDOUT | grep "MKL thread number" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
hmat_ncpu=$(cat $STDOUT | grep "HMAT_NCPU" | cut -d '=' -f 4 | \
  sed 's/[^0-9]//g')
mpf_max_memory=$(cat $STDOUT | grep "MPF_MAX_MEMORY" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
nbpts=$(cat $STDOUT | grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
radius=$(cat $STDOUT | grep "Reading radius" | cut -d '=' -f 2 | \
  sed 's/[^0-9.]//g')
height=$(cat $STDOUT | grep "Reading height" | cut -d '=' -f 2 | \
  sed 's/[^0-9.]//g')
nbrhs=$(cat $STDOUT | grep "<PERFTESTS> NbRhs" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
step_mesh=$(cat $STDOUT | grep "<PERFTESTS> StepMesh" | cut -d '=' -f 2 | \
  sed 's/[^-0-9e.+]//g')
nbem=$(cat $STDOUT | grep "<PERFTESTS> NbPtsBEM" | cut -d '=' -f 2 | \
  sed 's/[^0-9]//g')
nbpts_lambda=$(cat $STDOUT | grep "<PERFTESTS> nbPtLambda" | \
  cut -d '=' -f 2 | sed 's/[^-0-9e.+]//g')
lambda=$(cat $STDOUT | grep "<PERFTESTS> Lambda" | cut -d '=' -f 2 | \
  sed 's/[^-0-9e.+]//g')
thread_block_size=$(cat $STDOUT | grep "thread block" | cut -d ':' -f 2 | \
  cut -d 'x' -f 1 | sed 's/[^0-9]//g')
proc_block_size=$(cat $STDOUT | grep "proc block" | cut -d ':' -f 2 | \
  cut -d 'x' -f 1 | sed 's/[^0-9]//g')
disk_block_size=$(cat $STDOUT | grep "disk block" | cut -d ':' -f 2 | \
  cut -d 'x' -f 1 | sed 's/[^0-9]//g')
tps_cpu_facto_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFactoMPF =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
tps_cpu_solve_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolveMPF =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
error=$(cat $STDOUT | grep "<PERFTESTS> Error" | cut -d '=' -f 2 | \
  sed 's/[^-0-9e.+]//g')

symmetry=$(cat $STDOUT | grep "Testing: Non-Symmetric matrices and solvers.")
if test "$symmetry" != ""; then
  symmetry="non-symmetric"
else
  symmetry="symmetric"
fi

h_assembly_accuracy=$(cat $STDOUT | grep "\[HMat\] Compression epsilon" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')
h_recompression_accuracy=$(cat $STDOUT | \
  grep "\[HMat\] Recompression epsilon" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')
assembled_size_mb=$(cat $STDOUT | grep "<PERFTESTS> AssembledSizeMb" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
tps_cpu_facto=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFacto =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
factorized_size_mb=$(cat $STDOUT | grep "<PERFTESTS> FactorizedSizeMb" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')
tps_cpu_solve=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolve =" | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')

mumps_blr="NA"
if test "$(cat $STDOUT | grep 'NOT Using Block-Low-Rank compression')" != ""; then
  mumps_blr=0
elif test "$(cat $STDOUT | grep 'Using Block-Low-Rank compression')" != ""; then
  mumps_blr=1
fi
mumps_blr_accuracy=$(cat $STDOUT | \
  grep "Accuracy parameter for Block-Low-Rank" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')
mumps_blr_variant=$(cat $STDOUT | grep "BLR Factorization variant" | \
  cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g')

coupled_method=""
if test "$(cat $STDOUT | grep 'Multi solve method.')" != ""; then
  coupled_method="multi-solve"
elif test "$(cat $STDOUT | grep 'Multi facto method.')" != ""; then
  coupled_method="multi-facto"
fi
coupled_nbrhs=$(cat $STDOUT | grep "Number of simultaneous RHS" | \
  cut -d ':' -f 2 | sed 's/[^0-9.]//g')
if test "$coupled_nbrhs" == ""; then
  coupled_nbrhs=32 # Default value
fi
size_schur=$(cat $STDOUT | grep "Size of the block Schur" | \
  cut -d ':' -f 2 | sed 's/[^0-9.]//g')
cols_schur=$(cat $STDOUT | \
  grep "Number of columns in the H-matrix Schur block" | \
  cut -d ':' -f 2 | sed 's/[^0-9.]//g')

sparse_ooc=$(cat $STDOUT | grep "[mumps] Out-of-core.")
if test "$sparse_ooc" != ""; then
  sparse_ooc=1
else
  sparse_ooc=0
fi

singularity_container=$(cat $STDOUT | grep "<PERFTESTS> Singularity image")
if test "$singularity_container" != ""; then
  singularity_container=$(echo $singularity_container | cut -d '=' -f 2 | \
    sed 's/[[:space:]]*//g')
else
  singularity_container="NA"
fi
slurm_jobid=$(cat $STDOUT | grep "<PERFTESTS> Slurm job identifier" | uniq | \
  cut -d '=' -f 2 | sed 's/[^0-9.]//g')

if test "$(cat $STDOUT | grep 'Testing : FEM-BEM')" != ""; then
  system_kind="fem-bem"
elif test "$(cat $STDOUT | grep 'Testing : FEM')" != ""; then
  system_kind="fem"
elif test "$(cat $STDOUT | grep 'Testing : BEM')" != ""; then
  system_kind="bem"
else
  system_kind="unknown"
fi

csc_tot="0.0"
csc_init="0.0"
csc_assemb="0.0"
csc_analyze="0.0"
csc_facto="0.0"
csc_bcast="0.0"
csc_retrieval="0.0"
csc_free="0.0"
if test "$coupled_method" == "multi-facto"; then
  for line in $(cat $STDOUT | grep -o 'CreateSchurComplement.*A\[.*$' | \
    sed 's/ /;/g'); do
    short=$(echo $line | grep -o 'Timers.*$')
    csc_tot_value="0.0"
    if test "$(echo $line | grep 'QR_MUMPS')" != ""; then
      csc_tot_value=$(echo $short | cut -d ';' -f 2)
    else
      csc_tot_value=$(echo $short | cut -d ';' -f 1 | sed 's/[^0-9.]//g')
    fi
    short=$(echo $line | grep -o 'init.*$')
    csc_init_value=$(echo $short | cut -d ';' -f 1 | cut -d '=' -f 2)
    csc_assemb_value=$(echo $short | cut -d ';' -f 2 | cut -d '=' -f 2)
    csc_analyze_value=$(echo $short | cut -d ';' -f 3 | cut -d '=' -f 2)
    csc_facto_value=$(echo $short | cut -d ';' -f 4 | cut -d '=' -f 2)
    csc_retrieval_value="0.0"
    csc_bcast_value="0.0"
    if test "$(echo $line | grep 'QR_MUMPS')" != ""; then
      csc_retrieval_value=$(echo $short | cut -d ';' -f 5 | cut -d '=' -f 2)
    else
      csc_bcast_value=$(echo $short | cut -d ';' -f 5 | cut -d '=' -f 2)
    fi
    csc_free_value=$(echo $short | cut -d ';' -f 6 | cut -d '=' -f 2)
    csc_tot="$csc_tot + $csc_tot_value"
    csc_init="$csc_init + $csc_init_value"
    csc_assemb="$csc_assemb + $csc_assemb_value"
    csc_analyze="$csc_analyze + $csc_analyze_value"
    csc_facto="$csc_facto + $csc_facto_value"
    csc_bcast="$csc_bcast + $csc_bcast_value"
    csc_retrieval="$csc_retrieval + $csc_retrieval_value"
    csc_free="$csc_free + $csc_free_value"
  done
  csc_tot=$(echo $csc_tot | bc -l)
  csc_init=$(echo $csc_init | bc -l)
  csc_assemb=$(echo $csc_assemb | bc -l)
  csc_analyze=$(echo $csc_analyze | bc -l)
  csc_facto=$(echo $csc_facto | bc -l)
  csc_bcast=$(echo $csc_bcast | bc -l)
  csc_retrieval=$(echo $csc_retrieval | bc -l)
  csc_free=$(echo $csc_free | bc -l)
fi

es_cpu="NA"
es_dram="NA"
es_duration="NA"
if test -f energy_scope_estat*; then
  es_estat=$(cat energy_scope_estat*)
  es_cpu=$(echo $es_estat | jq '.arch.data."joule(J)"."ecpu(J)"')
  es_dram=$(echo $es_estat | jq '.arch.data."joule(J)"."edram(J)"')
  es_duration=$(echo $es_estat | jq '."duration(sec)"')
fi
If the -r option was provided, we also parse the storage resource monitoring logs. RAM and swap usage log files are named rss-x.log and disk usage log files are named hdd-x.log where \(x\) is the MPI process rank.
if test "$RSS" != ""; then
  rm_peak="0.0"
  rm_values=""
  swap_peak="0.0"
  swap_values=""
  hdd_peak="0.0"
  hdd_values=""
  io_read_peak="0.0"
  io_read_values=""
  io_write_peak="0.0"
  io_write_values=""
  io_cwrite_peak="0.0"
  io_cwrite_values=""

  for rss in $(ls $RSS | grep -e "^rss"); do
    rm_value=$(cat $RSS/$rss | tail -n 1 | cut -f 1)
    rm_peak="$rm_peak + $rm_value"
    rm_values="$rm_value $rm_values"
    swap_value=$(cat $RSS/$rss | tail -n 1 | cut -f 2)
    swap_peak="$swap_peak + $swap_value"
    swap_values="$swap_value $swap_values"
  done

  for hdd in $(ls $RSS | grep -e "^hdd"); do
    hdd_value=$(cat $RSS/$hdd | tail -n 1)
    hdd_peak="$hdd_peak + $hdd_value"
    hdd_values="$hdd_value $hdd_values"
  done

  for io in $(ls $RSS | grep -e "^io"); do
    io_read_value=$(cat $RSS/$io | tail -n 1 | cut -f 2)
    io_read_peak="$io_read_peak + $io_read_value"
    io_read_values="$io_read_value $io_read_values"
    io_write_value=$(cat $RSS/$io | tail -n 1 | cut -f 3)
    io_write_peak="$io_write_peak + $io_write_value"
    io_write_values="$io_write_value $io_write_values"
    io_cwrite_value=$(cat $RSS/$io | tail -n 1 | cut -f 4)
    io_cwrite_peak="$io_cwrite_peak + $io_cwrite_value"
    io_cwrite_values="$io_cwrite_value $io_cwrite_values"
  done

  rm_peak=$(echo $rm_peak | bc -l)
  swap_peak=$(echo $swap_peak | bc -l)
  hdd_peak=$(echo $hdd_peak | bc -l)
  io_read_peak=$(echo $io_read_peak | bc -l)
  io_write_peak=$(echo $io_write_peak | bc -l)
  io_cwrite_peak=$(echo $io_cwrite_peak | bc -l)
fi
Eventually, we print parsed values, values from custom key-value pairs, if any, and additional function information possibly parsed from trace call logs.
echo -n "$processes,$by,$mapping,$ranking,$binding,$omp_thread_num," >> $OUTPUT
echo -n "$mkl_thread_num,$hmat_ncpu,$mpf_max_memory,$nbpts," >> $OUTPUT
echo -n "$radius,$height,$nbrhs,$step_mesh,$nbem,$nbpts_lambda," >> $OUTPUT
echo -n "$lambda,$thread_block_size,$proc_block_size," >> $OUTPUT
echo -n "$disk_block_size,$tps_cpu_facto_mpf,$tps_cpu_solve_mpf," >> $OUTPUT
echo -n "$error,$symmetry,$h_assembly_accuracy," >> $OUTPUT
echo -n "$h_recompression_accuracy,$assembled_size_mb," >> $OUTPUT
echo -n "$tps_cpu_facto,$factorized_size_mb,$tps_cpu_solve," >> $OUTPUT
echo -n "$mumps_blr,$mumps_blr_accuracy,$mumps_blr_variant," >> $OUTPUT
echo -n "$coupled_method,$coupled_nbrhs,$size_schur,$cols_schur," >> $OUTPUT
echo -n "$sparse_ooc,$singularity_container,$slurm_jobid," >> $OUTPUT
echo -n "$system_kind,$csc_tot,$csc_init,$csc_assemb,$csc_analyze," >> $OUTPUT
echo -n "$csc_facto,$csc_bcast,$csc_retrieval,$csc_free,$es_cpu," >> $OUTPUT
echo -n "$es_dram,$es_duration,$rm_peak,$rm_values,$swap_peak," >> $OUTPUT
echo -n "$swap_values,$hdd_peak,$hdd_values,$io_read_peak," >> $OUTPUT
echo -n "$io_read_values,$io_write_peak,$io_write_values," >> $OUTPUT
echo -n "$io_cwrite_peak,$io_cwrite_values" >> $OUTPUT

for kv in $CUSTOM_KV; do
  VALUE=$(echo $kv | cut -d '=' -f 2)
  echo -n ",$VALUE" >> $OUTPUT
done

for f in $FUNCTIONS; do
  exec_time=$(cat $TRACE | grep $f | cut -d '|' -f 2 | sed 's/[^0-9.]//g')
  nb_execs=$(cat $TRACE | grep $f | cut -d '|' -f 3 | sed 's/[^0-9]//g')
  flops=$(cat $TRACE | grep $f | cut -d '|' -f 4 | sed 's/[^0-9]//g')
  echo -n ",$exec_time,$nb_execs,$flops" >> $OUTPUT
done
At the very end, we print the trailing new line character to the output file, perform cleaning and terminate.
echo >> $OUTPUT
rm -rf .input
exit 0
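For example, assuming a benchmark's standard output was captured in stdout.log and the monitoring logs of Section 2 reside in logs/ (both names, as well as the custom key-value pair, are hypothetical), one could run:

# Parse the captured output, the resource logs and add a custom column.
./parse.sh -s stdout.log -r logs/ -K solver=spido -o results.csv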
5. Injecting results into a database
The Python script inject.py allows one to call a custom parsing script producing a .csv file on output and gather the results exported into the latter, then inject the values into a gcvb SQLite database (see Section 1.1) for eventual post-processing.
At the beginning, we import the necessary Python modules as well as the gcvb module.
import gcvb
import subprocess
import csv
import sys
The script expects the -h option or at least two arguments:
DATASET which stands for the .csv file to gather the data from,
PARSER which represents the path to the parsing script to use.
DATASET should be a .csv file containing two lines, a heading line providing the captions and then the associated values. Next may follow the arguments to call PARSER with.
The script continues with a help message function, which can be triggered with the -h option, and an argument check.
def help():
    print("Launch PARSER (with ARGUMENTS if needed) instrumented to produce a ",
          "comma-separated values .csv output DATASET, deserialize it and ",
          "inject to the current gcvb database provided by the gcvb module.\n",
          "Usage: ", sys.argv[0], " DATASET PARSER [ARGUMENTS]\n\n",
          "Options:\n -h  Show this help message.",
          file = sys.stderr, sep = "")

if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    # Exit here, otherwise the argument accesses below would fail.
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)
In the main function, we parse the arguments
def main():
    args = sys.argv
    args.pop(0) # Drop the script name.
    dataset = args.pop(0)
    parser = args.pop(0)
and launch the parsing script.
    subprocess.run([parser] + args)
Once it has finished, we open the data set produced, gather the data into a dictionary and inject the key-value pairs one by one into the currently used gcvb database provided by the imported gcvb module.
    with open(dataset, mode = 'r') as data:
        reader = csv.DictReader(data)
        for row in reader:
            for item in row:
                gcvb.add_metric(item, row[item])

if __name__ == '__main__':
    main()
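For instance, from within a gcvb validation step, inject.py could drive the parsing script of Section 4 as follows; the file names are hypothetical.

# Run parse.sh, let it write results.csv, then inject its rows into gcvb.
python3 inject.py results.csv ./parse.sh -s stdout.log -o results.csv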
6. Wrapper scripts
Before launching a benchmark or a benchmark-related command, there are some preparation actions to perform. Similarly, there are some completion and cleaning actions to perform once the task finishes. For the sake of convenience, especially in case of distributed parallel execution, we use a wrapper shell script to perform these actions. In the case of benchmarks themselves, this script takes the form of a gcvb template file (see Section 1.1) allowing us to make the script fit the individual parameters of each benchmark. In the case of a benchmark-related command, we produce a directly usable shell script.
6.2. Benchmark task wrappers
Although the major part of the script template is identical for all kinds of benchmarks we define (see Section 1.1), some parts may differ to meet the specific needs of a given benchmark or set of benchmarks, e.g. the setting of environment variables related to out-of-core computation activation. In this section, we define multiple variations of the benchmark task wrapper script template.
6.2.1. In-core benchmarks
The wrapper script template wrapper-ic.sh is meant for benchmarks run in-core, i.e. with the out-of-core computation disabled.
At first, we try to determine the rank of the current process.
if test "$OMPI_COMM_WORLD_RANK" != ""; then
  MPI_RANK=$OMPI_COMM_WORLD_RANK
elif test "$MPI_RANKID" != ""; then
  MPI_RANK=$MPI_RANKID
else
  echo "Failed to get the MPI rank of the current process! Cannot proceed." >&2
  exit 1
fi
and if we are the 0th process, we print out:
the MPI configuration (see the parallel placeholder in Section 1.1),
if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {parallel[np]}"
  echo "<PERFTESTS> MPI processes per node = {parallel[nodewise]}"
  echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}"
  echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}"
  echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}"
fi
the path to the Singularity container, if the execution is taking place within one of them,
if test "$SINGULARITY_CONTAINER" != "" && test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> Singularity image = $SINGULARITY_CONTAINER."
fi
the Slurm job identifier.
if test "$SLURM_JOBID" != "" && test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> Slurm job identifier = $SLURM_JOBID"
fi
To measure disk space usage (see Section 2) during the execution of benchmarks, we need to set up a dedicated temporary folder. Depending on the script template, the target storage media may vary. We thus use an environment variable TMPBASE which can be set for each script template separately.
export TMPBASE=/tmp
The {@job[id]} is a gcvb placeholder exposed by default within template expansion and represents a unique benchmark identifier. Note that we remove the chosen temporary directory first to ensure there is no data left behind by any previous benchmark run. Setting MUMPS_OOC_TMPDIR is necessary for out-of-core in MUMPS to work properly in all cases. If the corresponding shell wrapper is present in the current working directory, we need to apply the workaround for Singularity containers from Section 6.1.1 too.
rm -rf $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
mkdir -p $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export TMPDIR=$TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export MUMPS_OOC_TMPDIR=$TMPDIR
if test -f "./wrapper-python-task.sh"; then
  <<code:sanitize-python-in-singularity>>
fi
Then, we set the environment variables specifying the number of threads to use, available through the parallel placeholder (see Section 1.1).
export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \
  HMAT_NCPU={parallel[nt]} QRM_NCPU={parallel[nt]}
Also, we need to disable the out-of-core computation feature. For MUMPS, it is disabled by default. For SPIDO and HMAT, we need to set a couple of environment variables to disable it explicitly. For qr_mumps, we need to set a StarPU environment variable, STARPU_LIMIT_CPU_MEM, representing the maximum amount of RAM StarPU can use. We set it to an arbitrarily large value (1 PiB). In the case of HMAT, the HMAT_LIMIT_MEM environment variable is used to achieve the same effect.
export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1 STARPU_LIMIT_CPU_MEM=1073741824 \
  HMAT_LIMIT_MEM=1073741824
At this point, we can launch the benchmark command passed as argument.
eval "$*"
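For illustration, once gcvb has expanded the template placeholders, the resulting wrapper could be launched as follows; the process count and the benchmark command are hypothetical.

# The wrapper runs once per MPI process and executes its arguments via eval.
mpirun -np 4 ./wrapper-ic.sh python3 rss.py ./test_FEMBEM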
After the execution, we have to clean up all the files produced during the benchmark run except for logs, results, FXT traces, configuration files, scheduling and launching scripts, i.e. files matching *.log, *.yaml, es_config.json, prof_file*, *.out, *batch, coupled, scalability or *.sh, as well as energy_scope outputs (*energy_scope*) and *.csv files.
if test $MPI_RANK -eq 0; then
  rm -f $(find . -type f ! -name "*.log" ! -name "*.yaml" ! -name "prof_file*" \
    ! -name "*.out" ! -name "*batch" ! -name "coupled" \
    ! -name "scalability" ! -name "*.sh" ! -name "*energy_scope*" \
    ! -name "es_config.json" ! -name "*.csv")
fi
6.2.1.1. TGCC in-core benchmarks
The template for in-core benchmarks wrapper-ic-tgcc.sh sets TMPBASE so as to allow in-core benchmarks to run on the TGCC platform.
export TMPBASE=$CCCSCRATCHDIR

if test "$PMIX_RANK" != ""; then
  MPI_RANK=$PMIX_RANK
else
  echo "Failed to get the MPI rank of the current process! Cannot proceed." >&2
  exit 1
fi

if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {parallel[np]}"
fi

rm -rf $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
mkdir -p $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export TMPDIR=$TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK

export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \
  HMAT_NCPU={parallel[nt]} QRM_NCPU={parallel[nt]}

<<code:incore-incore>>
<<code:incore-completion>>
6.2.2. Out-of-core benchmarks
By default, the template for out-of-core benchmarks wrapper-ooc.sh sets TMPBASE so as to use the /tmp storage.
export TMPBASE=/tmp
Compared to the template for in-core benchmarks (see Section 6.2.1), here we make sure the environment variables disabling purely in-core computation for qr_mumps, SPIDO and HMAT are unset if the placeholder {sparse[ooc]} or the placeholder {dense[ooc]} is set to 1. For MUMPS, the activation of out-of-core is handled using a dedicated switch of test_FEMBEM. Also, by default, HMAT starts to dump data on disk only when the solver's consumption reaches 55% of the available system memory. This leaves only 45% of the available memory for all the other data and computations. Therefore, we decrease this threshold to {dense[ooc]}% of the total system memory, e.g. 1%. We set the same threshold for qr_mumps using the dedicated StarPU environment variable.
<<code:incore-preparation>>

if test {sparse[ooc]} -gt 0 || test {dense[ooc]} -gt 0; then
  unset MPF_SPIDO_INCORE
  unset HMAT_DISABLE_OOC
  TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g')
  TOTAL_MEMORY_MiB=$(expr \( \( $TOTAL_MEMORY_KiB / 1024 \) / 100 \) \* {dense[ooc]})
  export HMAT_LIMIT_MEM=$TOTAL_MEMORY_MiB STARPU_LIMIT_CPU_MEM=$TOTAL_MEMORY_MiB
else
  export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
  unset HMAT_LIMIT_MEM
  unset STARPU_LIMIT_CPU_MEM
fi

<<code:incore-completion>>
6.2.2.1. Custom out-of-core storage
The template for out-of-core benchmarks wrapper-ooc-custom.sh allows for setting TMPBASE using a gcvb placeholder instead of setting it statically.
export TMPBASE={job[storage]}

<<code:ooc>>
6.2.2.2. TGCC out-of-core benchmarks
The template for out-of-core benchmarks wrapper-ooc-tgcc.sh sets TMPBASE so as to allow out-of-core benchmarks to run on the TGCC platform.
export TMPBASE=$CCCSCRATCHDIR

if test "$PMIX_RANK" != ""; then
  MPI_RANK=$PMIX_RANK
else
  echo "Failed to get the MPI rank of the current process! Cannot proceed." >&2
  exit 1
fi

if test $MPI_RANK -eq 0; then
  echo "<PERFTESTS> MPI process count = {parallel[np]}"
fi

rm -rf $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
mkdir -p $TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export TMPDIR=$TMPBASE/vive-pain-au-chocolat/{@job[id]}/$MPI_RANK
export MUMPS_OOC_TMPDIR=$TMPDIR

export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \
  HMAT_NCPU={parallel[nt]} QRM_NCPU={parallel[nt]}

if test {dense[ooc]} -gt 0; then
  unset MPF_SPIDO_INCORE
  unset HMAT_DISABLE_OOC
  TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g')
  TOTAL_MEMORY_MiB=$(expr \( \( $TOTAL_MEMORY_KiB / 1024 \) / 100 \) \* {dense[ooc]})
  export HMAT_LIMIT_MEM=$TOTAL_MEMORY_MiB STARPU_LIMIT_CPU_MEM=$TOTAL_MEMORY_MiB
else
  export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
  unset HMAT_LIMIT_MEM
  unset STARPU_LIMIT_CPU_MEM
fi

<<code:incore-completion>>
6.2.3. FXT execution tracing template
In the template wrapper-fxt.sh for benchmarks ensuring FXT execution tracing, we need to set the StarPU environment variables related to this functionality and add the path to the StarVZ R library starvz-readme to PATH. Note that the out-of-core computation remains disabled.
export TMPBASE=/tmp

<<code:incore-preparation>>
<<code:incore-incore>>

export STARPU_FXT_PREFIX=$(pwd)/ STARPU_PROFILING=1 STARPU_WORKER_STATS=1
export PATH=$GUIX_ENVIRONMENT/site-library/starvz/tools/:$PATH

<<code:incore-completion>>
7. Job submission
To simplify the submission of the job scripts produced by gcvb, we use the shell script submit.sh.
We begin by defining a help message function that can be triggered by running the script with the -h option.
function help() {
  echo "Submit job scripts from benchmark SESSION." >&2
  echo "Usage: $(basename $0) [options]" >&2
  echo >&2
  echo "Options:" >&2
  echo "  -c          Use 'ccc_msub' to submit jobs instead of 'sbatch'." >&2
  echo "  -h          Print this help message." >&2
  echo "  -E REGEX    Submit all job scripts except those matching REGEX" \
    "(mutually exclusive with '-F')." >&2
  echo "  -F REGEX    Submit only job scripts matching REGEX." >&2
  echo -n "  -l          Only list the job scripts to submit instead of " >&2
  echo "actually submitting them." >&2
  echo "  -s SESSION  Session to submit the benchmarks from." >&2
}
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
The parsing of options follows.
SUBMIT_CMD="sbatch"
REGEX=""
EXCLUDE=0
INCLUDE=0
SESSION=""
LIST_ONLY=0

while getopts ":chE:F:ls:" option; do
  case $option in
The -c option allows for using the TGCC-specific job submission command ccc_msub instead of the usual Slurm command sbatch.
    c)
      SUBMIT_CMD="ccc_msub -m scratch -x"
      ;;
Using the -E option, one can exclude from submission the job scripts whose names match a given regular expression.
    E)
      REGEX=$OPTARG
      EXCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-E' option (empty regular expression)!"
        exit 1
      fi
      ;;
Using the -F option, one can submit only the job scripts whose names match a given regular expression.
    F)
      REGEX=$OPTARG
      INCLUDE=1
      if test "$REGEX" == ""; then
        error "Bad usage of the '-F' option (empty regular expression)!"
        exit 1
      fi
      ;;
The -l option lists the job scripts selected for submission without actually submitting them. It may be particularly useful, for example, in combination with the -F option to verify whether the provided regular expression matches the intended job scripts.
    l)
      LIST_ONLY=1
      ;;
The -s option is used to specify the benchmark session in the results directory (see Section 1.1) to submit the job scripts from.
    s)
      SESSION=$OPTARG
      if test ! -d $SESSION; then
        error "'$SESSION' is not a valid directory!"
        exit 1
      fi
      ;;
Eventually, we have to take care of unknown options or missing arguments, if any, raise an error and terminate the script in that case.
    \?) # Unknown option
      error "Arguments mismatch! Invalid option '-$OPTARG'."
      echo
      help
      exit 1
      ;;
    :) # Missing option argument
      error "Arguments mismatch! Option '-$OPTARG' expects an argument!"
      echo
      help
      exit 1
      ;;
    h | *)
      help
      exit 0
      ;;
  esac
done
The options -E and -F are mutually exclusive.
if test $EXCLUDE -ne 0 && test $INCLUDE -ne 0; then
  error "Options '-E' and '-F' must not be used together!"
  exit 1
fi
After parsing and checking the arguments, we navigate to the root of the gcvb filesystem of the given benchmark session folder.
cd $SESSION/../../
We also check whether we are in a valid gcvb filesystem containing a config.yaml file and a results folder (see Section 1.1).
if test ! -f config.yaml || test ! -d results; then
  error "'$SESSION' is not a correct gcvb session directory!"
  exit 1
fi
Before submitting, we look for all the job scripts. If there is none, we do nothing.
cd $SESSION
RUNS=$(ls *.sh)
if test "$RUNS" == ""; then
  error "No job script found! Nothing to do."
  exit 0
fi
Then, if either the -E or the -F option was specified, we filter the job scripts accordingly. We also need to return to the root of the gcvb filesystem.
# Count all job scripts by default so that the progress prefix stays correct
# even when no filter is given.
RUNS_COUNT=$(ls *.sh | wc -l)
if test $INCLUDE -ne 0; then
  RUNS=$(ls *.sh | grep "$REGEX")
  RUNS_COUNT=$(ls *.sh | grep "$REGEX" | wc -l)
elif test $EXCLUDE -ne 0; then
  RUNS=$(ls *.sh | grep -v "$REGEX")
  RUNS_COUNT=$(ls *.sh | grep -v "$REGEX" | wc -l)
fi
cd ../../
Finally, we call the sbatch Slurm submission command (or ccc_msub if the -c option was given) for each job script. If the -l option was specified, the job scripts are not submitted; only their list is printed on the screen.
i=1
for job_script in $RUNS; do
  PREFIX="[ $i / $RUNS_COUNT ]"
  if test $LIST_ONLY -ne 0; then
    echo "$PREFIX Listing job script '$job_script'."
  else
    $SUBMIT_CMD $SESSION/$job_script
    if test $? -ne 0; then
      error "$PREFIX Failed to submit batch job '$job_script'!"
      exit 1
    else
      echo "$PREFIX Submitted batch job '$job_script'."
    fi
  fi
  i=$(expr $i + 1)
done
exit 0
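For instance, to preview the job scripts of a session matching a pattern without submitting them (the session path and the pattern are hypothetical):

# List, without submitting, the job scripts whose names contain 'bem'.
./submit.sh -s results/session-2022-01/ -F 'bem' -l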
8. Post-processing benchmark results
To visualize benchmark results, we use R and rely mainly on the plot drawing library ggplot2 ggplot2. The R script file plot.R we describe in this section is meant to be used from within plot drawing R source code blocks in the concerned Org documents, e.g. reports, articles and so on.
8.1. Prerequisites
At the beginning we need to import several libraries for:
manipulating data sources, i.e. R data frames, SQLite databases and JSON files,
library(plyr)
library(dplyr)
library(readr)
library(DBI)
library(rjson)
transforming data frames,
library(tidyr)
decompressing gzip files,
library(R.utils)
defining and exporting plots.
library(ggplot2)
library(scales)
library(grid)
library(gridExtra)
require(cowplot)
library(stringr)
library(ggrepel)
We also need to prevent using the cairo library to produce figures, as this does not work on Linux.
set_null_device(cairo_pdf)
One can specify the name of the font family used for typesetting text in plots in a variable named plot_font before sourcing this script. If this variable does not exist, the default value is used, i.e. CMU Serif, corresponding to the 'Computer Modern Serif' typeface.
plot_font <- get0("plot_font", ifnotfound = "CMU Serif")
8.2. Importing data
8.2.1. Benchmark results
The benchmark tool we rely on, gcvb, stores benchmark results in a local SQLite database (see Section 1.1). We define a global variable named gcvb_db_path to store the path to the gcvb database we are currently using.
gcvb_db_path <- "results.db"
To extract the results from the database, we define the function gcvb_import which takes up to three arguments:
session indicating the gcvb benchmark session to extract the results from (see Section 1.1),
like representing a list or a vector of benchmark name patterns,
path containing the path to the gcvb SQLite database file, set by default to gcvb_db_path (see above).
Using the like argument, one can extract only the results of the benchmarks whose identifiers match the given pattern or patterns (see Section 1.1). The latter may contain the wildcard characters supported by the SQL operator LIKE sqliteLike.
The function begins by opening a connection to the gcvb SQLite database and setting the LIKE keyword to be case-sensitive.
gcvb_import <- function(session, like = c(), path = gcvb_db_path) {
  conn <- dbConnect(RSQLite::SQLite(), path)
  dbExecute(conn, "PRAGMA case_sensitive_like = true;")
A gcvb
database contains 6 tables (see Figure 1).
Figure 1: EER diagram of a gcvb
database.
At first, we need to select all the benchmarks belonging to session
from the
test
table and optionally restrict the selection to benchmarks matching the
patterns of like
.
request <- "" for(p in 1:length(like)) { if(p > 1) { request <- paste(request, "OR ") } request <- paste0(request, "name LIKE '", like[p], "'") } if(length(like) > 0) { request <- paste( "SELECT id, name, start_date FROM test WHERE (", request, ") AND run_id IN", "(SELECT id FROM run WHERE gcvb_id =", session, ")" ) } else { request <- paste( "SELECT id, name, start_date FROM test WHERE run_id IN", "(SELECT id FROM run WHERE gcvb_id =", session, ")" ) } benchmarks <- dbGetQuery(conn, request)
Then, if a given benchmark was executed more than once, we want to keep only the data associated with the latest run. To do this, we sort the data by the execution start timestamp, i.e. the start_date column, and remove duplicates based on benchmark identifiers in the name column. Note that start_date is imported as a string, so we first need to convert it to a datetime format. At the end, we drop the start_date column, which is no longer needed for further treatment.
benchmarks$start_date <- as.POSIXct(benchmarks$start_date) benchmarks <- benchmarks[order(benchmarks$start_date, decreasing = TRUE), ] benchmarks <- benchmarks[!duplicated(benchmarks$name), ] benchmarks <- subset(benchmarks, select = -c(start_date))
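The following minimal sketch illustrates this deduplication logic on a hypothetical toy data frame where the same benchmark was run twice:

b <- data.frame(
  name = c("spido-100000", "spido-100000"), # illustrative benchmark name
  start_date = as.POSIXct(c("2022-01-01 10:00:00", "2022-01-02 10:00:00"))
)
b <- b[order(b$start_date, decreasing = TRUE), ]
b[!duplicated(b$name), ] # keeps only the run of January 2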
Benchmark results are stored in the separate valid
table. Each row references
a benchmark in the test
table and represents one metric and its value (see an
example in Table 1). This representation is referred to as
long format.
id | metric | value | test_id | task_step |
---|---|---|---|---|
39870 | nbrhs | 50 | 1260 | NULL |
39871 | step_mesh | 0.02241996 | 1260 | NULL |
… | … | … | … | … |
In addition to the valid table, the files table holds files potentially associated with a given benchmark, e.g. resource consumption logs (see an example in Table 2). Note that the files themselves are stored in a gzip-compressed binary form.
id | filename | file | test_id |
---|---|---|---|
9 | rss-0.log | BLOB | 3666 |
10 | hdd-0.log | BLOB | 3666 |
11 | …/energy_scope_eprofile_1473279.txt | BLOB | 3666 |
… | … | … | … |
So, for each of the selected benchmarks, we extract the associated rows from the
valid
and the files
tables based on the test_id
key. Note that the value
column is of mixed type, i.e. it contains both strings and numerical values. To
prevent the R dbGetQuery
function from doing inappropriate type conversions,
we cast the value
column to text in the corresponding SQL request. We do the
same for the file
column in the files
table. Note also that for the union
operation below to work, we have to rename the selected columns of the files
table to match those in the valid
table.
for(row in 1:nrow(benchmarks)) { test_id <- benchmarks[row, "id"] metrics <- dbGetQuery( conn, paste( "SELECT metric, CAST(value AS TEXT) AS 'value'", "FROM valid WHERE test_id = :id", "UNION", "SELECT filename AS 'metric', QUOTE(file) AS 'value'", "FROM files WHERE test_id = :id" ), params = list(id = test_id) )
At the same time, we convert the original long format representation (see Table 1) into wide format for easier data manipulation. In the latter, each metric has its own column (see an example in Table 3). This way, each row contains all the results of a given benchmark.
id | name | nbrhs | step_mesh | … |
---|---|---|---|---|
1260 | spido-100000 | 50 | 0.02241996 | … |
if(nrow(metrics) > 0) { for(metric in 1:nrow(metrics)) { metric_name <- metrics[metric, "metric"] if(!(metric_name %in% colnames(benchmarks))) { benchmarks[[metric_name]] <- NA } benchmarks[which(benchmarks$id == test_id), metric_name] <- metrics[metric, "value"] } } }
Finally, we convert columns containing numerical values to the appropriate data
type, close the database connection and return the resulting data frame. Note
that the names of some columns, e.g. rss-0.log
, hdd-1.log
or
tmp_energy_scope_1473279/energy_scope_eprofile_1473279.txt
may vary from one
benchmark to another so we have to determine their names dynamically using a
regular expression.
cols_to_numeric <- colnames(benchmarks) dynamic_columns <- cols_to_numeric[ grepl( paste( "^output\\.log$", "^rss.*\\.log$", "^hdd.*\\.log$", "^likwid.*\\.csv$", ".*energy_scope_eprofile.*\\.txt$", sep = "|" ), cols_to_numeric ) ] cols_to_numeric <- cols_to_numeric[! cols_to_numeric %in% c( c("name", "mapping", "ranking", "scheme", "binding", "coupled_method", "solver", "variation", "node", "symmetry", "config", "singularity_container", "system_kind", "energy_scope", "config_kind", "solver_config", "rm_values", "swap_values", "execution", "hdd_values", "output.log"), dynamic_columns )] benchmarks <- benchmarks %>% mutate_at(cols_to_numeric, as.numeric) dbDisconnect(conn) return(benchmarks) }
8.2.2. Trace files
We use multiple benchmarking tools to monitor the usage of storage resources (memory and hard drive), the power and energy consumption as well as the floating-point operation rate and memory bandwidth of our solver suite.
To monitor the memory and hard drive usage, we use a custom Python script named rss.py (see Section 2). A measure is taken every second during the execution. There is one consumption log file per MPI process. The names of the log files match rss-N.log for memory usage and hdd-N.log for hard drive usage, where N stands for the rank of the corresponding MPI process. In a log file, each line contains the current timestamp and usage in mebibytes (MiB). The very last line corresponds to the peak value.
To measure the power and the energy consumption of processing units and RAM, we
use the energy_scope
tool ESwebsite,ESjcad2021. In this case, the
acquisition interval can be defined by the user. As a result, energy_scope
produces an archive containing all the measures in a raw form. It can then
produce a JSON trace file based on this archive. The output file is stored under
./tmp_energy_scope_X/energy_scope_eprofile_X.txt
where X
represents the
identifier of the corresponding Slurm job (see Section
1.1).
Finally, to measure the floating-point operation rate and the memory bandwidth, we rely on the likwid tool likwid,treibig2010likwid. likwid benchmarks each MPI process of the application separately and stores all the measures into a comma-separated values file the name of which matches likwid-N.csv, where N stands for the rank of the corresponding MPI process.
At the end of a benchmark execution, the above log and trace files are compressed and stored into the SQLite database holding all the benchmark results (see Section 8.2.1). Therefore, in the first place, we need to define a function allowing us to extract the files from the database and decompress them.
8.2.3. Retrieving files
The function gcvb_retrieve_files
takes a data frame containing the results of
one single benchmark, through the benchmark
argument, retrieved by the
gcvb_import
function (see Section 8.2.1) and
formatted using the gcvb_format
function (see Section
8.4.1), extracts the corresponding trace files and
decompresses them into the current working directory. Note that the
energy_scope
trace file is renamed to eprofile.txt
to ensure a static file
name and simplify its later processing.
In the first place, we need to dynamically determine the names of the columns containing the log and trace files based on the patterns discussed above.
gcvb_retrieve_files <- function(benchmark) { filename_pattern <- paste( "^output\\.log$", "^rss.*\\.log$", "^hdd.*\\.log$", "^likwid.*\\.csv$", ".*energy_scope_eprofile.*\\.txt$", sep = "|" ) trace_file_columns <- colnames(benchmark) trace_file_columns <- trace_file_columns[ grepl(filename_pattern, trace_file_columns) ]
Then, we remove any previously retrieved files matching the same file name patterns to prevent data mismatch in plot functions (see Section 8.6)
filename_pattern <- paste( "^output\\.log$", "^rss.*\\.log$", "^hdd.*\\.log$", "^likwid.*\\.csv$", "^eprofile\\.txt$", sep = "|" ) for(file in list.files(pattern = filename_pattern)) { unlink(file) }
and iterate over the corresponding columns in benchmark
and:
for(trace_file_column in trace_file_columns) {
retrieve the compressed version of the file as string,
trace_file_name <- benchmark[[trace_file_column]]
if the file is not empty, strip the
X'
prefix and the'
suffix from this string,if(!is.na(trace_file_name)) { trace_file_name <- substring( trace_file_name, 3, nchar(trace_file_name) - 1 )
split the string into a vector of two-character strings representing the hexadecimal binary content of the file,
trace_file_name <- strsplit(trace_file_name, "")[[1]] trace_file_name <- paste0( trace_file_name[c(TRUE, FALSE)], trace_file_name[c(FALSE, TRUE)] )
change the output file name to
eprofile.txt
, if the file corresponds to anenergy_scope
trace file,if(grepl("energy_scope", trace_file_column, fixed = TRUE)) { filename <- "eprofile.txt.gz" } else { filename <- paste0(basename(trace_file_column), ".gz") }
convert the vector of two-character strings into a raw binary form and restore the corresponding gzip-compressed file,
writeBin(as.raw(as.hexmode(trace_file_name)), filename)
decompress the file into the current working directory and remove its compressed counterpart.
gunzip(filename, overwrite = TRUE) } } }
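A typical usage of the function, with an illustrative session identifier and benchmark name pattern, could then be:

results <- gcvb_format(gcvb_import(3, like = c("spido-%")))
# Extract the trace files associated with the first benchmark.
gcvb_retrieve_files(results[1, ])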
8.2.3.1. Reading rss.py
trace files
The function read_rss
constructs an R data frame based on memory usage and
optionally on hard drive usage trace files (if the include_hdd
switch is set
to TRUE
) located in logs_path
. If a named vector is provided through the
optional aesthetics
argument, the function adds extra constant columns into
the data frame using the names from the named vector and initialized with the
values of the named vector. Adding such columns may be useful later for defining plot aesthetics (see Section 8.5) when drawing plots. If the
output of the function is intended to be combined with an energy_scope
trace
(see Section 8.2.3.3), the optional override_beginning
argument can be
used to synchronize the output data frame according to the energy_scope
trace
by considering the datetime given by this argument as the alternative beginning
of the acquisition. Note that the energy_scope
trace always starts first as
energy_scope
is always launched before the other benchmarking tools.
The output
data frame shall contain four columns, the timestamp
column
giving the time of the measure in milliseconds since the beginning of the
acquisition, the corresponding consumption
value, the kind
of the measure
(sram
for memory usage or shdd
for hard drive usage) and the process
rank
column.
read_rss <- function(logs_path, include_hdd = FALSE, aesthetics = NA, override_beginning = NA) { columns <- c("timestamp", "consumption", "kind", "process") output <- data.frame(matrix(nrow = 0, ncol = length(columns))) colnames(output) <- columns
We begin by listing all the memory and, optionally, hard drive usage trace files
in logs_path
.
pattern <- ifelse(include_hdd, "rss|hdd.*\\.log", "rss.*\\.log") logs <- list.files( logs_path, pattern = pattern, full.names = TRUE, include.dirs = FALSE )
Then, we read each log file, strip its last line containing the peak value,
which we do not use here, and append the contents to the output data frame.
Note that in a previous version of rss.py
, the trace file format was
different. There was only one column in the trace file corresponding to the
amount of memory or hard drive used. For backward compatibility with old trace
files, we continue supporting the former trace file format too.
for(log in logs) { count <- length(readLines(log, warn = FALSE)) - 1 data <- read.table(log, nrow = count) if(ncol(data) < 2) { # Legacy format colnames(data) <- c("consumption") data$timestamp <- 0:(count - 1) } else { if(ncol(data) > 2) { # Newest format including swap monitoring colnames(data) <- c("timestamp", "residual", "swap") data$consumption <- data$residual + data$swap data <- data[, !(names(data) %in% c("residual", "swap"))] } else { # Older format without swap monitoring colnames(data) <- c("timestamp", "consumption") } data$timestamp <- as.POSIXct(data$timestamp, format = "%Y/%m/%dT%H:%M:%S") start <- min(data$timestamp)
If the override_beginning
argument was provided, we need to synchronize the
timestamp
column in such a way as to consider override_beginning
to be the
beginning of the trace.
if(!is.na(override_beginning)) { beginning <- as.integer(min(data$timestamp) - override_beginning) data$timestamp <- data$timestamp + beginning }
Then, we convert the timestamp to milliseconds and the measured value to gibibytes (GiB).
data$timestamp <- (as.integer(data$timestamp) - as.integer(start)) * 1000 } data$consumption <- as.numeric(data$consumption / 1024.)
Based on the name of the currently processed trace file, we determine the kind of the measure as well as the MPI process rank. We merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
base <- basename(log) kind <- ifelse(!include_hdd || startsWith(base, "rss"), "sram", "shdd") data$kind <- rep(kind, nrow(data)) process <- gsub("([a-z]+)-(\\d+)\\.log", "\\2", base) data$process <- rep(as.integer(process), nrow(data)) output <- rbind(output, data) }
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.
processes <- max(output$process) if(processes > 0) { output <- output %>% group_by(timestamp, kind) %>% summarise( consumption = sum(consumption) ) }
Finally, we process the aesthetics
parameter, if provided, and return the
output data frame.
if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] output[[name]] <- rep(as.character(aesthetic), nrow(output)) } } return(output) }
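A hypothetical usage of read_rss, assuming the trace files were previously retrieved into the current working directory and tagging all the rows with an illustrative solver name, could be:

storage <- read_rss(".", include_hdd = TRUE, aesthetics = c(solver = "spido"))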
8.2.3.2. Reading likwid
trace files
The function read_likwid
constructs a data frame based on likwid
trace files
located in logs_path
. If a named vector is provided through the optional
aesthetics
argument, the function adds extra constant columns into the data
frame using the names from the named vector and initialized with the values of
the named vector (see Section 8.2.3.1). If the output of the function is
intended to be combined with an energy_scope
trace (see Section 8.2.3.3),
the optional shift
argument can be used to synchronize the output data frame
according to the energy_scope
trace by shifting the output timestamps by the
value (in milliseconds) of this argument. metrics indicates the number of metrics to look for in the trace file. The default value is 2, i.e. flop rate and memory bandwidth (see below). As the measuring tool provides only timestamps relative to the beginning of the execution, we are unable to provide a more precise synchronization procedure. Therefore, we fall back to considering the
beginning of the rss.py
trace (see Section 8.2.3.1) corresponding to the
same test case as t0 for the likwid
trace file.
The output
data frame shall contain four columns, the timestamp
column
giving the time of the measure in milliseconds since the beginning of the
acquisition, the corresponding consumption
value, the kind
of the measure
(flops
for floating-point operation rate and bw
for memory bandwidth) and
the process
rank column.
read_likwid <- function(logs_path, aesthetics = NA, shift = 0, metrics = 2) { columns <- c("timestamp", "consumption", "kind", "process") output <- data.frame(matrix(nrow = 0, ncol = length(columns))) colnames(output) <- columns
We begin by listing all the trace files in logs_path
.
pattern <- "likwid.*\\.csv" logs <- list.files( logs_path, pattern = pattern, full.names = TRUE, include.dirs = FALSE )
A likwid
trace file has two or more heading lines at its beginning. The count
of heading lines depends on the count of monitored hardware performance counters
or groups of the latter. In our case, we use two groups of performance counters,
one for floating-point operation rate and one for memory bandwidth. We thus have
three heading lines in our likwid-N.csv
trace files. Then, the first four
columns (# GID
, MetricsCount
, CpuCount
and Total runtime [s]
) of the
trace file are common for all the performance counters. Each group of
performance counters may have a different number of associated metrics and thus
a different number of associated columns in the trace file. The MetricsCount
column gives the number of metrics and allows us to compute the number of columns
in the trace file associated to a given performance group. In our case, there
are 6 metrics for the floating-point operation rate group and 10 for the memory
bandwidth group. Moreover, there is a separate column for each processing unit
and for each metric. Note that the number of processing units used during the
execution is given by the CpuCount
column. In summary, the trace file contains
4 common columns followed by 6 × CpuCount
columns containing the values for the
metrics of the floating-point operation rate group followed by 10 × CpuCount
columns containing the values for the metrics of the memory bandwidth group.
We read each log file and:
for(log in logs) { input <- read.csv(log, header = FALSE, skip = metrics + 1)
determine the count of processing units used for the corresponding test case,
cpus <- as.integer(input[1, 3])
create an intermediate data frame with the timestamps, the floating-point operation rate, corresponding to the metric
Packed DP [MFLOP/s]
, and the memory bandwidth, corresponding to the metricMemory bandwidth [MBytes/s]
,data_flops <- data.frame( timestamp = input[which(input[, 2] == 6), c(4)], consumption = rowSums( input[which(input[, 2] == 6), c((4 + 5 * cpus + 1):(4 + 6 * cpus - 1))] ) ) if(metrics > 1) { data_bw <- data.frame( timestamp = input[which(input[, 2] == 10), c(4)], consumption = rowSums( input[which(input[, 2] == 10), c((4 + 8 * cpus + 1):(4 + 9 * cpus - 1))] ) ) }
initialize the
kind
columns in these separate data frames,data_flops$kind <- rep("flops", nrow(data_flops)) if(metrics > 1) { data_bw$kind <- rep("bw", nrow(data_bw)) }
merge the separate data frames again into a unique data frame,
if(metrics > 1) { data <- rbind(data_flops, data_bw) } else { data <- data_flops }
convert the timestamp to milliseconds and synchronize it if the
shift
argument was provided,data$timestamp <- as.integer(round(data$timestamp, 0) * 1000) if(shift > 0) { data$timestamp <- data$timestamp + shift }
determine the MPI process rank corresponding to the current trace file based on the name of the latter and initialize the
process
column,process <- gsub("([a-z]+)-(\\d+)\\.csv", "\\2", basename(log)) data$process <- rep(as.integer(process), nrow(data))
merge the data frame corresponding to the currently processed trace file with the global output data frame and continue with the other trace files, if any.
output <- rbind(output, data) }
In case of multiple log files per benchmark, i.e. when a benchmark is run in a distributed fashion, we need to sum up the consumption of all the processes.
processes <- max(output$process) if(processes > 0) { output <- output %>% group_by(timestamp, kind) %>% summarise( consumption = sum(consumption) ) }
Finally, we process the aesthetics
parameter, if provided, and return the
output data frame.
if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] output[[name]] <- rep(as.character(aesthetic), nrow(output)) } } return(output) }
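A hypothetical usage of read_likwid, assuming the trace files reside in the current working directory and shifting the timestamps by 2 seconds to align them with an rss.py trace, could be:

performance <- read_likwid(".", aesthetics = c(solver = "spido"), shift = 2000)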
8.2.3.3. Reading energy_scope
trace files
The function read_es
constructs an R data frame based on a given
energy_scope
JSON data file es_data
. If a named vector is provided through
the optional aesthetics
argument, the function adds extra constant columns
into the data frame using the names from the named vector and initialized with
the values of the named vector. Adding such columns may be useful later for defining plot aesthetics (see Section 8.5) when drawing plots.
The data frame also contains tags. The benchmark executable may print special energy_scope tags to delimit the beginning and the end of a particular portion of computation in the energy_scope trace. If the optional only argument is provided, only the tags it specifies are processed. The interval argument allows one to set the interval (in milliseconds) at which the measures were taken. By default, the function assumes an interval of 1000 milliseconds.
The resulting data frame is composed of the following columns:
- tag representing the tag label,
- type indicating whether a line corresponds to a start or a stop tag,
- type_label holding a prettified version of type for plot drawing,
- timestamp representing the time of the measure in milliseconds since the beginning of the acquisition,
- consumption representing the current power consumption of CPU or RAM (in Watts [W]),
- kind indicating whether consumption represents the power consumption of CPU or RAM,
as well as any constant columns described by the optional aesthetics named vector, added to the output data frame in order to simplify plot drawing based on the latter.
Note that the format of the output data frame is compatible, in terms of the columns it contains, with the format of the data frames produced by the functions read_rss (see Section 8.2.3.1) and read_likwid (see Section 8.2.3.2) for reading trace files produced by the other benchmarking tools we use. This is to allow for combining different kinds of benchmark data. Indeed, one can provide this function with a data frame obtained with read_rss through the rss_data argument and extend the output data frame with the measures done by the rss.py monitoring script (see Section 2). Similarly, we can extend the output data frame with the measures done by likwid and extracted using the read_likwid function through the likwid_data argument. likwid_metrics indicates the number of metrics to look for in the trace file. The default value is 2, i.e. flop rate and memory bandwidth (see Section 8.2.3.2). The timestamp column, present in rss_data as well as in likwid_data, allows us to synchronize the measures.
The first step is to read the data from the JSON data file at es_data
,
read_es <- function(es_data, aesthetics = c(), only = NA, interval = 1000, rss_data = NA, include_hdd = FALSE, likwid_data = NA, likwid_metrics = 2) { es_raw <- fromJSON(file = es_data)
extract the timestamp corresponding to the beginning of acquisition
es_start <- as.POSIXct( es_raw$data$data$tags$es_appli_total$start, format = "%Y/%m/%d %H:%M:%S" )
and the relative start and stop dates of acquisition.
es_start_sec <- as.numeric( es_raw$data$data$tags$es_appli_total$`start(sec)` ) es_stop_sec <- as.numeric( es_raw$data$data$tags$es_appli_total$`stop(sec)` )
Then, we collect CPU and RAM power consumption data into the respective data
frames es_cpu
and es_ram
. The timestamp
column is generated based on
es_start
. Note that consecutive timestamps are spaced by the acquisition interval, i.e. one second by default.
es_cpu <- data.frame( consumption = as.numeric( es_raw$data$data$`ecpu(W)`[(es_start_sec + 1):(es_stop_sec + 1)] ), timestamp = as.integer(0) ) for(i in 2:nrow(es_cpu)) { es_cpu$timestamp[i] <- es_cpu$timestamp[i - 1] + interval } es_cpu$kind <- rep("ecpu", nrow(es_cpu)) es_ram <- data.frame( power_total = as.numeric( es_raw$data$data$`etotal(W)`[(es_start_sec + 1):(es_stop_sec + 1)] ), power_cpu = as.numeric( es_raw$data$data$`ecpu(W)`[(es_start_sec + 1):(es_stop_sec + 1)] ), timestamp = as.integer(0) ) es_ram$consumption <- es_ram$power_total - es_ram$power_cpu es_ram <- subset(es_ram, select = c("consumption", "timestamp")) for(i in 2:nrow(es_ram)) { es_ram$timestamp[i] <- es_ram$timestamp[i - 1] + interval } es_ram$kind <- rep("eram", nrow(es_ram))
We also import the rss.py trace, if the rss_data argument was provided, and clip the energy_scope data to the actual acquisition window of the former. Similarly, we import and synchronize the likwid trace, if the likwid_data argument was provided.
rss_ram <- NA rss_hdd <- NA likwid <- NA if(!is.na(rss_data)) { rss <- read_rss(rss_data, include_hdd, aesthetics, es_start) rss_ram <- subset(rss, kind == "sram") if(include_hdd) { rss_hdd <- subset(rss, kind == "shdd") } actual_start <- min(rss_ram$timestamp) actual_end <- max(rss_ram$timestamp) es_cpu <- subset( es_cpu, timestamp >= actual_start & timestamp <= actual_end ) es_ram <- subset( es_ram, timestamp >= actual_start & timestamp <= actual_end ) } if(!is.na(likwid_data)) { shift <- 0 if(!is.na(rss_data)) { shift <- min(rss_ram$timestamp) } likwid <- read_likwid(likwid_data, aesthetics, shift, likwid_metrics) }
Now, we initialize additional columns to store tag information.
es_cpu$tag <- NA es_cpu$type <- NA es_cpu$type_label <- NA es_ram$tag <- NA es_ram$type <- NA es_ram$type_label <- NA if(!is.na(rss_data)) { rss_ram$tag <- NA rss_ram$type <- NA rss_ram$type_label <- NA if(include_hdd) { rss_hdd$tag <- NA rss_hdd$type <- NA rss_hdd$type_label <- NA } } if(!is.na(likwid_data)) { likwid$tag <- NA likwid$type <- NA likwid$type_label <- NA }
For each tag considered, we add four new rows, i.e. one row with the start time and CPU power consumption and one with the stop time and CPU power consumption in es_cpu, as well as one row with the start time and RAM power consumption and one with the stop time and RAM power consumption in es_ram.
If during benchmark program execution a loop generates more than one occurrence
of a given energy_scope
tag, a unique identifier is appended to the latter,
e.g. tag_0
, tag_1
, etc. Otherwise, energy_scope
would discard all the data
related to the tag. However, the presence of such identifiers complicates the
post-treatment. Therefore, we need to strip them off the tag name tag
but keep
them for later usage in instance
.
for(entire_tag in names(es_raw$data$data$tags)) { tag <- sub("_[^_]+$", "", entire_tag) instance <- tail(strsplit(entire_tag, "_")[[1]], n = 1) instance <- suppressWarnings(as.integer(instance)) instance <- ifelse(is.na(instance), "i", instance)
Here we use the imported JSON data file es_raw
to get the power consumption
values as well as the start and stop timestamps of tags.
if(is.na(only) || tag %in% only) { es_start <- as.integer(es_start) start_time <- as.integer( as.POSIXct( es_raw$data$data$tags[[entire_tag]]$start, format = "%Y/%m/%d %H:%M:%S" ) ) stop_time <- as.integer( as.POSIXct( es_raw$data$data$tags[[entire_tag]]$stop, format = "%Y/%m/%d %H:%M:%S" ) ) begin <- (start_time - es_start) * 1000 end <- (stop_time - es_start) * 1000 row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = es_cpu[ es_cpu$timestamp == begin & is.na(as.character(es_cpu$tag)), "consumption" ], kind = "ecpu" ) es_cpu <- rbind(es_cpu, row) row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = es_cpu[ es_cpu$timestamp == end & is.na(as.character(es_cpu$tag)), "consumption" ], kind = "ecpu" ) es_cpu <- rbind(es_cpu, row) row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = es_ram[ es_ram$timestamp == begin & is.na(as.character(es_ram$tag)), "consumption" ], kind = "eram" ) es_ram <- rbind(es_ram, row) row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = es_ram[ es_ram$timestamp == end & is.na(as.character(es_ram$tag)), "consumption" ], kind = "eram" ) es_ram <- rbind(es_ram, row) if(!is.na(rss_data)) { consumption <- rss_ram[ rss_ram$timestamp == begin & is.na(as.character(rss_ram$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "sram" ) rss_ram <- rbind.fill(rss_ram, row) } consumption <- rss_ram[ rss_ram$timestamp == end & is.na(as.character(rss_ram$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "sram" ) rss_ram <- rbind.fill(rss_ram, row) } if(include_hdd) { consumption <- rss_hdd[ rss_hdd$timestamp == begin & is.na(as.character(rss_hdd$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "shdd" ) rss_hdd <- rbind.fill(rss_hdd, row) } consumption <- rss_hdd[ rss_hdd$timestamp == end & is.na(as.character(rss_hdd$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "shdd" ) rss_hdd <- rbind.fill(rss_hdd, row) } } } if(!is.na(likwid_data)) { consumption <- likwid[ likwid$timestamp == begin & is.na(as.character(likwid$tag)), "consumption" ] kind <- likwid[ likwid$timestamp == begin & is.na(as.character(likwid$tag)), "kind" 
] consumption <- ifelse(length(consumption) > 0, consumption, NA) kind <- ifelse(length(kind) > 0, kind, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = kind ) likwid <- rbind.fill(likwid, row) } consumption <- likwid[ likwid$timestamp == end & is.na(as.character(likwid$tag)), "consumption" ] kind <- likwid[ likwid$timestamp == end & is.na(as.character(likwid$tag)), "kind" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) kind <- ifelse(length(kind) > 0, kind, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = kind ) likwid <- rbind.fill(likwid, row) } } } }
If the aesthetics argument is provided, we add the constant columns it describes to the resulting data frames. Finally, we return a single data frame combining es_cpu, es_ram and the rss.py (and possibly likwid) data if the corresponding arguments were provided. In this case, we also add an additional column to the combined data frames to distinguish between power and storage resources consumption. This is useful for plotting figures (see Section 8.6.12).
if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] es_cpu[[name]] <- rep(as.character(aesthetic), nrow(es_cpu)) es_ram[[name]] <- rep(as.character(aesthetic), nrow(es_ram)) if(!is.na(rss_data)) { rss_ram[[name]] <- rep(as.character(aesthetic), nrow(rss_ram)) if(include_hdd) { rss_hdd[[name]] <- rep(as.character(aesthetic), nrow(rss_hdd)) } } if(!is.na(likwid_data)) { likwid[[name]] <- rep(as.character(aesthetic), nrow(likwid)) } } } es <- rbind(es_cpu, es_ram) es$kind_general <- "e" if(!is.na(likwid_data)) { likwid$kind_general <- likwid$kind es <- rbind.fill(es, likwid) } if(!is.na(rss_data)) { rss_ram$kind_general <- "s" if(include_hdd) { rss_hdd$kind_general <- "s" es <- rbind.fill(es, rbind(rss_ram, rss_hdd)) } else { es <- rbind.fill(es, rss_ram) } es$timestamp <- es$timestamp - min(rss_ram$timestamp) } return(es) }
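A hypothetical usage of read_es combining all three kinds of traces, assuming they were previously retrieved into the current working directory and keeping only the factorization tag, could be:

profile <- read_es(
  "eprofile.txt",
  only = c("testMPF_factorisation"),
  rss_data = ".",
  include_hdd = TRUE,
  likwid_data = "."
)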
8.2.3.4. Reading timeline trace files
The function read_timeline allows us to read a test_FEMBEM timeline trace file trace, extract data about selected computation phases from the latter and return it in the form of a data frame suitable for drawing axis-intercepting lines with the associated labels using the rss_by_time plot function (see Section 8.6.11).
phases
represents a data frame of selected computation phases to extract
information about. Note that one computation phase label
can appear more than
once if the corresponding function was called multiple times. This is why the
data frame contains the occurrence
column (see Table 4)
indicating which occurrence of a given phase should be considered. The type
column indicates whether a given row in the data frame designates the beginning
(ENTER
) or the end (EXIT
) of the corresponding computation phase. In this
data frame, one can provide additional information, i.e. prettified computation
phase labels for better data readability (see Table 4).
The phases
data frame is used to construct the output data frame of this
function.
type | name | label | occurrence |
---|---|---|---|
EXIT | mpf_solvegemm() | Block-wise Schur complement computation | 1 |
… | … | … | … |
We begin by reading the timeline trace file trace.
read_timeline <- function(trace, phases) { complete_timeline <- read.csv(trace, header = FALSE)
For an easier column access, we add column names to the newly created data frame.
colnames(complete_timeline) <- c("rank", "time", "type", "duration", "path")
Then, we construct the resulting timeline
data frame containing only the data
we are interested in based on the input phases
data frame. The new data frame
contains three additional columns:
- time providing the corresponding computation phase delimiter timestamp,
- zero, a constant zero column useful for plotting vertical or horizontal axis intercepts,
- middle providing the values representing the middles of the intervals defined by the values of the time column, useful to center phase labels in the rss_by_time plot function (see Section 8.6.11).
timeline <- phases
timeline$time <- NA
timeline$middle <- NA
timeline$zero <- rep(0, nrow(timeline))
# Iterate over the rows of the 'phases' data frame.
for(i in 1:nrow(phases)) {
  time <- complete_timeline[
    complete_timeline$type == phases[i, "type"] &
      str_detect(complete_timeline$path, paste0(phases[i, "name"], "$")),
    "time"
  ]
  time <- time[[as.integer(phases[i, "occurrence"])]]
  middle <- ifelse(
    i > 1,
    time - ((time - timeline[i - 1, "time"]) / 2),
    time / 2
  )
  timeline[i, "time"] <- as.integer(time)
  timeline[i, "middle"] <- as.double(middle)
}
At the end, we return the timeline
data frame.
return(timeline)
}
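A hypothetical usage of read_timeline, based on the phase shown in Table 4 and an illustrative trace file name, could be:

phases <- data.frame(
  type = c("EXIT"),
  name = c("mpf_solvegemm()"),
  label = c("Block-wise Schur complement computation"),
  occurrence = c(1)
)
timeline <- read_timeline("timeline-0.csv", phases) # illustrative file name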
8.3. Compressing data
8.3.1. compress_consumption
The larger a trace file (see Section 8.2.2), the larger the data frame produced by the associated parsing function defined in Section 8.2.3 and, consequently, the bigger the final figures (in terms of file size) plotted based on these data frames. This phenomenon may then significantly increase the time needed to produce PDF documents featuring the aforementioned figures.
To prevent this, we define here a function named compress_consumption
allowing
us to reduce the number of entries in a given data frame produced by one of the
functions in Section 8.2.3 except for read_timeline
(see
Section 8.2.3.4). compress_consumption
takes two
arguments, the data
frame to compress and a threshold epsilon
(see below).
compress_consumption <- function(data, epsilon) {
data
may contain tag information when passed through the read_es
function
(see Section 8.2.3.3). At first, we need to identify the corresponding
entries in data
in order to be able to preserve them during compression.
keep <- subset(data, !is.na(tag)) keep <- keep$timestamp
The idea of this compression function is to start with the first entry base
in
data
, iterate over the data points in the latter and drop
all the points
until the difference between base
and the current data point exceeds the
threshold epsilon
. At that point, we choose the current point as base
and
continue. Note that a data point can be dropped only if it is not in keep
and
if the jump between the current and the very next point is not greater than
epsilon
. This is to prevent the degradation of peak curve shapes in figures.
drop <- c()
data_size <- nrow(data)
base <- data[1, "consumption"]
for(i in 2:(data_size - 1)) {
  c_timestamp <- data[i, "timestamp"]
  c_consumption <- data[i, "consumption"]
  if(abs(base - c_consumption) > epsilon) {
    base <- c_consumption
  } else if(!(c_timestamp %in% keep)) {
    if(abs(data[i + 1, "consumption"] - c_consumption) <= epsilon) {
      drop <- c(drop, i)
    }
  }
}
# Drop the selected points only if there are any. Subsetting with an empty
# index vector would otherwise discard all of the rows.
if(length(drop) > 0) {
  data <- data[-drop, ]
}
Before returning the compressed data frame, we print out some compression statistics.
print(paste0("Removed ", length(drop), " points out of ", data_size, ".")) return(data) }
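For instance, reusing the profile data frame from the hypothetical read_es example in Section 8.2.3.3, the CPU power consumption curve could be compressed with a threshold of 5 W:

profile_ecpu <- compress_consumption(subset(profile, kind == "ecpu"), 5)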
8.3.2. compress_global
The read_es
function allows one to combine data frames produced by all of the
functions in Section 8.2.3 except for read_timeline
(see
Section 8.2.3.4). To simplify compression of such
combined data frames, we define here a function named compress_global
allowing us
to apply compress_consumption
(see Section 8.3.1) on each
subpart of a combined data frame.
compress_global <- function(data, epsilon) { kinds <- unique(na.omit(data$kind)) compressed <- NA compressed_is_na <- TRUE for(c_kind in kinds) { c_compressed <- compress_consumption(subset(data, kind == c_kind), epsilon) if(compressed_is_na) { compressed <- c_compressed compressed_is_na <- FALSE } else { compressed <- rbind(compressed, c_compressed) } } return(compressed) }
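Continuing the previous example, the whole combined data frame can be compressed in a single call:

profile_small <- compress_global(profile, 5)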
8.4. Formatting data
8.4.1. Benchmark results
Before actually visualizing data, we have to format it. This is the role of the
function gcvb_format
which takes a single argument - data
, an R data frame
containing benchmark results imported by the gcvb_import
function (see Section
8.2).
gcvb_format <- function(data) {
Due to the way certain benchmarks are defined in gcvb
, the solver
column may
contain non-uniform values, i.e., in some cases, coupled solver names use dashes
instead of slashes, which are the expected separators. For example, we need to
replace the dash in mumps-spido
by a slash so it becomes mumps/spido
.
if("solver" %in% colnames(data)) { data$solver <- ifelse( data$solver == "mumps-spido", "mumps/spido", ifelse(data$solver == "mumps-hmat", "mumps/hmat", data$solver) ) }
Names of columns holding the same information, such as factorization and solve time, expected result accuracy and so on, vary depending on the solver used during a given benchmark. To ease further data treatment, we merge these into common columns.
data$tps_facto <- ifelse( is.na(data$tps_cpu_facto_mpf), data$tps_cpu_facto, data$tps_cpu_facto_mpf ) data$tps_solve <- ifelse( is.na(data$tps_cpu_solve_mpf), data$tps_cpu_solve, data$tps_cpu_solve_mpf ) data$desired_accuracy <- ifelse( !is.na(data$mumps_blr_accuracy), data$mumps_blr_accuracy, NA ) data$size_schur <- ifelse( is.na(data$size_schur) & data$coupled_method == "multi-facto", data$disk_block_size, data$size_schur ) if(all(c("cols_schur", "solver", "coupled_method", "coupled_nbrhs") %in% colnames(data))) { data$cols_schur <- ifelse( data$coupled_method == "multi-solve" & data$solver == "mumps/spido", data$coupled_nbrhs, data$cols_schur ) } if("solver" %in% colnames(data)) { data$desired_accuracy <- ifelse( (data$solver == "hmat-bem" | data$solver == "hmat-fem" | data$solver == "hmat-fem-nd") & !is.na(data$h_assembly_accuracy), data$h_assembly_accuracy, data$desired_accuracy ) }
Then, we must remove from the data frame the lines corresponding to unfinished jobs, i.e. lines without computation time information or where the computation time is zero.
data <- subset(data, !is.na(tps_facto)) data <- subset(data, !is.na(tps_solve)) data <- subset(data, tps_facto > 0.0) data <- subset(data, tps_solve > 0.0)
If the dense_ooc column is present, we have to take care of NA values so that we can make the column a factor. NA appears in the case of purely in-core benchmarks which do not set this variable at all. Finally, we replace NA by -1.
if("dense_ooc" %in% colnames(data)) { data$dense_ooc <- ifelse( is.na(data$dense_ooc), -1, data$dense_ooc ) data$dense_ooc[data$dense_ooc > 0] <- 1 data$dense_ooc <- as.character(data$dense_ooc) data$dense_ooc <- factor(data$dense_ooc, levels = c("-1", "0", "1")) }
The same goes for the energy_scope
column. In this case, NA
means that the
tool was not used.
if("energy_scope" %in% colnames(data)) { data$energy_scope <- ifelse( is.na(data$energy_scope), "off", data$energy_scope ) data$energy_scope <- as.character(data$energy_scope) data$energy_scope <- factor(data$energy_scope, levels = c("off", "on")) }
The memory usage peaks are held in two separate columns rm_peak
, corresponding
to the VmRSS
field in /proc/[PID]/status
, and swap_peak
, corresponding to
the VmSwap
field in /proc/[PID]/status
, which we need to sum up to obtain
the total peak value proc5.
if("swap_peak" %in% colnames(data)) { data$swap_peak <- ifelse( is.na(data$swap_peak), 0, data$swap_peak ) data$rm_peak <- data$rm_peak + data$swap_peak data <- data[, !(names(data) %in% c("swap_peak"))] }
Some required columns are not present in the data frame by default, so we have to compute them based on existing data.
The total of cores used for computation is the combination of MPI process count and thread count.
data$p_units <- data$omp_thread_num * data$processes
The value of n_blocks
represents the number of blocks the dense submatrix
\(A_{ss}\) is split into during the Schur complement computation in the two-stage
multi-factorization implementation. When this value is not explicitly set by the
user, n_blocks
equals auto
. Although, for the plot drawing functions to
operate correctly, we need to recompute the corresponding numerical values.
if("n_blocks" %in% colnames(data)) { data$n_blocks <- ifelse( data$n_blocks == "auto", ceiling(data$nbem / data$size_schur), as.numeric(data$n_blocks) ) }
Finally, the factorization and solve efficiency computation is split into several steps:
- gathering best sequential times for each solver,
if("solver" %in% colnames(data)) { sequentials <- subset( data, select = c("solver", "tps_facto", "tps_solve", "nbpts"), data$p_units == 1 ) by_node <- FALSE if(nrow(sequentials) < 1) { sequentials <- subset( data, select = c("solver", "tps_facto", "tps_solve", "nbpts"), data$processes == 1 ) by_node <- TRUE } if(nrow(sequentials) > 0) { sequentials <- merge( aggregate(tps_facto ~ solver + nbpts, min, data = sequentials), sequentials ) sequentials <- sequentials %>% dplyr::rename( tps_facto_seq = tps_facto, tps_solve_seq = tps_solve )
- adding a column containing sequential time for every run in the data frame,
data <- dplyr::left_join(data, sequentials, by = c("solver", "nbpts"))
- computing the efficiency itself.
if(by_node) { data$efficiency_facto <- data$tps_facto_seq / (data$processes * data$tps_facto) data$efficiency_solve <- data$tps_solve_seq / (data$processes * data$tps_solve) data$efficiency_total <- (data$tps_facto_seq + data$tps_solve_seq) / (data$processes * (data$tps_facto + data$tps_solve)) } else { data$efficiency_facto <- data$tps_facto_seq / (data$p_units * data$tps_facto) data$efficiency_solve <- data$tps_solve_seq / (data$p_units * data$tps_solve) data$efficiency_total <- (data$tps_facto_seq + data$tps_solve_seq) / (data$p_units * (data$tps_facto + data$tps_solve)) } } } return(data) }
8.5. Style presets
We also define a common set of data visualization elements to ensure a unified graphic output when drawing plots.
8.5.1. Label functions
Label functions allow us to format and prettify column names or values when used in plot legends.
8.5.1.1. label_percentage
This function multiplies value
by 100 and returns the result as a string with an
appended percentage sign.
label_percentage <- function(value) { return(paste0(value * 100, "%")) }
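For example:

label_percentage(c(0.25, 1)) # yields "25%" and "100%"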
8.5.1.2. label_time
This function converts time values in seconds into more human-readable equivalents.
label_time <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { days <- as.integer(labels[i] / 86400) hours <- as.integer((labels[i] - (86400 * days)) / 3600) minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60) seconds <- as.integer( labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes) ) out[i] <- "" if(days > 0) { out[i] <- paste0(out[i], days, "d") } if(hours > 0) { out[i] <- paste0(out[i], hours, "h") } if(minutes > 0 && days < 1) { out[i] <- paste0(out[i], minutes, "m") } if(seconds > 0 && days < 1 && hours < 1) { out[i] <- paste0(out[i], seconds, "s") } if(seconds < 1 && minutes < 1 && hours < 1 && days < 1) { out[i] <- "0s" } } else { out[i] <- "?" } } return(out) }
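For example:

label_time(c(3725, 90061)) # yields "1h2m" and "1d1h"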
8.5.1.3. label_time_short
This function converts time values in seconds into shorter, more human-readable equivalents.
label_time_short <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { days <- as.integer(labels[i] / 86400) hours <- as.integer((labels[i] - (86400 * days)) / 3600) minutes <- as.integer((labels[i] - (86400 * days) - (3600 * hours)) / 60) seconds <- as.integer( labels[i] - (86400 * days) - (3600 * hours) - (60 * minutes) ) if(days > 0) { out[i] <- paste0(round(labels[i] / 86400., 1), "d") } else if(hours > 0) { out[i] <- paste0(round(labels[i] / 3600., 1), "h") } else if(minutes > 0) { out[i] <- paste0(round(labels[i] / 60., 1), "m") } else if(seconds > 0) { out[i] <- paste0(round(labels[i], 1), "s") } else { out[i] <- "0s" } } else { out[i] <- "?" } } return(out) }
8.5.1.4. label_epsilon
This function converts solution accuracy threshold values from numeric values to strings having the form \(1 \times 10^{-n}\) or to ‘unset (no compression)’ if the value is not assigned like in the case of solvers not using data compression.
label_epsilon <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } else { out[i] <- "unset (no compression)" } } return(out) }
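For example:

label_epsilon(c(1e-3, NA)) # yields "1.0e-03" and "unset (no compression)"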
8.5.1.5. label_storage
This function prettifies storage support names.
label_storage <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "rm_peak" = "Memory (RAM)", "hdd_peak" = "Disk drive" ) } return(out) }
8.5.1.6. label_ooc
This function prettifies the names of facet grids distinguishing between
benchmarks with and without out-of-core computation (column dense_ooc
in the
data frame).
label_ooc <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( labels[i], "-1" = "All in-core", "0" = "Out-of-core except Schur complement", "1" = "All out-of-core" ) } return(out) }
8.5.1.7. label_solver
This function prettifies solver names.
label_solver <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "spido" = "SPIDO", "hmat-bem" = "HMAT", "hmat-fem" = "HMAT", "hmat-fem-nd" = "HMAT (proto-ND)", "hmat-oss" = "HMAT-OSS", "chameleon" = "Chameleon", "mumps" = "MUMPS", "mumps-blr" = "MUMPS (compressed)", "mumps-no-blr" = "MUMPS (uncompressed)", "qr_mumps" = "qr_mumps", "mumps/spido" = "MUMPS/SPIDO", "qr_mumps/spido" = "qr_mumps/SPIDO", "qr_mumps/hmat" = "qr_mumps/HMAT", "mumps/hmat" = "MUMPS/HMAT", "hmat/hmat" = "HMAT" ) } return(out) }
8.5.1.8. label_solver_config
This function prettifies solver configuration labels.
label_solver_config <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "vanilla" = "without advanced features", "sparse-rhs" = "sparse right-hand side", "blr-dense-rhs" = "compression", "blr-sparse-rhs" = "compression and sparse right-hand side" ) } return(out) }
8.5.1.9. label_tag
This function prettifies energy_scope
tag names.
label_tag <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = expression( italic(S[i] == A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = expression( italic(A[vv]^{-1}*A[sv[i]]^{T}) ~ "(sparse solve)" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = expression( italic(A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = expression( italic(compress * group("(", A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}, ")")) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(AXPY)" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(compressed AXPY)" ), "MPF_SolveGEMM_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Multi-factorization "CreateSchurComplement" = expression( italic(S[i] == A[ss[ij]] - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_factorization" = expression( italic(X[ij] == - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "MPF_multifact_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), "MPF_multifact_MUMPS_HMAT" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Common "testMPF_factorisation_Schur" = expression( italic(S^{-1}) ~ "(Schur factorization)" ), "testMPF_factorisation" = expression( italic(A^{-1}) ~ "(factorization of" ~ italic(A) * ")" ), "testMPF_solve" = expression( italic(A^{-1}*x) ~ "(system solution)" ) ) } return(out) }
8.5.1.10. label_tag_short
This function does the same thing as label_tag
but returns shorter labels than
the latter.
label_tag_short <- function(labels, instance = "i") { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = paste0( "italic(S[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = paste0( "italic(solve[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = paste0( "italic(SpMM[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = paste0( "italic(compress[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = paste0( "italic(CAXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = paste0( "italic(AXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS" = "italic(S)", "CreateSchurComplement" = paste0( "italic(S[", instance, "])" ), # Multi-factorization "CreateSchurComplement_factorization" = paste0( "italic(X[", instance, "])" ), "MPF_multifact_MUMPS" = "italic(S)", "MPF_multifact_MUMPS_HMAT" = "italic(S)", # Common "testMPF_factorisation_Schur" = "italic(S^{-1})", "testMPF_factorisation" = "italic(A^{-1})", "testMPF_solve" = "italic(A^{-1}*x)" ) } return(out) }
8.5.1.11. label_kind
This function prettifies power and storage resources consumption kinds (see Section 8.2.3.3).
label_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "ecpu" = "CPU", "eram" = "RAM", "sram" = "RAM", "shdd" = "Disk", "flops" = "Flop rate", "bw" = "RAM bandwidth" ) } return(out) }
8.5.1.12. label_ooc_storage
This function prettifies the names of the storage supports used for out-of-core computation.
label_ooc_storage <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "network" = "BeeGFS (10 Gbit/s link)", "hdd" = "Hard drive", "ssd" = "Solid state drive" ) } return(out) }
8.5.1.13. label_mapping
This function transforms MPI process mapping values into strings giving a more detailed information about the underlying parallel configuration.
label_mapping <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "node" = "1 unbound MPI process \U2A09 1 to 24 threads", "socket" = "2 MPI processes bound to sockets \U2A09 1 to 12 threads", "numa" = "4 MPI processes bound to NUMAs \U2A09 1 to 6 threads", "core" = "1 to 24 MPI processes bound to cores \U2A09 1 thread" ) } return(out) }
8.5.1.14. label_nbpts
This function converts the problem's unknown counts into the form N = <count> M, where <count> is the count expressed in millions and rounded to two decimals.
label_nbpts <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- paste( "\U1D441 =", as.character( round(as.numeric(labels[i]) / 1e+06, digits = 2) ), "M" ) } return(out) }
8.5.1.15. label_nbpts_ticks
This function converts problem's unknown counts into the form <count>M
or
<count>K
where <count>
is converted to millions and rounded to two decimals
or thousands and rounded to integer values, respectively.
label_nbpts_ticks <- function(labels) { out <- c() for (i in 1:length(labels)) { value <- as.numeric(labels[i]) if(value < 1e+06) { divide_by <- 1000 show_digits <- 0 unit <- "K" } else { divide_by <- 1e+06 show_digits <- 2 unit <- "M" } out[i] <- paste0( as.character( round(value / divide_by, digits = show_digits) ), unit ) } return(out) }
8.5.1.16. label_scientific
This function converts numeric label values to scientific notation.
label_scientific <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } return(out) }
8.5.1.17. label_coupling
and label_coupling_two_stage
These functions prettify names of the implementation schemes for the solution of coupled linear systems.
label_coupling <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve (two-stage)", "multi-facto" = "Multi-factorization (two-stage)", "full-hmat" = "Partially sparse-aware (single-stage)", "single-stage" = "Single-stage (prototype)" ) } return(out) } label_coupling_short <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve", "multi-facto" = "Multi-factorization", "full-hmat" = "Partially sparse-aware", "single-stage" = "Single-stage (prototype)" ) } return(out) }
8.5.1.18. label_peak_kind
This function prettifies peak memory usage value kinds.
label_peak_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "assembly_estimation" = expression("assembled system"), "schur_estimation" = expression("Schur complement matrix " * italic(S)), "peak_symmetric_factorization" = expression( "symmetric factorization of " * italic(A[vv])), "peak_non_symmetric_factorization" = expression( "non-symmetric factorization of " * italic(A[vv])) ) } return(out) }
8.5.1.19. label_execution
This function prettifies the names of execution modes.
label_execution <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "sync" = "synchronous", "async" = "asynchronous" ) } return(out) }
8.5.2. Label maps
A label map is a variation of a label function. It provides alternative names for given columns of a data frame.
8.5.2.1. phase.labs
This label map provides prettified names for factorization and solve time
columns tps_facto
and tps_solve
, the corresponding efficiency columns
efficiency_facto
and efficiency_solve
as well as the column holding the
associated execution time of the Schur block creation phase
CreateSchurComplement_exec_time
.
phase.labs <- c( "Factorization", "Solve", "Factorization", "Solve", "Schur complement computation" ) names(phase.labs) <- c( "tps_facto", "tps_solve", "efficiency_facto", "efficiency_solve", "CreateSchurComplement_exec_time" )
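A minimal sketch of using this label map as a ggplot2 facet labeller, on a hypothetical toy data frame and assuming plot.R has been sourced beforehand:

df <- data.frame(
  nbpts = c(1e+05, 2e+05, 1e+05, 2e+05),
  value = c(10, 25, 1, 2),
  phase = c("tps_facto", "tps_facto", "tps_solve", "tps_solve")
)
ggplot(df, aes(x = nbpts, y = value)) +
  geom_line() +
  facet_wrap(vars(phase), labeller = labeller(phase = phase.labs))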
8.5.3. Plot theme
To preserve a unique color palette as well as sets of point shapes and line types throughout all the plots, we provide each legend entry with the information on its:
- color,
colors <- c( "0.001" = "#F07E26", "1e-06" = "#9B004F", "node" = "#1488CA", "socket" = "#95C11F", "numa" = "#FFCD1C", "core" = "#6561A9", "1 (36 threads)" = "#E63312", "2 (72 threads)" = "#F07E26", "4 (144 threads)" = "#9B004F", "8 (288 threads)" = "#1488CA", "(32, 32)" = "#F07E26", "(48, 48)" = "#9B004F", "(64, 64)" = "#1488CA", "(128, 128)" = "#95C11F", "(256, 256)" = "#E63312", "(256, 512)" = "#FFCD1C", "(512, 512)" = "#800080", "(256, 1024)" = "#6561A9", "(256, 2048)" = "#89CCCA", "(256, 4096)" = "#384257", "1" = "#1488CA", "2" = "#95C11F", "3" = "#FFCD1C", "4" = "#6561A9", "5" = "#384257", "7" = "#89CCCA", "10" = "#1488CA", "11" = "#9B004F", "30" = "#FFCD1C", "250000" = "#384257", "5e+05" = "#1488CA", "1e+06" = "#E63312", "1200000" = "#95C11F", "1400000" = "#FFCD1C", "1500000" = "#6561A9", "2e+06" = "#F07E26", "2500000" = "#9B004F", "multi-facto" = "#6561A9", "multi-solve" = "#89CCCA", "full-hmat" = "#C7D64F", "single-stage" = "#C7D64F", "spido" = "#384257", "hmat" = "#9B004F", "mumps" = "#F07E26", "mumps-blr" = "#1488CA", "qr_mumps" = "#95C11F", "mumps/spido" = "#95C11F", "mumps/hmat" = "#FFCD1C", "ecpu" = "#384257", "eram" = "#E63312", "sram" = "#1488CA", "shdd" = "#95C11F", "network" = "#384257", "hdd" = "#1488CA", "ssd" = "#95C11F", "flops" = "#9B004F", "bw" = "#1488CA", "vanilla" = "#1488CA", "sparse-rhs" = "#95C11F", "blr-dense-rhs" = "#FFCD1C", "blr-sparse-rhs" = "#6561A9" )
- point shape style,
shapes <- c( "spido" = 16, "hmat-bem" = 15, "hmat-fem" = 15, "hmat-oss" = 15, "hmat-fem-nd" = 23, "chameleon" = 18, "mumps" = 16, "qr_mumps" = 17, "mumps/spido" = 16, "qr_mumps/spido" = 17, "qr_mumps/hmat" = 17, "mumps/hmat" = 15, "hmat/hmat" = 18, "rm_peak" = 16, "swap_peak" = 17, "hdd_peak" = 15, "sync" = 16, "async" = 15, "CreateSchurComplement_MultiSolve_MUMPS" = 22, "CreateSchurComplement_MultiSolve_MUMPS_solve" = 22, "CreateSchurComplement_MultiSolve_MUMPS_product" = 21, "MPF_SolveGEMM_MUMPS_HMAT_assembly" = 23, "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = 24, "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = 25, "MPF_SolveGEMM_MUMPS" = 23, "CreateSchurComplement" = 22, "CreateSchurComplement_factorization" = 22, "MPF_multifact_MUMPS" = 23, "MPF_multifact_MUMPS_HMAT" = 23, "testMPF_factorisation_Schur" = 24, "testMPF_factorisation" = 23, "testMPF_solve" = 24 )
- line type.
linetypes <- c( "spido" = "solid", "hmat-bem" = "dotted", "hmat-fem" = "dotted", "hmat-oss" = "dotted", "hmat-fem-nd" = "longdash", "chameleon" = "longdash", "mumps" = "solid", "qr_mumps" = "dashed", "mumps/spido" = "solid", "qr_mumps/spido" = "dashed", "qr_mumps/hmat" = "dashed", "mumps/hmat" = "dotted", "hmat/hmat" = "longdash", "rm_peak" = "solid", "swap_peak" = "dotted", "hdd_peak" = "longdash", "sync" = "solid", "async" = "dotted", "assembly_estimation" = "solid", "peak_symmetric_factorization" = "dotted", "peak_non_symmetric_factorization" = "longdash", "schur_estimation" = "dashed" )
Furthermore, each plot object uses a set of common theme layers provided by the function generate_theme. If needed, the latter allows us to provide custom breaks for the color, point shape and line type aesthetics, the legend title, the number of rows allowed in the legend as well as the legend box placement.
generate_theme <- function( color_breaks = waiver(), color_labels = waiver(), color_override_aes = list(), shape_breaks = waiver(), shape_labels = waiver(), shape_override_aes = list(), linetype_breaks = waiver(), linetype_labels = waiver(), linetype_override_aes = list(), legend_title = element_text(family = plot_font, size = 14, face = "bold"), legend_text = element_text(family = plot_font, size = 14), theme_text = element_text(family = plot_font, size = 16), legend_rows = 1, legend_rows_color = NA, legend_rows_fill = NA, legend_rows_shape = NA, legend_rows_linetype = NA, legend_box = "vertical", legend_position = "bottom" ) { return(list( scale_color_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_fill_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_shape_manual( values = shapes, labels = shape_labels, breaks = shape_breaks, na.translate = F ), scale_linetype_manual( values = linetypes, labels = linetype_labels, breaks = linetype_breaks ), guides( shape = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_shape), legend_rows, legend_rows_shape ), byrow = TRUE, override.aes = shape_override_aes ), linetype = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_linetype), legend_rows, legend_rows_linetype ), byrow = TRUE, override.aes = linetype_override_aes ), color = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_color), legend_rows, legend_rows_color ), byrow = TRUE, override.aes = color_override_aes ), fill = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_fill), legend_rows, legend_rows_fill ), byrow = TRUE, override.aes = color_override_aes ) ),
By default (see above), we set the font family to plot_font (see Section 8.1), the font size to 16 points and the legend text size to 14 points.
theme_bw(), theme( text = theme_text, legend.text = legend_text, legend.title = legend_title,
We place the legend at the desired position and surround it with a rectangle.
legend.position = legend_position,
legend.background = element_rect(color = "gray40", size = 0.5),
We place legend items either side by side or one per line.
legend.box = legend_box,
For better visibility, we make legend symbols longer.
legend.key.width = unit(3, "line"),
Finally, we rotate the X-axis text to prevent long labels from overlapping.
axis.text.x = element_text(angle = 60, hjust = 1) ) )) }
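As generate_theme returns a list of layers, it can be added to any ggplot object like a regular layer. A minimal sketch, where df is a hypothetical benchmark data frame:

plot <- ggplot(df, aes(x = nbpts, y = tps_facto, color = solver)) +
  geom_line() +
  geom_point(size = 2.5) +
  generate_theme(
    color_labels = label_solver,
    legend_rows = 2
  )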
8.6. Plot functions
In performance analysis, the metrics we study are generally the same. Therefore, we prepared a series of plot functions, one for each type of data visualization. This way, we can, for example, reuse the same function to plot disk usage peaks by problem unknown count for any combination of solvers.
8.6.1. times_by_nbpts
This function returns a plot of factorization and solve times relative to the linear system's unknown count for a given set of either dense or sparse solvers.
We combine both metrics in one plot in which they are distinguished thanks to a facet grid. However, to be able to use the latter, we need to convert the data frame from wide format into long format (see Section 8.2) RwideToLong.
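The conversion itself relies on the gather function from the tidyr package. A toy sketch of the transformation, with made-up values:

library(tidyr)

# Wide format: one row per run, one column per phase.
wide <- data.frame(
  nbpts = c(1000, 2000),
  tps_facto = c(10, 40),
  tps_solve = c(1, 3)
)

# Long format: one row per (run, phase) pair; 'computation_phase' holds the
# former column name and 'time' the associated value.
long <- gather(wide, key = "computation_phase", value = "time",
               c("tps_facto", "tps_solve"))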
The solver_config option allows one to distinguish between different solver configurations on the figure instead of different precision parameter \(\epsilon\) values.

By default, the trans_x and trans_y arguments set the axis scales to logarithmic using log10. To use a standard scale, the value identity should be used. The time_break_width argument allows one to set custom Y-axis break widths (in seconds).
times_by_nbpts <- function(dataframe, solver_config = FALSE, trans_x = "log10", trans_y = "log10", time_break_width = 3600) { dataframe_long <- gather( dataframe, key = "computation_phase", value = "time", c("tps_facto", "tps_solve") )
Now, we can begin to define the resulting ggplot2 plot object. The column names featured in this plotting function are:

- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- tps_solve: solve time in seconds,
- solver_config: the solver configuration used,
- desired_accuracy: the solution accuracy threshold \(\epsilon\),
- computation_phase: tells whether the row contains factorization or solve time (used in the long-format data frame dataframe_long),
- time: the factorization or solve time itself (used in the long-format data frame dataframe_long).
plot <- ggplot(dataframe_long, aes(x = nbpts, y = time))
If solver_config is TRUE, we want to distinguish between different solver configurations.
if(solver_config) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(solver_config), shape = solver, linetype = solver ) ) } else if(!all(is.na(dataframe_long$desired_accuracy))) {
Otherwise, we will try to distinguish between various data compression levels.
  plot <- ggplot(
    dataframe_long,
    aes(
      x = nbpts,
      y = time,
      color = as.character(desired_accuracy),
      shape = solver,
      linetype = solver
    )
  )
}
Then, we scale the axes.
plot <- plot + geom_line() + geom_point(size = 2.5) + scale_x_continuous( "# Unknowns (\U1D441)", trans = trans_x, breaks = dataframe_long$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Computation time", trans = trans_y, breaks = breaks_width(time_break_width), labels = label_time_short ) +
Finally, using a facet grid, we distinguish between factorization and solve times, placed side by side. The plot theme is provided by the generate_theme function (see Section 8.5.3). The color label function depends on the value of solver_config.
facet_grid( . ~ computation_phase, labeller = labeller(computation_phase = phase.labs) ) if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
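A hypothetical call, assuming results is a benchmark data frame imported as described in Section 8.2:

# Linear Y-axis with 30-minute breaks instead of the default log10 scale.
times_by_nbpts(results, trans_y = "identity", time_break_width = 1800)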
8.6.2. multisolve_times_by_nbrhs_and_nbpts
This function returns a plot of factorization time relative to the linear system's unknown count for different counts of right-hand sides used during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. Using the ooc switch, we can include a facet grid to distinguish between runs with and without out-of-core Schur complement computation. The distributed switch adapts the output figure for multi-node benchmarks if set to TRUE.
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- tps_solve: solve time in seconds,
- processes: the number of MPI processes used for the computation,
- p_units: the total number of threads used for the computation,
- cols_schur: the number of columns in a Schur complement block,
- solver: the names of the solvers featured in the coupling.
Note that we create an additional column p_labels. It contains the legend labels for the color aesthetics, combining the values of processes and p_units, e.g. 2 (72 threads) for 2 nodes and 72 threads in total.
multisolve_times_by_nbpts_and_nbrhs <- function(dataframe, ooc = FALSE, distributed = FALSE) { dataframe$cols_schur_labels <- paste0( "(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")" ) cbreaks <- unique(na.omit(dataframe$cols_schur_labels)) cbreaks <- cbreaks[order( as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)), as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] if(distributed) { dataframe$p_labels <- paste0( dataframe$processes, " (", dataframe$p_units, " threads)" ) plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Factorization time", breaks = breaks_log(n = 10, base = 10), trans = "log10", labels = label_time_short )
If the ooc switch is set to TRUE, we include a facet grid based on the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined from the cols_schur_labels column combining the coupled_nbrhs and cols_schur values (or from p_labels in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(dataframe$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression( bold(group("(", list(n[italic(c)], n[italic(S)]), ")")) ) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
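A hypothetical call, assuming results holds multi-solve benchmark results:

multisolve_times_by_nbpts_and_nbrhs(results, ooc = TRUE)
# or, for multi-node benchmarks:
multisolve_times_by_nbpts_and_nbrhs(results, distributed = TRUE)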
8.6.3. multisolve_rss_by_nbrhs_and_nbpts
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count for different counts of right-hand sides used during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values respectively. Using the ooc switch, we can include a facet grid to distinguish between runs with and without out-of-core Schur complement computation as well as between RAM and disk space consumption.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: hard drive usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- processes: the number of MPI processes used for the computation,
- p_units: the total number of threads used for the computation,
- cols_schur: the number of columns in a Schur complement block,
- solver: the names of the solvers featured in the coupling.
Note that we create an additional column p_labels. It contains the legend labels for the color aesthetics, combining the values of processes and p_units, e.g. 2 (48 threads) for 2 nodes and 48 threads in total.
We begin by defining the plot object directly.
multisolve_rss_by_nbpts_and_nbrhs <- function(dataframe, limit_max = 128, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE ) { df <- dataframe df$cols_schur_labels <- paste0( "(", df$coupled_nbrhs, ", ", df$cols_schur, ")" ) cbreaks <- unique(na.omit(df$cols_schur_labels)) cbreaks <- cbreaks[order(as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)),as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] y_axis_label <- "Real memory (RAM) usage peak [GiB]" if(distributed) { df$p_labels <- paste0( df$processes, " (", df$p_units, " threads)" ) plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = p_labels, shape = solver, linetype = solver ) ) } else if(ooc) { rss_kinds <- c("rm_peak", "hdd_peak") df <- gather( df, key = "rss_kind", value = "rss_peak", all_of(rss_kinds) ) df$rss_kind <- factor(df$rss_kind, levels = rss_kinds) y_axis_label <- "Storage usage peak [GiB]" plot <- ggplot( df, aes( x = nbpts, y = rss_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } else { plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) + geom_hline( data = subset(df, rss_kind == "rm_peak"), aes(yintercept = limit_max) ) + geom_text( data = subset(df, rss_kind == "rm_peak"), aes( min(df$nbpts), limit_max, label = "Physical memory limit", vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = df$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( y_axis_label, labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values )
If the ooc switch is set to TRUE, we include a facet grid based on the rss_kind and the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined from the cols_schur_labels column combining the coupled_nbrhs and cols_schur values (or from p_labels in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(df$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression( bold(group("(", list(n[italic(c)], n[italic(S)]), ")")) ) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
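For example, on a hypothetical node featuring 256 GiB of RAM, one could adjust the Y-axis accordingly:

multisolve_rss_by_nbpts_and_nbrhs(
  results, limit_max = 256, tick_values = c(0, 64, 128, 192, 256), ooc = TRUE
)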
8.6.4. coupled_memory_aware
This function returns a plot of factorization time relative to real memory (RAM) usage peaks during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. In this case, we show the results for a fixed problem size, i.e. my_nbpts. The narrow argument allows one to optimize the output for a multi-column document layout, the scheme argument selects the implementation scheme under study and multi_node adapts the figure for multi-node benchmarks.
The column names featured in this plotting function are:
- tps_facto: factorization time in seconds,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- solver: the names of the solvers featured in the coupling.
We begin by shrinking the input data frame according to my_nbpts.
coupled_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = plot_font, fontsize = 16, log = TRUE, scheme = "multi-solve", multi_node = FALSE) { dataframe_largest <- NA if(multi_node) { dataframe_largest <- subset( dataframe, coupled_method == scheme ) } else { dataframe_largest <- subset( dataframe, coupled_method == scheme & nbpts == my_nbpts ) }
Then, we create an additional column in the data frame to hold prettified labels providing the Schur complement computation phase parameter values.
if(multi_node) { if(scheme == "multi-solve") { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$cols_schur, "), \")\")" ), paste0( "group(\"(\", list(", dataframe_largest$coupled_nbrhs, ", ", "", dataframe_largest$coupled_nbrhs, "), \")\")" ) ) } else { dataframe_largest$label_content <- as.character( dataframe_largest$n_blocks ) } } else { if(scheme == "multi-solve") { dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "atop(n[c]==", dataframe_largest$coupled_nbrhs, ", ", "n[S]==", dataframe_largest$cols_schur, ")" ), paste0("n[c]==", dataframe_largest$coupled_nbrhs) ) } else { dataframe_largest$label_content <- paste0( "n[b]==", dataframe_largest$n_blocks ) } }
Also, we need to reorder the entries in the data frame following the coupled_nbrhs, cols_schur and n_blocks columns.
if(scheme == "multi-solve") { dataframe_largest <- dataframe_largest %>% arrange(coupled_nbrhs, cols_schur) } else { dataframe_largest <- dataframe_largest %>% arrange(desc(n_blocks)) }
Finally, we have to determine the maximal memory usage peak among all the processes involved in the computation.
for(i in 1:nrow(dataframe_largest)) { dataframe_largest[i, "rm_peak_max"] <- max( as.numeric( strsplit(dataframe_largest[i, "rm_values"], split = " ")[[1]] ) ) }
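For instance, assuming rm_values holds the space-separated per-process peaks "1024 2048 512", the expression evaluates as follows:

max(as.numeric(strsplit("1024 2048 512", split = " ")[[1]]))  # yields 2048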
Now, we define the plot object.
plot <- NA if(multi_node) { dataframe_largest$p_labels <- paste0( dataframe_largest$processes, " (", dataframe_largest$p_units, " threads)" ) plot <- ggplot( dataframe_largest, aes( x = rm_peak_max / 1024., y = tps_facto, color = p_labels, label = label_content, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) } plot <- plot + geom_path(size = 0.8) + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the geom_label_repel function from the ggrepel R package to obtain better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf )
The X-axis shows the RAM usage peaks and the Y-axis shows the factorization time. We use custom break and limit values for better readability.
if(log) { plot <- plot + scale_x_continuous( "Maximal random access memory (RAM) usage peak [GiB]", breaks = breaks_log(10), limits = c(NA, NA), trans = "log10" ) + scale_y_continuous( "Factorization time", trans = "log10", labels = label_time_short, breaks = breaks_log(6) ) } else { plot <- plot + scale_x_continuous( "Maximal random access memory (RAM) usage peak [GiB]", breaks = breaks_extended(10), limits = c(NA, NA) ) + scale_y_continuous( "Factorization time", labels = label_time_short, breaks = breaks_extended(6) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-solve implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
if(narrow) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors[scheme]), linetype_labels = label_solver, linetype_override_aes = list(color = colors[scheme]), legend_rows = 2, legend_box = "horizontal", legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } else if(multi_node) { plot <- plot + facet_wrap( . ~ nbpts, labeller = labeller(nbpts = label_nbpts), scales = "free" ) + labs ( shape = "Solver coupling", linetype = "Solver coupling", color = "Parallel configuration" ) + generate_theme( color_breaks = as.character( sort(unique(na.omit(dataframe_largest$p_labels))) ), shape_labels = label_solver, linetype_labels = label_solver, legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( fill = FALSE ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling" ) + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors[scheme]), linetype_labels = label_solver, linetype_override_aes = list(color = colors[scheme]), legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) + guides( color = FALSE, fill = FALSE ) } return(plot) }
8.6.5. multifacto_times_by_nbpts_and_schur_size
This function returns a plot of factorization time relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data to the same column, size_schur.
multifacto_times_by_nbpts_and_schur_size <- function(dataframe, ooc = FALSE, distributed = FALSE) { local <- dataframe local$size_schur[local$solver == "mumps/spido"] = local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- n_blocks: the number of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
if(distributed) { local$p_labels <- paste0( local$processes, " (", local$p_units, " threads)" ) plot <- ggplot( local, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( local, aes( x = nbpts, y = tps_facto, color = as.character(n_blocks), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Factorization time", breaks = breaks_log(n = 10, base = 10), trans = "log10", labels = label_time_short )
If the ooc switch is set to TRUE, we include a facet grid based on the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined based on the values of the n_blocks column (or the p_labels column in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(n[italic(b)]) ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.6. multifacto_rss_by_nbpts_and_schur_size
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values respectively.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data to the same column, size_schur.
multifacto_rss_by_nbpts_and_schur_size <- function(dataframe, limit_max = 128, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE) { local <- dataframe local$size_schur[local$solver == "mumps/spido"] = local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- n_blocks: the number of blocks the Schur complement matrix was split into,
- size_schur: the out-of-core block size in number of matrix rows or columns,
- solver: the names of the solvers featured in the coupling.
if(distributed) { local$p_labels <- paste0( local$processes, " (", local$p_units, " threads)" ) plot <- ggplot( local, aes( x = nbpts, y = rm_peak / 1024., color = as.character(p_labels), shape = solver, linetype = solver ) ) } else if(ooc) { local <- gather( local, key = "rss_kind", value = "rss_peak", c("rm_peak", "hdd_peak") ) local$rss_kind <- factor(local$rss_kind, levels = c("rm_peak", "hdd_peak")) plot <- ggplot( local, aes( x = nbpts, y = rss_peak / 1024., color = as.character(n_blocks), shape = solver, linetype = solver ) ) } else { plot <- ggplot( local, aes( x = nbpts, y = rm_peak / 1024., color = as.character(n_blocks), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) + geom_hline( data = subset(local, rss_kind == "rm_peak"), aes(yintercept = limit_max) ) + geom_text( data = subset(local, rss_kind == "rm_peak"), aes( min(local$nbpts), limit_max, label = "Physical memory limit", vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = local$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Storage usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values )
If the ooc switch is set to TRUE, we include a facet grid based on the rss_kind and the dense_ooc column values.
if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are determined based on the values of the n_blocks column (or the p_labels column in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(n[italic(b)]) ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.7. compare_coupled
This function returns a plot of the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
For example, if we want to plot only the results regarding two-stage schemes, we can use the legend_rows argument to adjust the number of rows in the legend. Then, the shorten argument allows one to remove legend titles and use shorter legend key labels in order to reduce the final canvas size.
We begin by restraining the input data frame to the best results of the two-stage and single-stage schemes. Note that as the latter imply the use of only one solver and not a coupling of solvers, we need to specify the coupling method name manually to enable a uniform treatment of the data frame.
compare_coupled <- function(dataframe, legend_rows = 3, legend_box = "horizontal", shorten = FALSE, check_epsilon = FALSE, font = plot_font, fontsize = 16) { dataframe_best <- dataframe dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method) dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <- "full-hmat" dataframe_best$coupled_method[dataframe_best$solver == "qr_mumps/hmat"] <- "single-stage" dataframe_best <- merge( aggregate( tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best ), dataframe_best )
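The aggregate-merge idiom used above deserves a short illustration. With a made-up data frame of three runs of the same configuration, aggregate keeps the minimal factorization time per group and merge restores the remaining columns of the matching row:

runs <- data.frame(
  coupled_method = "multi-solve", nbpts = 1000,
  solver = "mumps/hmat", tps_facto = c(40, 35, 38)
)
best <- merge(
  aggregate(tps_facto ~ coupled_method + nbpts + solver, min, data = runs),
  runs
)
# 'best' now contains the single row with tps_facto == 35.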
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
plot <- NA if(check_epsilon) { plot <- ggplot( dataframe_best, aes( x = nbpts, y = error, color = coupled_method, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_best, aes( x = nbpts, y = tps_facto, color = coupled_method, shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5)
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
if(check_epsilon) { vjust <- -0.6 size <- 5 if(shorten) { vjust <- -0.3 size <- 4 } plot <- plot + geom_hline(yintercept = 1.0e-3) + geom_text( aes( 1.0e+6, 1.0e-3, label = as.character(expression(epsilon == 10^{-3})), vjust = vjust, hjust = 0 ), parse = TRUE, color = "black", family = font, size = size ) } plot <- plot + scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = label_nbpts_ticks ) if(check_epsilon) { plot <- plot + scale_y_continuous( expression(paste("Relative error (", E[rel], ")")), limits = c(min(dataframe_best$error), 2.0e-3), labels = scientific, breaks = c(1e-16, 1e-13, 1e-10, 1e-6, 1e-3), trans = "log10" ) } else { plot <- plot + scale_y_continuous( "Factorization time", labels = label_time_short, trans = "log10", limits = c(NA, NA) ) }
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are determined based on the values of the coupled_method and solver columns.
plot <- plot + labs( shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme" ) if(shorten) { plot <- plot + generate_theme( color_breaks = unique(na.omit(dataframe_best$coupled_method)), color_labels = label_coupling_short, color_override_aes = list(linetype = 0, shape = 15, size = 6), shape_breaks = unique(na.omit(dataframe_best$solver)), shape_labels = label_solver, linetype_breaks = unique(na.omit(dataframe_best$solver)), linetype_labels = label_solver, legend_rows = legend_rows, legend_box = legend_box, legend_title = element_blank(), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } else { plot <- plot + generate_theme( color_breaks = unique(na.omit(dataframe_best$coupled_method)), color_labels = label_coupling, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_breaks = unique(na.omit(dataframe_best$solver)), shape_labels = label_solver, linetype_breaks = unique(na.omit(dataframe_best$solver)), linetype_labels = label_solver, legend_rows = legend_rows, legend_box = legend_box, legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } return(plot) }
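A hypothetical call producing a compact figure for a two-column document layout:

compare_coupled(results, legend_rows = 2, shorten = TRUE)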
8.6.8. compare_rss_coupled
This function returns a plot of the real memory (RAM) usage peaks associated with the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
We begin by restraining the input data frame to the best results of the two-stage schemes and the results of the single-stage scheme implemented using HMAT. Note that as the latter only implies the use of one single solver, we need to specify the coupling method name manually to enable a uniform treatment of the data frame.
compare_rss_coupled <- function(dataframe) { dataframe_best <- dataframe dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method) dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <- "full-hmat" dataframe_best <- merge( aggregate( tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best ), dataframe_best )
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: factorization time in seconds,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
plot <- ggplot(
  dataframe_best,
  aes(
    x = nbpts,
    y = rm_peak / 1024.,
    color = coupled_method,
    shape = solver,
    linetype = solver
  )
) +
  geom_line() +
  geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific ) + scale_y_continuous( "Real memory (RAM) usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are determined based on the values of the coupled_method and solver columns.
labs(
  shape = "Solvers",
  linetype = "Solvers",
  color = "Implementation\nscheme"
) +
# The breaks are the level values themselves, not their count.
generate_theme(
  color_breaks = unique(na.omit(dataframe_best$coupled_method)),
  color_labels = label_coupling,
  shape_breaks = unique(na.omit(dataframe_best$solver)),
  shape_labels = label_solver,
  linetype_breaks = unique(na.omit(dataframe_best$solver)),
  linetype_labels = label_solver,
  legend_rows = 3,
  legend_box = "horizontal"
)
return(plot)
}
8.6.9. accuracy_by_nbpts
This function returns a plot of the relative solution error with respect to the linear system's unknown count.
The solver_config option allows one to distinguish between different solver configurations on the figure instead of different precision parameter \(\epsilon\) values.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- error: the relative forward error of the solution approximation.
accuracy_by_nbpts <- function(dataframe, solver_config = FALSE) { if(solver_config) { plot <- ggplot( dataframe, aes( x = nbpts, y = error, color = as.character(solver_config), shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = error, color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
We draw a horizontal line representing the maximal admitted value of \(E_{rel}\), which is \(10^{-3}\).
geom_hline(yintercept = 1.0e-3) + geom_text( aes( min(dataframe$nbpts), 1.0e-3, label = as.character(expression(epsilon == 10^{-3})), vjust = -0.6, hjust = 0 ), parse = TRUE, family = plot_font, color = "black", size = 5 ) +
For the Y-axis, we define the tick values manually for easier comparison with the associated relative error values.
scale_x_continuous( "# Unknowns (\U1D441)", trans = "log10", breaks = dataframe$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( expression(paste("Relative error (", E[rel], ")")), limits = c(min(dataframe$error), 1e-1), breaks = c(1e-16, 1e-13, 1e-10, 1e-6, 1e-3), labels = scientific, trans = "log10" )
Finally, we set the legend labels, apply the common theme parameters and return the plot object.
if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.10. rss_peaks_by_nbpts
This function returns a plot of real memory (RAM) and disk space usage peaks relative to the linear system's unknown count. The function can take four extra arguments, limit_max, tick_values, solver_config and trans_x. The first two of them can be used to redefine the default Y-axis maximum and tick values respectively. The solver_config option allows one to distinguish between different solver configurations on the figure. By default, the trans_x argument sets the X-axis scale to logarithmic. To use a standard scale, the value identity should be used.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- solver_config: the solver configuration used,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: disk space usage peaks, stored in MiB but converted to GiB.
At first, we convert the data frame from wide to long format.
rss_peaks_by_nbpts <- function(dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120), solver_config = FALSE, trans_x = "log10") { dataframe_long <- gather( dataframe, key = "memory_type", value = "memory_usage", c("rm_peak", "hdd_peak") )
Also, we fix the order of the values in memory_type so that real memory (RAM) usage peaks come before disk usage peaks in the facet grid.
dataframe_long$memory_type <- factor( dataframe_long$memory_type, c('rm_peak', 'hdd_peak') )
Then, we configure the plot object itself as described above and handle different levels of expected relative solution error in the case of solvers using data compression by adding the shape aesthetics. If solver_config is TRUE, we want to distinguish between different solver configurations instead.
if(solver_config) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = memory_usage / 1024., color = as.character(solver_config), shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_long, aes( x = nbpts, y = memory_usage / 1024., color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The limit_max argument is used to plot a horizontal line indicating the physical memory limit.
geom_hline( data = subset(dataframe_long, memory_type == "rm_peak"), aes(yintercept = limit_max) ) + geom_text( data = subset(dataframe_long, memory_type == "rm_peak"), aes( min(dataframe_long$nbpts), limit_max, label = "Physical memory limit", vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the storage usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", trans = trans_x, breaks = dataframe_long$nbpts, labels = label_nbpts_ticks ) + scale_y_continuous( "Storage usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values ) +
Storage types are distinguished using a facet grid.
facet_grid( . ~ memory_type, labeller = labeller(memory_type = label_storage) )
We end by setting legend titles, applying the common theme parameters and returning the plot object.
if(solver_config) { plot <- plot + labs(color = "Configuration", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_solver_config, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver, legend_rows = length(unique(na.omit(dataframe_long$solver_config))) / 2 ) } else { plot <- plot + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
8.6.11. rss_by_time
This function returns a plot of real memory (RAM) usage relative to the execution time. The dataframe argument represents the data frame containing RAM consumption data read using the read_rss function (see Section 8.2.3.1). Moreover, using limit_max and tick_values one can redefine the default Y-axis maximum and tick values respectively to improve the readability of the plot. The timeline argument is useful when dataframe contains coupled FEM/BEM benchmark results. timeline represents an additional data frame that can be used to highlight selected computation phases and estimated reference RAM usage peaks of a sparse FEM factorization and Schur matrix size in the plot (see Section 8.2.3.4).
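A hypothetical call, assuming rss holds the output of the read_rss function and phases a timeline data frame prepared as described in Section 8.2.3.4:

rss_by_time(rss, limit_max = 192, tick_values = seq(0, 180, by = 45),
            timeline = phases, additional = TRUE)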
The column names featured in this plotting function are:

- timestamp: the time elapsed since the beginning of the execution, in milliseconds (converted to seconds on the plot),
- consumption: the global real memory usage, plotted in gibibytes (GiB).
We begin directly by defining the plot object.
rss_by_time <- function(dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120), timeline = NA, additional = FALSE) { is_coupled <- FALSE fill_legend <- "Solver" color_labeller <- label_solver if("coupled_method" %in% colnames(dataframe)) { is_coupled <- TRUE fill_legend <- "Implementation scheme" color_labeller <- label_coupling } if(is_coupled) { plot <- ggplot( dataframe, aes( x = timestamp / 1000, y = consumption, fill = coupled_method ) ) } else { plot <- ggplot( dataframe, aes( x = timestamp / 1000, y = consumption, fill = solver ) ) }
We draw the RAM usage as an area chart.
plot <- plot +
geom_area() +
The limit_max argument is used to plot a horizontal line indicating the physical memory limit.
geom_hline( aes(yintercept = limit_max) ) + geom_text( aes(0, limit_max, label = "Physical memory limit", vjust = 2, hjust = 0), color = "black", family = plot_font, size = 5 )
In case the extra timeline argument is present, we draw additional labelled vertical lines corresponding to selected computation phases using the geom_vline and geom_text layers. Note that for the geom_text layer we also need to adjust the text position and rotation. To enable the processing of the labels as math expressions, we use the parse parameter.
if(is.data.frame(timeline)) { timeline$solver <- rep(unique(na.omit(dataframe$solver)), nrow(timeline)) plot <- plot + geom_vline( data = timeline, aes(xintercept = time), linetype = "dotted", size = 0.4 ) + geom_text( data = timeline, aes( x = middle, y = 0.5 * limit_max, label = label ), parse = T, angle = 90, vjust = "center", family = plot_font, size = 5 )
Then, if additional is TRUE, we also draw the horizontal segments corresponding to the estimated RAM usage peaks of a sparse FEM factorization and the estimated Schur matrix size.

However, at first, we have to convert the timeline data frame to the long format (see Section 8.2). This allows us to use different line types to distinguish the segments based on the kind of value they represent.
if(additional) { columns <- c("assembly_estimation", "peak_symmetric_factorization", "peak_non_symmetric_factorization", "schur_estimation") timeline_long <- gather( timeline, key = "peak_kind", value = "peak_value", columns ) plot <- plot + geom_spoke( data = timeline_long, aes( x = middle - 0.5 * (time - middle), y = peak_value, linetype = peak_kind, radius = time - middle ), angle = 2 * pi ) } }
Next, we set up the axes and their scaling. We round the Y-axis values to 0 decimal places.
if(is.data.frame(timeline)) { plot <- plot + scale_x_continuous( "Execution time", labels = label_time, breaks = c(0, na.omit(timeline$time), max(dataframe$timestamp) / 1000) ) } else { plot <- plot + scale_x_continuous("Execution time", labels = label_time) } plot <- plot + scale_y_continuous( "Memory (RAM) usage [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, NA), breaks = tick_values )
We end by configuring the visual aspect of the plot.
if(is.data.frame(timeline) && additional) { plot <- plot + labs( fill = fill_legend, linetype = "Estimated peak usage" ) + generate_theme( color_labels = color_labeller, linetype_labels = label_peak_kind, legend_rows_linetype = length(columns) ) } else { plot <- plot + labs(fill = fill_legend) + generate_theme(color_labels = color_labeller) } return(plot) }
8.6.12. eprofile
This function returns a plot of power consumption relative to execution time. The first argument es_data of the function corresponds to the data frame containing power consumption data as well as tag information, if any (see Section 8.2.3.3). The zoom_on argument allows one to restrict the plot to a given time window (in seconds), TDP draws a horizontal line at the processor's thermal design power (see below) and the rss, flops and bw switches enable additional storage usage, flop rate and bandwidth curves respectively.
eprofile <- function(es_data, zoom_on = c(NA, NA), TDP = NA, rss = TRUE, flops = TRUE, bw = FALSE) { zoomed <- FALSE solvers <- unique(na.omit(es_data$solver)) es <- es_data if(!is.na(zoom_on[1]) && !is.na(zoom_on[2])) { zoomed <- TRUE es <- subset( es, timestamp >= zoom_on[1] * 1000 & timestamp <= zoom_on[2] * 1000 ) } edata <- subset(es, kind_general == "e") power <- ggplot( data = subset(edata, is.na(type)) ) sdata <- NA storage <- NA if(rss) { sdata <- subset(es, kind_general == "s") sdata$kind <- factor(sdata$kind, levels = c("sram", "shdd")) storage <- ggplot( data = subset(sdata, is.na(type)) ) } fdata <- NA if(flops) { fdata <- subset(es, kind_general == "flops") fdata$kind <- factor(fdata$kind, levels = c("flops")) } bwdata <- NA if(bw) { bwdata <- subset(es, kind_general == "bw") bwdata$kind <- factor(bwdata$kind, levels = c("bw")) }
At first, we draw the power consumption curve. The tags are represented by colored points surrounded with a black stroke and annotated with labels. Note that the latter are stored as math expressions and need to be parsed by the geom_label_repel function.
if(flops) { power <- power + geom_line( data = fdata, aes(x = timestamp, y = consumption / 1000, color = kind) ) } if(bw) { power <- power + geom_line( data = bwdata, aes(x = timestamp, y = consumption / 1024, color = kind) ) } power <- power + geom_line(aes(x = timestamp, y = consumption, color = kind)) + geom_point( data = subset(edata, !is.na(type) & kind == "ecpu"), aes(x = timestamp, y = consumption, shape = tag), color = "black", fill = "white", size = 3 ) + geom_label_repel( data = subset(edata, !is.na(type) & kind == "ecpu"), aes(x = timestamp, y = consumption, label = type_label), force_pull = 0, segment.color = "black", min.segment.length = 0, max.overlaps = 20, family = plot_font, parse = TRUE, show.legend = FALSE ) if(rss) { storage <- storage + geom_line(aes(x = timestamp, y = consumption, color = kind)) + geom_point( data = subset(sdata, !is.na(type)), aes(x = timestamp, y = consumption, shape = tag), color = "black", fill = "white", size = 3 ) + geom_label_repel( data = subset(sdata, !is.na(type)), aes(x = timestamp, y = consumption, label = type_label), force_pull = 0, segment.color = "black", min.segment.length = 0, max.overlaps = 20, family = plot_font, parse = TRUE, show.legend = FALSE ) }
If the TDP parameter is provided, we add a horizontal line matching its value.
if(!is.na(TDP)) { power <- power + geom_hline(yintercept = TDP) + geom_text( aes( 0, TDP, label = paste("TDP:", as.character(TDP), "W"), vjust = 2, hjust = 0 ), color = "black", family = plot_font, size = 5 ) }
Next, we set up the axes and their scaling. We round the Y-axis values to 0 decimal places.
power <- power + scale_x_continuous( "Execution time [s]", labels = function(label) sprintf("%d", as.integer(label / 1000)), limits = c(NA, NA), breaks = breaks_extended(n = 10) ) if(flops) { coeff <- 0.2 if(zoomed) { coeff <- 1.1 } power <- power + scale_y_continuous( "Power consumption [W]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.25 * max(edata$consumption)) ), sec.axis = sec_axis( ~ . * 1, name = "Flop rate [Gflop/s]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(coeff * (max(fdata$consumption) / 1000)) ) ) ) } else { power <- power + scale_y_continuous( "Power consumption [W]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.25 * max(edata$consumption)) ) ) } if(rss) { y_axis_title <- ifelse( length(unique(na.omit(sdata$kind))) < 2, "RAM usage [GiB]", "Storage usage [GiB]" ) storage <- storage + scale_x_continuous( "Execution time [s]", labels = function(label) sprintf("%d", as.integer(label / 1000)), limits = c(NA, NA), breaks = breaks_extended(n = 10) ) + scale_y_continuous( y_axis_title, labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.2 * max(sdata$consumption)) ) ) }
We end by configuring the visual aspect of the plot. We use a facet grid to draw a separate plot for each solver coupling.
if("coupled_method" %in% colnames(es_data)) { power <- power + facet_grid( . ~ coupled_method + solver, labeller = labeller( coupled_method = label_coupling, solver = label_solver ), scales = "free_x", switch = "y" ) } else { power <- power + facet_grid( . ~ solver, labeller = labeller( solver = label_solver ), scales = "free_x", switch = "y" ) } if(rss) { storage <- storage + facet_grid( . ~ solver, labeller = labeller( solver = label_solver ), scales = "free_x", switch = "y" ) + labs( shape = "Computation phase" ) } power <- power + generate_theme( color_labels = label_kind, color_override_aes = list(size = 8), shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal" ) if(rss) { power <- power + labs( color = "Device" ) + guides( fill = "none", shape = "none", linetype = "none" ) + theme( axis.title.x = element_blank() ) storage <- storage + generate_theme( shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal" ) + guides(color = "none", fill = "none") + theme( strip.text.x = element_blank(), strip.background.x = element_blank() ) return( plot_grid( power, storage, nrow = 2, ncol = 1, align = "v", axis = "b", rel_heights = c(1.1, 1) ) ) } power <- power + labs( color = "Device", shape = "Computation phase" ) + guides( fill = "none" ) return(power) }
8.6.13. es_comparison
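This function returns a figure comparing, as bar charts, the total energy consumption, the computation time and the real memory (RAM) usage peaks of the solvers considered relative to the linear system's unknown count. For each benchmark, the energy profile is retrieved using gcvb_retrieve_files and the total energy and duration are parsed from the eprofile.txt file.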
es_comparison <- function(dataframe) { extended <- dataframe extended$etotal <- as.numeric(0) for(i in 1:nrow(extended)) { gcvb_retrieve_files(extended[i, ]) eprofile <- fromJSON(file = "./eprofile.txt") extended[i, "etotal"] <- as.numeric( eprofile$data$data$tags$es_appli_total$`joule(J)` ) extended[i, "es_appli_total"] <- as.numeric( eprofile$data$data$tags$es_appli_total$`stop(sec)` ) - as.numeric( eprofile$data$data$tags$es_appli_total$`start(sec)` ) } extended$solver <- ifelse( extended$solver == "mumps" & extended$error < 10e-12, "mumps-blr", extended$solver ) postamble <- generate_theme( color_labels = label_solver, legend_rows = 2, legend_title = element_blank() ) energy <- ggplot( data = extended, aes( x = nbpts, y = etotal, fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "Energy consumption [J]", labels = scientific ) + postamble + guides(fill = "none") + theme(axis.title.x = element_blank()) time <- ggplot( data = extended, aes( x = nbpts, y = es_appli_total, fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "Computation time [s]" ) + postamble rss <- ggplot( data = extended, aes( x = nbpts, y = rm_peak / 1024., fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "RAM usage peak [GiB]" ) + postamble + guides(fill = "none") + theme(axis.title.x = element_blank()) return( plot_grid( energy, time, rss, nrow = 1, ncol = 3, align = "h", axis = "b", rel_widths = c(1, 1.15, 1) ) ) }
8.6.14. scalability
This function returns a plot of factorization and solve computation times relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- tps_facto: factorization time in seconds,
- tps_solve: solve time in seconds,
- mapping: the mapping configuration of MPI processes.
Then, the data frame is converted to long format (see Section 8.2) so we are able to combine the times of both phases using a facet grid. Also, we force the order of facet labels for the solver column by refactoring the latter.
scalability <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "time", c("tps_facto", "tps_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = time, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between factorization and solve time values, we use a log10 scale on the Y-axis to be able to combine both metrics on the same axis without losing readability.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) + scale_y_continuous( "Computation time", labels = label_time_short, trans = "log10" ) +
Finally, using a facet grid, we distinguish between factorization and solve times on rows and between each solver and problem size considered on columns.

For the sake of readability, we specify that the scales on the Y-axis can differ. This is useful when the solvers considered have considerably different time values.

Also, before returning the plot, we apply the common theme parameters. For scalability and parallel efficiency plots, we place each legend item on its own row as the labels are too long.
facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ), scales = "free" ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
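A hypothetical usage, rendering the figure and exporting it with ggsave:

plot <- scalability(results)
ggsave("scalability.pdf", plot, width = 12, height = 6)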
8.6.15. scalability_ram
This function is a variation of scalability (see Section 8.6.14) and returns a plot of real memory (RAM) usage peaks relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- rm_peak: real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- mapping: the mapping configuration of MPI processes.
scalability_ram <- function(dataframe) { plot <- ggplot( dataframe, aes( x = p_units, y = rm_peak / 1024., group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
Note that we round the values on the Y-axis to 0 decimal places and define custom axis limits and breaks adapted for the current type of plot.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe$p_units) ) + scale_y_continuous( "Real memory (RAM) usage peaks [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +
In this case, we use the facet grid exclusively as a means to better annotate the resulting plot.
    facet_grid(
      . ~ solver + nbpts,
      labeller = labeller(
        solver = label_solver,
        nbpts = label_nbpts
      )
    ) +
    labs(color = "Parallel\nconfiguration") +
    generate_theme(
      color_breaks = c("node", "socket", "numa", "core"),
      color_labels = label_mapping,
      legend_rows = 4
    )

  return(plot)
}
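For illustration only, the anonymous label function passed to scale_y_continuous above formats each axis break with sprintf; a standalone sketch under a hypothetical name:

# Format axis break values without decimal places.
label_whole <- function(label) sprintf("%.0f", label)

label_whole(c(0, 30.4, 59.6, 90, 120))
# yields "0" "30" "60" "90" "120"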
8.6.16. scalability_distributed
This function returns a plot of factorization and solve computation times relative to the number of MPI processes.
The column names featured in this plotting function are:

- processes giving the count of MPI processes,
- tps_facto representing the factorization time in seconds,
- tps_solve representing the solve time in seconds,
- coupled_method, coupled_nbrhs, cols_schur and n_blocks providing the coupling variant and the associated block size parameters.
Here, instead of converting the data frame to long format, we plot the sum of the factorization and solve times. We force the order of the facet labels for the solver column by converting the latter to a factor. Moreover, the curves are grouped by block size: for the multi-solve coupling variant, the legend entries are pairs \((n_c, n_S)\) built from the coupled_nbrhs and cols_schur columns, whereas for the other variants they show the number of blocks \(n_b\) from the n_blocks column. The toy example below illustrates the construction of the pair labels.
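For illustration only, a minimal sketch of the pair label construction on made-up block size values:

# Hypothetical numbers of right-hand sides and of Schur complement columns.
coupled_nbrhs <- c(256, 512)
cols_schur <- c(1024, 2048)

# paste0() vectorizes over both inputs and yields one label per row.
block_size <- paste0("(", coupled_nbrhs, ", ", cols_schur, ")")
print(block_size)
# yields "(256, 1024)" "(512, 2048)"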
scalability_distributed <- function(dataframe) {
  dataframe$solver <- factor(
    dataframe$solver,
    levels = unique(na.omit(dataframe$solver))
  )

  color_lab <- NA
  if(unique(na.omit(dataframe$coupled_method)) == "multi-solve") {
    dataframe$block_size <- ifelse(
      dataframe$solver != "mumps/spido",
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")"),
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$coupled_nbrhs, ")")
    )
    color_lab <- expression(
      bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))
    )
  } else {
    dataframe$block_size <- factor(
      dataframe$n_blocks,
      levels = sort(unique(na.omit(dataframe$n_blocks)))
    )
    color_lab <- expression(bold(n[italic(b)]))
  }

  plot <- ggplot(
    dataframe,
    aes(
      x = processes,
      y = tps_facto + tps_solve,
      group = block_size,
      color = block_size
    )
  ) +
    geom_line() +
    geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between the computation time values, we use a log10 scale on the Y-axis so as not to lose readability.
    scale_x_continuous(
      "# MPI processes",
      breaks = dataframe$processes
    ) +
    scale_y_continuous(
      "Computation time",
      labels = label_time_short,
      trans = "log10"
    ) +
Finally, using a facet grid, we separate the solvers considered into distinct rows of facets and the problem sizes into distinct columns. For the sake of readability, we allow the Y-axis scales to differ between facets, which is useful when the solvers considered have considerably different time values. Also, before returning the plot, we apply the common theme parameters with the block sizes as the legend breaks.
    facet_grid(
      solver ~ nbpts,
      labeller = labeller(
        solver = label_solver,
        nbpts = label_nbpts
      ),
      scales = "free"
    ) +
    labs(color = color_lab) +
    generate_theme(
      color_breaks = as.character(sort(unique(na.omit(dataframe$block_size))))
    )

  return(plot)
}
8.6.17. efficiency
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage, relative to the available processor core count.
The column names featured in this plotting function are:

- p_units giving the count of cores,
- efficiency_facto representing the factorization phase efficiency,
- efficiency_solve representing the solve phase efficiency,
- mapping providing the mapping configuration of MPI processes.
The function itself follows the exact same logic as the scalability function (see Section 8.6.14).
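The efficiency columns themselves are computed prior to plotting. For illustration only, parallel efficiency is classically defined as \(E(c) = T(1) / (c \times T(c))\), where \(T(c)\) is the computation time on \(c\) cores; a minimal sketch on made-up timings:

# Classical parallel efficiency: E(c) = T(1) / (c * T(c)).
parallel_efficiency <- function(t_seq, t_par, cores) {
  t_seq / (cores * t_par)
}

# Made-up factorization times on 1 core and on 24 cores.
parallel_efficiency(t_seq = 480, t_par = 25, cores = 24)
# yields 0.8, i.e. 80 % of parallel efficiency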
efficiency <- function(dataframe) {
  dataframe_long <- gather(
    subset(dataframe, !is.na(dataframe$nbpts)),
    key = "computation_phase",
    value = "efficiency",
    c("efficiency_facto", "efficiency_solve")
  )
  dataframe_long$solver <- factor(
    dataframe_long$solver,
    levels = unique(na.omit(dataframe_long$solver))
  )

  plot <- ggplot(
    dataframe_long,
    aes(x = p_units, y = efficiency, group = mapping, color = mapping)
  ) +
    geom_line() +
    geom_point(size = 2.5, shape = 16) +
    geom_hline(yintercept = 1) +
    scale_x_continuous(
      "# Cores (\U1D450)",
      breaks = (dataframe_long$p_units)
    ) +
There is only one additional step, i.e. converting the efficiency values \(x\), such that \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 8.5.1). Also, we fix our own Y-axis tick values (breaks). A sketch of such a label function follows the code.
    scale_y_continuous(
      "Parallel efficiency [%]",
      breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
      labels = label_percentage
    ) +
    facet_grid(
      computation_phase ~ solver + nbpts,
      labeller = labeller(
        solver = label_solver,
        computation_phase = phase.labs,
        nbpts = label_nbpts
      )
    ) +
    labs(color = "Parallel\nconfiguration") +
    generate_theme(
      color_breaks = c("node", "socket", "numa", "core"),
      color_labels = label_mapping,
      legend_rows = 4
    )

  return(plot)
}
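label_percentage is defined in Section 8.5.1; for illustration only, a label function of this kind could look as follows (a sketch under a hypothetical name, not the actual definition):

# Turn fractions from [0, 1] into percentage labels for axis breaks.
label_percentage_sketch <- function(value) {
  paste0(round(100 * value), " %")
}

label_percentage_sketch(c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))
# yields "0 %" "20 %" "40 %" "60 %" "80 %" "100 %"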
8.6.18. efficiency_distributed
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage, relative to the number of MPI processes.
The column names featured in this plotting function are:

- processes giving the count of MPI processes,
- efficiency_total representing the combined efficiency of the factorization and solve phases,
- coupled_method, coupled_nbrhs, cols_schur and n_blocks providing the coupling variant and the associated block size parameters.
The function itself follows the exact same logic as the scalability_distributed function (see Section 8.6.16).
efficiency_distributed <- function(dataframe) {
  dataframe$solver <- factor(
    dataframe$solver,
    levels = unique(na.omit(dataframe$solver))
  )

  color_lab <- NA
  if(unique(na.omit(dataframe$coupled_method)) == "multi-solve") {
    dataframe$block_size <- ifelse(
      dataframe$solver != "mumps/spido",
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")"),
      paste0("(", dataframe$coupled_nbrhs, ", ", dataframe$coupled_nbrhs, ")")
    )
    # Legend title for the multi-solve variant: the block size pairs.
    color_lab <- expression(
      bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))
    )
  } else {
    dataframe$block_size <- factor(
      dataframe$n_blocks,
      levels = sort(unique(na.omit(dataframe$n_blocks)))
    )
    # Legend title for the other variants: the number of blocks.
    color_lab <- expression(bold(n[italic(b)]))
  }

  plot <- ggplot(
    dataframe,
    aes(
      x = processes,
      y = efficiency_total,
      group = block_size,
      color = block_size
    )
  ) +
    geom_line() +
    geom_point(size = 2.5, shape = 16) +
    geom_hline(yintercept = 1) +
    scale_x_continuous(
      "# MPI processes",
      breaks = dataframe$processes
    )
There is only one additional step, i.e. converting the efficiency values \(x\), such that \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 8.5.1). Also, we fix our own Y-axis tick values (breaks), this time computed from the maximum efficiency value present in the data frame.
  y_ticks <- seq(
    from = 0.0,
    to = max(na.omit(dataframe$efficiency_total)),
    by = 0.2
  )

  plot <- plot +
    scale_y_continuous(
      "Parallel efficiency [%]",
      breaks = y_ticks,
      labels = label_percentage
    ) +
Finally, using a facet grid, we separate the solvers considered into distinct rows of facets and the problem sizes into distinct columns. For the sake of readability, we allow the Y-axis scales to differ between facets. Also, before returning the plot, we apply the common theme parameters with the block sizes as the legend breaks.
    facet_grid(
      solver ~ nbpts,
      labeller = labeller(
        solver = label_solver,
        nbpts = label_nbpts
      ),
      scales = "free"
    ) +
    labs(color = color_lab) +
    generate_theme(
      color_breaks = as.character(sort(unique(na.omit(dataframe$block_size))))
    )

  return(plot)
}
9. Post-processing StarPU execution traces
To visualize StarPU execution traces in the FXT format, we also make use of R, the ggplot2 package and the dedicated StarVZ library starvz16,starvz18.
9.1. StarVZ configuration
To configure the output plot produced by StarVZ, we use a YAML configuration file. We let the cosmetic options take their default values, except for the base font size (see the base_size value), and we override those specifying what kind of information shall be included in the resulting plots. In our case, we want to plot only the StarPU worker statistics (items st and starpu).
default:
  base_size: 14
  starpu:
    active: TRUE
    legend: TRUE
  ready:
    active: FALSE
  submitted:
    active: FALSE
  st:
    active: TRUE
Showing the critical path time bound is not necessary.
    cpb: FALSE
On the other hand, we want to display all the labels, the legend and the application time span.
    labels: "ALL"
    legend: TRUE
    makespan: TRUE
To reduce the size of the resulting plot, we enable the aggregation of visualized tasks within a given time step.
    aggregation:
      active: TRUE
      method: "static"
      step: 500
9.2. Plotting FXT traces
First, we have to determine the path to the latest gcvb benchmark session directory in order to be able to read the FXT trace files.
output <- try(
  system("ls -t ~/benchmarks/results | head -n 1", intern = TRUE)
)
latest_session_path <- paste(
  "~/benchmarks/results", output,
  sep = "/"
)
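Note that the same result could be obtained in pure R, without shelling out. A minimal sketch, assuming the session directories reside under ~/benchmarks/results:

# List the session directories and pick the most recently modified one.
sessions <- list.files("~/benchmarks/results", full.names = TRUE)
latest_session_path <- sessions[which.max(file.mtime(sessions))]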
Then, we import the required R libraries and set the starvz_config variable holding the path to the YAML configuration file for the StarVZ package (see Section 9.1).
library(ggplot2)
library(starvz)
library(grid)

starvz_config <- "starvz.yaml"
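For completeness, here is a hedged sketch of how a trace could then be read and plotted using the starvz_read and starvz_plot functions of the StarVZ package; consult the package documentation starvz-readme for the exact workflow:

# Read the trace data from the latest session directory, applying our
# YAML configuration, then render the worker statistics plot.
trace_data <- starvz_read(
  directory = latest_session_path,
  config_file = starvz_config
)
starvz_plot(trace_data)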
Bibliography
- [guix] GNU Guix software distribution and transactional package manager. https://guix.gnu.org
- [Knuth84] Knuth, Literate Programming, Comput. J., 27(2), 97–111 (1984). link. doi.
- [Dominik18] Dominik, The Org Mode 9.1 Reference Manual, 12th Media Services (2018).
- [emacs] GNU Emacs: An extensible, customizable, free/libre text editor — and more. https://www.gnu.org/software/emacs/
- [OrgTangle] Org mode documentation (Extracting source code). https://orgmode.org/manual/Extracting-Source-Code.html
- [OrgExport] Org mode documentation (Exporting). https://orgmode.org/manual/Exporting.html
- [gcvb] gcvb: (g)enerate (c)ompute (v)alidate (b)enchmark, a Python 3 module aiming at facilitating non-regression, validation and benchmarking of simulation codes. https://github.com/felsocim/gcvb
- [proc5] proc(5) - Linux manual page. https://man7.org/linux/man-pages/man5/proc.5.html
- [ESwebsite] Energy Scope website. https://sed-bso.gitlabpages.inria.fr/datacenter/energy_scope.html
- [ESjcad2021] Mathieu, Energy Scope: a tool for measuring the energy profile of HPC and AI applications (2021). https://jcad2021.sciencesconf.org/data/Herve_Mathieu_energy_scope.pdf
- [starvz-readme] Schnorr, Garcia Pinto, Nesi & Miletto, StarVZ: R-based visualization techniques for the StarPU runtime (consulted on April 8, 2022). https://github.com/schnorr/starvz/blob/master/README.org
- [ggplot2] ggplot2, a system for declaratively creating graphics. https://ggplot2.tidyverse.org/
- [sqliteLike] The LIKE, GLOB, REGEXP, and MATCH operators, SQL Language Expressions, SQLite. https://www.sqlite.org/lang_expr.html#like
- [likwid] Gruber, Eitzinger, Hager & Wellein, LIKWID (2021). This research has been partially funded by grants BMBF 01IH13009 and BMBF 01IH16012C. https://doi.org/10.5281/zenodo.5752537
- [treibig2010likwid] Treibig, Hager & Wellein, LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments, in: 2010 39th International Conference on Parallel Processing Workshops, 207–216 (2010).
- [RwideToLong] Cookbook for R: Converting data between wide and long format. http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
- [starvz16] Garcia Pinto, Stanisic, Legrand, Schnorr, Thibault & Danjean, Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach, in: 2016 Third Workshop on Visual Performance Analysis (VPA), 17–24 (2016).
- [starvz18] Garcia Pinto, Mello Schnorr, Stanisic, Legrand, Thibault & Danjean, A visual performance analysis framework for task-based parallel applications running on hybrid clusters, Concurrency and Computation: Practice and Experience, 30(18), e4472 (2018). link. doi.
Footnotes:
1. The source of this document, as well as the source code files it describes, is available for download at: https://thesis-mfelsoci.gitlabpages.inria.fr/sources/files.tar.gz