Initial testing framework (#2095)

TYPE: enhancement

KEYWORDS: testing, regression, test framework

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The current regression suite code is complex, requires maintaining multiple alternate repositories, and makes adding a new test an involved effort, which leaves community contribution limited at best. The complexity of the system likewise discourages independent local testing of changes, leading to a development cycle of one-off commits pushed only to re-trigger testing and see whether the meaningful commits fix the issues.

Solution:
This proposed regression suite addresses these shortcomings in several discrete ways:
1. Modularize the testing framework into a generalized, independent repo usable by any repo that wants tests able to run locally, on HPC systems, and within any CI/CD framework
2. Write WRF-specific test scripts _inside_ the WRF repo, in a manner that does not rely on specific layouts, hardware, etc., so long as WRF can compile and run on the intended system (i.e., the tests can be run locally)
3. Write CI/CD tests in a simple, largely CI/CD-framework-agnostic manner, with their definitions also residing _within the WRF repo_
4. Use HPC resources safely to broaden testing, covering many more compilers on hardware similar to WRF's general use case

As a first pass at demonstrating this solution, this PR implements a simple set of compilation tests using GNU x86 configurations, testing the serial, sm, dm, and sm+dm selections. The CI/CD portion runs as a GitHub Actions workflow on a specific trigger event. The values and trigger methods are configurable, but this initial implementation uses the `labeled` trigger, which initiates tests when `compile-tests` or `all-tests` is added as a label to a pull request.

TESTS CONDUCTED:
1. This GitHub workflow was tested in a separate fork, also running on Derecho. Both positive and negative tests were used to demonstrate that the output is useful in each case.

RELEASE NOTE: Introduce a modularized testing framework that lives within the WRF repository and allows testing both locally and natively on HPC systems
wrf-model · Sep 19, 2024 · 958ce12
1 parent 1d86bcb
commit 958ce12
Showing 10 changed files with 513 additions and 0 deletions.
diff --git a/.ci/env/derecho.sh b/.ci/env/derecho.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+echo "Setting up derecho environment"
+workingDirectory=$PWD
+. /etc/profile.d/z00_modules.sh
+echo "Loading modules : $*"
+cmd="module purge"
+echo $cmd && eval "${cmd}"
+
+# We should be handed in the modules to load
+while [ $# -gt 0 ]; do 
+  cmd="module load $1"
+  echo $cmd && eval "${cmd}"
+  shift
+done
+
+# Go back to the working directory in case the HPC configuration changed it for some unknown reason
+if [ "$workingDirectory" != "$PWD" ]; then
+  echo "derecho module loading changed working directory"
+  echo "  Moving back to $workingDirectory"
+  cd "$workingDirectory"
+fi
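derecho.sh expects the modules to load as positional arguments and loads them one at a time after a `module purge`, so the caller fully controls the load order. The sketch below lifts that loop into a hypothetical `load_modules` function and replaces Lmod's `module` command with a no-op stub (an assumption made purely so it runs off Derecho), which makes the resulting command sequence easy to inspect:

```sh
#!/bin/sh
# Hypothetical no-op stand-in for Lmod's `module` command, so the loop can be
# exercised on a machine without Derecho's module system.
module()
{
  :
}

# Same pattern as derecho.sh: purge first, then load each argument in order.
load_modules()
{
  cmd="module purge"
  echo "$cmd" && eval "$cmd"
  while [ $# -gt 0 ]; do
    cmd="module load $1"
    echo "$cmd" && eval "$cmd"
    shift
  done
}

load_modules gcc cray-mpich netcdf
```

Passing `netcdf` last mirrors the `very_last_modules` key in the JSON configuration, although how hpc-workflows assembles the final module list is not part of this diff.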
diff --git a/.ci/env/helpers.sh b/.ci/env/helpers.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+# Useful string manipulation functions, leaving in for posterity
+# https://stackoverflow.com/a/8811800
+# contains(string, substring)
+#
+# Returns 0 if the specified string contains the specified substring,
+# otherwise returns 1.
+contains()
+{
+  string="$1"
+  substring="$2"
+
+  if [ "${string#*"$substring"}" != "$string" ]; then
+    echo 0    # $substring is in $string
+  else
+    echo 1    # $substring is not in $string
+  fi
+}
+
+setenvStr()
+{
+  # Changing IFS produces the most consistent results
+  tmpIFS=$IFS
+  IFS=","
+  string="$1"
+  for s in $string; do
+    if [ -n "$s" ]; then
+      eval "echo export \"$s\""
+      eval "export \"$s\""
+    fi
+  done
+  IFS=$tmpIFS
+}
+
+banner()
+{
+  lengthBanner=$1
+  shift
+  # https://www.shellscript.sh/examples/banner/
+  printf "#%${lengthBanner}s#\n" | tr " " "="
+  printf "# %-$(( ${lengthBanner} - 2 ))s #\n" "`date`"
+  printf "# %-$(( ${lengthBanner} - 2 ))s #\n" " "
+  printf "# %-$(( ${lengthBanner} - 2 ))s #\n" "$*"
+  printf "#%${lengthBanner}s#\n" | tr " " "="
+}
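`setenvStr` splits its argument on commas by temporarily setting `IFS` and exports each non-empty field. The empty-field check matters because build.sh accumulates each `-e` value as `envVars="$envVars,$OPTARG"`, so the string always starts with a comma. A condensed copy of the helper (echo lines dropped, `"$s"` quoted) shows the round trip; the three sample options are illustrative only:

```sh
#!/bin/sh
# Condensed copy of setenvStr from .ci/env/helpers.sh: split on commas and
# export every non-empty field.
setenvStr()
{
  tmpIFS=$IFS
  IFS=","
  for s in $1; do
    if [ -n "$s" ]; then
      eval "export \"$s\""
    fi
  done
  IFS=$tmpIFS
}

# Accumulate options the way build.sh does for repeated -e flags,
# which produces a leading comma (an empty first field).
envVars=""
for opt in "NUM_PROCS=16" "SERIAL=32" "FOO"; do
  envVars="$envVars,$opt"
done

setenvStr "$envVars"
echo "NUM_PROCS=$NUM_PROCS SERIAL=$SERIAL"
```

A bare name such as `FOO` is simply exported as-is, matching the `var=1,foo,bar=0` form documented in build.sh's help text.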
diff --git a/.ci/env/hostenv.sh b/.ci/env/hostenv.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+# Allow selection of hostname, and if none is provided use the current machine
+# While this may seem unintuitive at first, it provides the flexibility of using
+# "named" configurations without being explicitly tied to fqdn
+hostname=$AS_HOST
+if [ -z "$hostname" ]; then
+  hostname=$( python3 -c "import socket; print( socket.getfqdn() )" )
+fi
+
+if [ $( contains "${hostname}" hsn.de.hpc ) -eq 0 ]; then
+  # Derecho HPC SuSE PBS
+  . .ci/env/derecho.sh
+else
+  echo "No known environment for '${hostname}', using current"
+fi
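hostenv.sh prefers `AS_HOST` and falls back to the machine's FQDN, then selects an environment by substring match. The hypothetical `select_env` function below reproduces that dispatch but prints the script it would source instead of sourcing it, and swaps the `contains()` helper for an equivalent POSIX `case` glob; both are choices of this sketch, not of the original:

```sh
#!/bin/sh
# Hypothetical variant of hostenv.sh's dispatch: report the environment
# script that would be sourced, using a `case` glob as the substring test.
select_env()
{
  host="${AS_HOST:-}"
  if [ -z "$host" ]; then
    # Same fallback as hostenv.sh: the fully qualified domain name
    host=$( python3 -c "import socket; print( socket.getfqdn() )" )
  fi

  case "$host" in
    *hsn.de.hpc*) echo ".ci/env/derecho.sh" ;;  # Derecho HPC SuSE PBS
    *)            echo "" ;;                    # no known environment, use current
  esac
}
```

Because only a substring has to match, any `AS_HOST` value containing `hsn.de.hpc` selects the Derecho setup, which is what lets a "named" configuration be chosen without the exact FQDN, as the comment in hostenv.sh describes.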
diff --git a/.ci/hpc-workflows b/.ci/hpc-workflows
diff --git a/.ci/tests/build.sh b/.ci/tests/build.sh
@@ -0,0 +1,108 @@
+#!/bin/sh
+help()
+{
+  echo "./build.sh as_host workingdir [options] [-- <hostenv.sh options>]"
+  echo "  as_host                   First argument must be the host configuration to use for environment loading"
+  echo "  workingdir                Second argument must be the working dir to immediately cd to"
+  echo "  -c                        Configuration build type, piped directly into configure"
+  echo "  -n                        Configuration nesting type, piped directly into configure"
+  echo "  -o                        Configuration optstring passed into configure"
+  echo "  -b                        Build command passed into compile"
+  echo "  -e                        Environment variables in a comma-delimited list, e.g. var=1,foo,bar=0"
+  echo "  -- <hostenv.sh options>   Directly pass options to hostenv.sh, equivalent to hostenv.sh <options>"
+  echo "  -h                        Print this message"
+  echo ""
+  echo "If you wish to use an env var in your arg, such as '-c \$SERIAL -e SERIAL=32',"
+  echo "you will need to write '-c \\\$SERIAL -e SERIAL=32' to delay shell expansion"
+}
+
+echo "Input arguments:"
+echo "$*"
+
+AS_HOST=$1
+shift
+if [ "$AS_HOST" = "-h" ]; then
+  help
+  exit 0
+fi
+
+workingDirectory=$1
+shift
+
+cd "$workingDirectory"
+
+# Get some helper functions
+. .ci/env/helpers.sh
+
+while getopts c:n:o:b:e:h opt; do
+  case $opt in
+    c)
+      configuration="$OPTARG"
+    ;;
+    n)
+      nesting="$OPTARG"
+    ;;
+    o)
+      configOpt="$OPTARG"
+    ;;
+    b)
+      buildCommand="$OPTARG"
+    ;;
+    e)
+      envVars="$envVars,$OPTARG"
+    ;;
+    h)  help; exit 0 ;;
+    *)  help; exit 1 ;;
+    :)  help; exit 1 ;;
+    \?) help; exit 1 ;;
+  esac
+done
+
+shift "$((OPTIND - 1))"
+
+# Everything else goes to our env setup
+. .ci/env/hostenv.sh $*
+
+# Now evaluate env vars in case it pulls from hostenv.sh
+if [ ! -z "$envVars" ]; then
+  setenvStr "$envVars"
+fi
+
+# Re-evaluate input values for delayed expansion
+eval "configuration=\"$configuration\""
+eval "nesting=\"$nesting\""
+eval "configOpt=\"$configOpt\""
+eval "buildCommand=\"$buildCommand\""
+
+./clean -a
+
+echo "Compiling with option $configuration nesting=$nesting and additional flags '$configOpt'"
+./configure $configOpt << EOF
+$configuration
+$nesting
+EOF
+
+if [ ! -f configure.wrf ]; then
+  echo  "Failed to configure"
+  exit 1
+fi
+
+echo "./compile $buildCommand"
+./compile $buildCommand
+
+result=$?
+
+if [ $result -ne 0 ]; then
+  echo "Failed to compile"
+  exit 1
+fi
+
+# And a *very* special check, because WRF compiles the WRF way and force-ignores all make errors,
+# putting the onus on US to check for things
+if [ ! -x ./main/wrf.exe ]; then # There are a bunch of other execs, but this is the most important, and
+                                 # doing more checks to accommodate them just reinforces this bad design
+  echo "Failed to compile"
+  exit 1
+fi
+
+echo "TEST $(basename $0) PASS"
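The delayed expansion in build.sh works because option values are stored literally by `getopts` and only re-evaluated after the `-e` variables (and anything hostenv.sh sets) have been exported. A minimal sketch of that sequence, with `export` standing in for `setenvStr ",SERIAL=32,NUM_PROCS=4"` and sample values chosen for illustration:

```sh
#!/bin/sh
# Values as getopts would hand them over when the caller escaped the dollar
# sign, e.g. -c \$SERIAL, or when they come from JSON without shell expansion.
configuration='$SERIAL'
buildCommand='em_real -j $NUM_PROCS'

# Stand-in for setenvStr exporting the -e variables
export SERIAL=32
export NUM_PROCS=4

# Re-evaluate each value exactly as build.sh does
eval "configuration=\"$configuration\""
eval "buildCommand=\"$buildCommand\""

echo "configure option: $configuration"   # configure option: 32
echo "compile args: $buildCommand"         # compile args: em_real -j 4
```

This is also why the JSON entry `"em_real -j $NUM_PROCS"` resolves to the host-specific process count. Since `eval` runs the values as shell text, they must come from a trusted source such as the repository's own test definitions.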
diff --git a/.ci/wrf_compilation_tests-make.json b/.ci/wrf_compilation_tests-make.json
@@ -0,0 +1,69 @@
+{
+  "submit_options" :
+  {
+    "timelimit" : "00:20:00",
+    "working_directory" : "..",
+    "arguments" :
+    {
+      "base_env_numprocs"      : [ "-e", "NUM_PROCS=4" ],
+
+      ".*make.*::args_nesting"       : [ "-n", "1" ],
+      ".*make.*::args_configopt"     : [ "-o", "-d" ],
+      ".*make.*::args_build_tgt"     : [ "-b", "em_real -j $NUM_PROCS" ]
+    },
+    "hsn.de.hpc"  :
+    {
+      "submission" : "PBS",
+      "queue"      : "main",
+      "hpc_arguments"  : 
+      {
+        "node_select" : { "-l " : { "select"       : 1, "ncpus" : 16 } },
+        "priority"    : { "-l " : { "job_priority" : "economy"       } }
+      },
+      "arguments"  : 
+      {
+        "base_env_numprocs"      : [ "-e", "NUM_PROCS=16" ],
+        "very_last_modules"       : [ "netcdf" ],
+        ".*gnu.*::test_modules"   : [ "gcc" ],
+        ".*intel(?!-llvm).*::test_modules" : [ "intel-classic" ],
+        ".*intel-llvm.*::test_modules"     : [ "intel-oneapi" ],
+        ".*pgi.*::test_modules"   : [ "nvhpc" ],
+        ".*dm.*::test_mpi_module" : [ "cray-mpich" ]
+      }
+    }
+  },
+  "make-gnu" :
+  {
+    "steps" :
+    {
+      "serial" :
+      {
+        "command"      : ".ci/tests/build.sh",
+        "arguments"    : [ "-c", "32" ]
+      },
+      "sm" :
+      {
+        "command"      : ".ci/tests/build.sh",
+        "arguments"    : [ "-c", "33" ],
+        "dependencies" : { "serial" : "afterany" }
+      }
+    }
+  },
+  "make-gnu-mpi" :
+  {
+    "steps" :
+    {
+      "dm" :
+      {
+        "command"      : ".ci/tests/build.sh",
+        "arguments"    : [ "-c", "34" ]
+      },
+      "dm+sm" :
+      {
+        "command"      : ".ci/tests/build.sh",
+        "arguments"    : [ "-c", "35" ],
+        "dependencies" : { "dm" : "afterany" }
+      }
+    }
+  }
+}
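Within each test, the `dependencies` entries (`sm` waits on `serial`, `dm+sm` waits on `dm`) name the PBS dependency type `afterany`. In PBS, `qsub -W depend=afterany:<jobid>` holds a job until the named job ends, whether or not it succeeded, so a failed serial build still lets the sm build run. Serializing each pair is likely necessary because every step runs build.sh, and therefore `./clean -a`, in the same working directory. The sketch assumes the hpc-workflows framework translates these entries into plain `qsub` calls; `submit_chain` and the job scripts are hypothetical:

```sh
#!/bin/sh
# Hypothetical sketch of a two-step chain such as "serial" -> "sm".
# qsub prints the id of the newly submitted job on stdout.
submit_chain()
{
  first_script="$1"
  second_script="$2"

  first_id=$( qsub "$first_script" ) || return 1

  # afterany: start once the first job has ended, regardless of exit status
  qsub -W depend=afterany:"$first_id" "$second_script"
}
```

Using `afterok` instead would cancel the dependent build whenever the first one fails, hiding whether the sm or dm+sm configuration compiles on its own.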
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,97 @@
+name: Regression Suite
+run-name : ${{ github.event_name == 'push' && 'CI' || github.event.label.name }} (${{ github.event_name }})
+
+on:
+  push:
+    branches: [ master, develop ]
+# See https://stackoverflow.com/a/78444521 and 
+# https://github.com/orgs/community/discussions/26874#discussioncomment-3253755
+# as well as official (but buried) documentation :
+# https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull-request-events-for-forked-repositories-2
+  pull_request:
+    types:    [ labeled ]
+
+# https://docs.github.com/en/actions/sharing-automations/reusing-workflows#supported-keywords-for-jobs-that-call-a-reusable-workflow
+# Also https://stackoverflow.com/a/74959635
+# TL;DR - For public repositories the safest approach will be to use the default read permissions, but at the cost
+# of not being able to modify the labels. That will need to be a separate [trusted] workflow that runs from the base repo
+# permissions :
+#   contents : read
+#   pull-requests : write
+
+# Write our tests out this way for easier legibility
+# testsSet    :
+#   - key : value
+#     key : value
+#     tests :
+#       - value
+#       - value
+#   - < next test >
+# https://stackoverflow.com/a/68940067
+jobs:
+  buildtests:
+    if : ${{ github.event.label.name == 'compile-tests' || github.event.label.name == 'all-tests' || github.event_name == 'push' }}
+    strategy:
+      max-parallel: 4
+      fail-fast: false
+      matrix:
+
+        testSet  :
+          - host : derecho
+            hpc-workflows_path : .ci/hpc-workflows
+            archive : /glade/work/aislas/github/runners/wrf/derecho/logs/
+            account : NMMM0012
+            name : "Make Compilation Tests"
+            id   : make-tests
+            fileroot : wrf_compilation_tests-make
+            args : -j='{"node_select":{"-l ":{"select":1}}}'
+            pool  : 8
+            tpool : 1
+            mkdirs : true
+            tests :
+              - make-gnu
+              - make-gnu-mpi
+              # add new compilation tests here
+
+    uses : ./.github/workflows/test_workflow.yml
+    with :
+      # This should be the only hard-coded value; we don't use ${{ github.event.label.name }}
+      # so that 'all-tests' is never passed into this workflow
+      label    : compile-tests
+
+      # Everything below this should remain the same and comes from the testSet matrix
+      hpc-workflows_path : ${{ matrix.testSet.hpc-workflows_path }}
+      archive  : ${{ matrix.testSet.archive }}
+      name     : ${{ matrix.testSet.name }}
+      id       : ${{ matrix.testSet.id }}
+      host     : ${{ matrix.testSet.host }}
+      fileroot : ${{ matrix.testSet.fileroot }}
+      account  : ${{ matrix.testSet.account }}
+      tests    : ${{ toJson( matrix.testSet.tests ) }}
+      mkdirs   : ${{ matrix.testSet.mkdirs }}
+      args     : ${{ matrix.testSet.args }}
+      pool     : ${{ matrix.testSet.pool }}
+      tpool    : ${{ matrix.testSet.tpool }}
+    # I am leaving this here for posterity if this is to be replicated in private repositories for testing
+    permissions:
+      contents: read
+      pull-requests: write
+    name : Test ${{ matrix.testSet.name }} on ${{ matrix.testSet.host }}
+
+  # In the event that 'all-tests' is used, this final job will be the one to remove
+  # the label from the PR
+  removeAllLabel :
+    if : ${{ !cancelled() && github.event.label.name == 'all-tests' }}
+    name : Remove 'all-tests' label
+    runs-on: ubuntu-latest
+    needs : [ buildtests ] # Put tests here to make this wait for the tests to complete
+    steps: 
+      - name : Remove '${{ github.event.label.name }}' label
+        env:
+          PR_NUMBER: ${{ github.event.number }}
+        run: |
+          curl \
+            -X DELETE \
+            -H "Accept: application/vnd.github.v3+json" \
+            -H 'Authorization: token ${{ github.token }}' \
+            https://api.github.com/repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/labels/${{ github.event.label.name }}
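The `removeAllLabel` job calls GitHub's REST endpoint for removing a label from an issue; pull requests share the issues API for labels, so the PR number serves as the issue number. A minimal sketch of the same request for local experimentation, assuming a token is exported as `GITHUB_TOKEN` (an assumption of this sketch) and falling back to a dry run without one:

```sh
#!/bin/sh
# Build the REST endpoint that removes one label from an issue or pull request.
# The label is assumed not to need URL-encoding (true for "all-tests").
label_url()
{
  repo="$1"     # owner/name, as in $GITHUB_REPOSITORY
  number="$2"   # pull request number
  label="$3"
  echo "https://api.github.com/repos/${repo}/issues/${number}/labels/${label}"
}

# Remove the label when a token is available; otherwise only report the request.
remove_label()
{
  url=$( label_url "$1" "$2" "$3" )
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "dry run: DELETE $url"
    return 0
  fi
  curl -fsS -X DELETE \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "$url"
}
```

Reading the token from an environment variable, rather than interpolating it into the script text as the workflow step does, keeps it out of the command that gets echoed or stored.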