
Initial testing framework #2095

Merged
merged 12 commits into wrf-model:release-v4.6.1 on Sep 19, 2024

Conversation

@islas (Collaborator) commented Aug 9, 2024

TYPE: enhancement

KEYWORDS: testing, regression, test framework

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The current regression suite code is complex, requires maintaining multiple alternate repositories, and takes considerable effort to add a new test, which limits community contribution at best. The same complexity also discourages independent local testing of changes, leading to a development cycle of one-off commits pushed solely to re-invoke testing and check whether the meaningful commits fixed the issues.

Solution:
This newly proposed regression suite addresses these shortcomings in several discrete ways:

  1. Modularize the testing framework into a generalized, independent repo usable by any repo seeking to set up tests that can run locally, on HPC systems, and within any CI/CD framework
  2. Write WRF-specific test scripts inside the WRF repo in a manner that does not rely on specific layouts, hardware, etc., so long as WRF can compile and run on the intended system (i.e., the tests are able to be run locally)
  3. Write CI/CD tests in a simple and largely CI/CD framework-agnostic manner, with their definitions also residing within the WRF repo
  4. Utilize HPC resources in a safe manner to increase the breadth of testing, allowing many more compilers to be tested on hardware similar to the general WRF use case

As a first pass at demonstrating this solution, this PR implements a simple set of compilation tests using GNU x86 configurations, covering the serial, sm, dm, and sm+dm selections. The CI/CD portion is done via GitHub Actions workflows on a specific trigger event. The values and trigger methods are configurable, but this initial implementation uses the labeled trigger, which initiates tests when the compile-tests or all-tests label is added to a pull request.
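For illustration, assuming the GitHub CLI (gh) is available, a maintainer could apply the trigger labels from the command line; only the label names below are taken from this PR, and the PR number is a placeholder:

```sh
# Trigger only the compilation tests on a pull request by applying the
# label the workflow listens for
gh pr edit <PR-NUMBER> --add-label compile-tests

# Or request every test defined in the workflow
gh pr edit <PR-NUMBER> --add-label all-tests
```

The same effect can of course be achieved by adding the label through the GitHub web interface.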

TESTS CONDUCTED:

  1. Testing of this GitHub workflow was done in a separate fork, also testing on Derecho. Both positive and negative tests were used to demonstrate the usefulness of the respective output.

RELEASE NOTE:
Introduce a modularized testing framework, living within the WRF repository, that allows testing locally and natively on HPC systems

islas added 7 commits August 8, 2024 14:41
In order to run test scripts outside of a testing framework, the handling of
environment setup should not depend solely on running within a dedicated
test framework. This has the added benefit of compartmentalizing the duties of
environment and dependency resolution from running the tests.

These environment scripts allow for the selection of a particular environment,
with the default being the FQDN of the current host. From there, arguments are
routed using standard POSIX sh to a respective script. In the case of Derecho
(applicable to any system using lmod), all subsequent arguments are treated as
modules to load into the current session.

The hostenv.sh script relies on one "argument" $AS_HOST being passed in via
variable setting to facilitate selection.
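
As a minimal sketch of the selection described above (this is not the actual contents of .ci/hostenv.sh; the host pattern and fallback message are illustrative):

```sh
#!/bin/sh
# Sketch only: select an environment script based on AS_HOST, defaulting
# to the fully qualified domain name of the current host.
AS_HOST=${AS_HOST:-$( hostname -f )}

case "${AS_HOST}" in
  derecho* )
    # Source the Derecho environment script; per the description above, the
    # remaining arguments are treated as lmod modules to load
    . .ci/env/derecho.sh
    ;;
  * )
    # No dedicated configuration: use the current environment verbatim
    echo "No dedicated environment for ${AS_HOST}, using current environment"
    ;;
esac
```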

The helpers.sh script provides convenience features for substring checking in sh,
delayed environment variable expansion via eval, and quick banner creation.
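
The following is a rough sketch of the kinds of helpers described here; the function names are illustrative and not necessarily those used in .ci/helpers.sh:

```sh
#!/bin/sh
# Sketch only: convenience helpers of the sort described above.

# Substring check usable from plain sh (no [[ ... ]] required)
contains()
{
  case "$1" in
    *"$2"* ) return 0 ;;
    *      ) return 1 ;;
  esac
}

# Delayed environment variable expansion: variables embedded in the string
# are expanded only when this function is called, via eval
delayedExpand()
{
  eval "printf '%s\n' \"$1\""
}

# Quick banner creation for readable sections in test logs
banner()
{
  echo "======================================================================"
  echo "  $*"
  echo "======================================================================"
}
```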

The derecho.sh script is included as the first supported environment.
This script will facilitate the first tests. There are only three requirements
placed on any given test script by the planned testing framework. If a different
testing framework is used in the future, these requirements of the test scripts
can and should be re-evaluated.

The test script should:
1. Take the intended host / configuration environment as the first argument
2. Take the working directory to immediately change to as the second argument
3. Output a key phrase at the end of the test to denote success; anything else
   (a non-zero exit code, or a zero exit code without the phrase) is a failure

This particular compilation test script satisfies the above while also providing
enough flexibility to select the compile target, stanza configuration, and number
of parallel jobs, and to pass other command-line options into the make build.

Additionally, for convenience, environment variables can be passed in as command-line
options to the test script to modularize certain inputs.
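
A hypothetical script meeting the three requirements above might look like the following; the build helper and the success phrase are illustrative only and are not the compilation test script added in this PR:

```sh
#!/bin/sh
# Sketch only: a test script shaped to the framework's three requirements.

hostEnv=$1    # 1. intended host / configuration environment
workDir=$2    # 2. working directory to change into immediately
shift 2
cd "${workDir}" || exit 1

# Select the environment (e.g. load lmod modules on Derecho) using the
# .ci/ structure described in this PR
AS_HOST=${hostEnv}
export AS_HOST
. .ci/hostenv.sh

# Do the actual work of the test; any non-zero exit code is a failure
./configure_and_compile.sh || exit 1   # hypothetical build helper

# 3. Print a key phrase to denote success; exiting zero without this
#    phrase would still be treated as a failure
echo "TEST PASS"
```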
Following the documentation of the hpc-workflows testing framework and the
testing structure found in .ci/, a JSON file for a GNU compilation test was added.
This test will compile the em_real core using the GNU Linux x86 stanza configuration.

All other options are left as default. If this test is run using the derecho
configuration, an attempt will be made to load the appropriate modules. For non-derecho
environments, per the testing structure under .ci/, if no configuration exists in
.ci/hostenv.sh then the current environment will be used verbatim.
This reusable workflow balances quick setup with GitHub Actions-specific features.
It assumes that the tests can be controlled via a label being set in a PR.

To coordinate PR vs primary branch testing, a suffix is generated using either
the PR number or the branch name. This suffix is then used to relocate log files
to an archival location in an organized fashion. GitHub artifacts are still used
for failed-test capture, but logs will also be moved to the archive location for
quicker access for anyone with access to the system where these tests execute.
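
A minimal sketch of the suffix and relocation logic described above; the variable names and archive layout are hypothetical, not the workflow's actual implementation:

```sh
#!/bin/sh
# Sketch only: build a suffix from the PR number or branch name and move
# logs into an organized archive location.

if [ -n "${PR_NUMBER}" ]; then
  suffix="PR_${PR_NUMBER}"        # tests triggered from a pull request
else
  suffix="${BRANCH_NAME}"         # tests triggered by a push to master/develop
fi

archiveDir="${ARCHIVE_ROOT}/${suffix}"
mkdir -p "${archiveDir}"

# Relocate logs for quick access on the system where the tests ran; logs of
# failed tests are additionally packaged as a GitHub artifact
mv ./*.log "${archiveDir}/"
```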

To allow for parallelized testing available from hpc-workflows, the workflow can
make duplicate directories of the repository that can each run their own test
instance without clobbering files.

Once tests are run, results are gathered, relocated to the archival location,
reported and printed to the screen, summarized into the Actions summary page,
and then packaged into an artifact if a failure occurred.

Finally, the test label is removed if the named tests and label match.
This pipeline is triggered if any pushes occur on master or develop OR if a PR
is labeled with an appropriate tag as specified by the tests within this
workflow. Additionally, a specific label to trigger all tests can be used that
will be removed from the PR when all tests finish, regardless of exit status.

The pipeline makes extensive use of the reusable test_workflow.yml to
instantiate tests on runners.

This pipeline currently only includes the definition for one test, to be run on
a GitHub runner with tags that satisfy "derecho". Likewise, other hard-coded
values appearing here assume a particular runner setup and environment.
@islas islas requested a review from a team as a code owner August 9, 2024 21:20
@islas (Collaborator, Author) commented Aug 9, 2024

I'm using the approach we're using in MPAS to set up testing, with a very minimal setup (simple compilation tests) at first to get something started.

The idea would be to then gradually translate the current tests into a format usable by this framework.

@weiwangncar (Collaborator) commented:

The regression test results:

Test Type              | Expected | Received | Failed
-----------------------|----------|----------|-------
Number of Tests        |    23    |    24    |
Number of Builds       |    60    |    57    |
Number of Simulations  |   158    |   150    |   0
Number of Comparisons  |    95    |    86    |   0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

.ci/env/derecho.sh: review comment resolved (outdated)
@mgduda mgduda self-requested a review September 16, 2024 23:21
@mgduda mgduda self-requested a review September 17, 2024 00:32
@islas islas merged commit 958ce12 into wrf-model:release-v4.6.1 Sep 19, 2024
3 of 12 checks passed