Running UFS S2S model regression tests using rt.sh

Minsuk Ji (Minsuk.Ji@noaa.gov), Jun Wang, Dusan Jovic

Shell script-based Regression Test: rt.sh

rt.sh calls:

- detect_machine.sh
- compile.sh, which calls GNUmakefile
- run_test.sh, which calls rt_fv3.sh, which in turn calls rt_utils.sh

run_test.sh uses the following input files:
- rt.conf
- default_vars.sh
- <test-name>
- <run-setup-name>

detect_machine.sh: detects and assigns the machine name, sets the account (nems is the default)
compile.sh: builds a model using GNUmakefile (make app=*)
Run regression test
- run_test.sh: sets environment variables, run directory, etc., and calls rt_fv3.sh
- rt_fv3.sh: prepares a canned case in the run directory, and calls rt_utils functions
- rt_utils.sh: contains utility functions, e.g.,
  - submit_and_wait
  - check_results
  - rocoto_create_compile_task, rocoto_create_run_task, rocoto_run, rocoto_kill
  - ecflow_create_compile_task, ecflow_create_run_task, ecflow_run, ecflow_kill
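
To make the role of a result check concrete, here is a minimal sketch of a byte-wise comparison between a baseline directory and a run directory. It is not the real check_results from rt_utils.sh: the function name compare_with_baseline, its arguments, and the exact output format are hypothetical, and it assumes a plain cmp comparison is sufficient.

```shell
# Hypothetical sketch, not the real check_results.
# Usage: compare_with_baseline <baseline_dir> <run_dir> <file>...
# The body runs in a subshell so its variables do not leak to the caller.
compare_with_baseline() (
  baseline_dir=$1
  run_dir=$2
  shift 2
  status=0
  for f in "$@"; do
    if [ ! -f "$run_dir/$f" ]; then
      echo "$f: MISSING file"        # the run did not produce the file
      status=1
    elif [ ! -f "$baseline_dir/$f" ]; then
      echo "$f: MISSING baseline"    # nothing to compare against
      status=1
    elif cmp -s "$baseline_dir/$f" "$run_dir/$f"; then
      echo "$f: OK"                  # byte-identical
    else
      echo "$f: NOT OK"              # contents differ
      status=1
    fi
  done
  exit "$status"                     # non-zero if any file failed
)
```

The function returns non-zero as soon as any file is missing or differs, so a caller could use it directly in an if statement.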

rt.conf: specify compile and run cases corresponding to ../compsets/all.input
COMPILE: specify <appBuilder-name> located in ../
RUN: specify <test-name> located in tests/
Each row is processed sequentially
Workflow managers: each RUN depends on the preceding COMPILE. Currently, only one COMPILE can be processed at a time (cf. ufs-weather-model)
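
The sequential, row-by-row processing of rt.conf can be illustrated with a short sketch. It assumes that rows are pipe-delimited, with the keyword (COMPILE or RUN) in the first field and the name in the second. The function name list_rt_rows is hypothetical, and the real rt.conf may carry additional fields that this sketch ignores.

```shell
# Hypothetical helper: print "KEYWORD name" for each COMPILE or RUN row of an
# rt.conf-style file, in file order. Assumes '|' separates the fields.
list_rt_rows() {
  # $1: path to the configuration file
  while IFS='|' read -r keyword name _rest; do
    # Strip whitespace from the first two fields.
    keyword=$(printf '%s' "$keyword" | tr -d '[:space:]')
    name=$(printf '%s' "$name" | tr -d '[:space:]')
    case "$keyword" in
      COMPILE|RUN) printf '%s %s\n' "$keyword" "$name" ;;
      *) ;;  # skip comments, blank lines, and anything else
    esac
  done < "$1"
}

# Example call (the file name comes from this document):
#   list_rt_rows rt.conf
```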

Two levels to set simulation parameters
- default_vars.sh sets default values, similar to cpl_defaults in ../compsets/fv3mom6cice5.input
- <test-name> overrides default values, adds test-specific parameters, e.g.,
  - SYEAR=2013, FHMAX=24, FDIAG=6, WLCLK=30
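
The two-level scheme can be sketched as follows. The two functions stand in for default_vars.sh and a tests/<test-name> file. The default values are placeholders chosen for illustration, not the real defaults, and the function names are hypothetical.

```shell
# Stand-in for default_vars.sh (placeholder values, not the real defaults).
set_defaults() {
  SYEAR=2000
  FHMAX=6
  FDIAG=3
  WLCLK=15
}

# Stand-in for a tests/<test-name> file: override only what this test changes.
apply_test_overrides() {
  SYEAR=2013
  FHMAX=24
}

set_defaults            # level 1: defaults for every test
apply_test_overrides    # level 2: test-specific values win
export SYEAR FHMAX FDIAG WLCLK   # make the values visible to later scripts
echo "SYEAR=$SYEAR FHMAX=$FHMAX FDIAG=$FDIAG WLCLK=$WLCLK"
# prints: SYEAR=2013 FHMAX=24 FDIAG=3 WLCLK=15
```

Because the overrides are applied after the defaults, a test file only needs to list the parameters that differ.
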
Set environment variables that are passed on to the various template files in ../parm/
- input.*.nml.IN
- nems.configure.*.IN
- model_configure.IN
Specify configuration templates to use, e.g.,
- INPUT_NML="input.mom6_ccpp.nml.IN"
- NEMS_CONFIGURE="nems.configure.med_atm_ocn_ice_wav.IN"
- FV3_RUN="cpld_fv3_mom6_cice_atm_flux_run.IN"
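
As an illustration of how environment variables could be passed into a *.IN template, the sketch below substitutes placeholders with sed. The @[NAME] placeholder syntax, the function name render_template, and the two-line sample template are assumptions made for this example. The actual templates in ../parm/ and the tool that rt_fv3.sh uses to fill them may differ.

```shell
# Hypothetical renderer: replace each @[NAME] in the template on stdin with
# the value of the shell variable NAME. The @[NAME] syntax is an assumption.
# Values must not contain '|' or '&', which are special to the sed command.
# The body runs in a subshell so its variables do not leak to the caller.
render_template() (
  script=''
  for name in "$@"; do
    eval "value=\${$name}"                        # look up the variable by name
    script="${script}s|@\\[${name}]|${value}|g;"  # one sed command per variable
  done
  sed "$script"
)

# Sample template fragment (hypothetical) rendered with two variables.
SYEAR=2013
FHMAX=24
printf 'start_year: @[SYEAR]\nnhours_fcst: @[FHMAX]\n' | render_template SYEAR FHMAX
# prints:
#   start_year: 2013
#   nhours_fcst: 24
```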

Set up input data, grid data, etc. by copying files from baseline directory to run directory
Baseline directory contains
- Subdirectories for input data (e.g., CICE_IC, MOM6_IC, FV3_input_data, MEDIATOR_ccpp)
- Subdirectories for previous run results (e.g., RT-Baselines_2d_warm_ccpp384)
Make sure directories and files exist in RTPWD
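
A quick pre-flight check along these lines can catch a missing input directory before any job is submitted. This is a minimal sketch: the function name check_baseline_dirs is hypothetical, and which subdirectories a given test needs is up to the caller.

```shell
# Hypothetical pre-flight check: report every expected subdirectory that is
# missing under the baseline directory; return non-zero if any is missing.
# Usage: check_baseline_dirs <baseline_dir> <subdir>...
# The body runs in a subshell so its variables do not leak to the caller.
check_baseline_dirs() (
  base=$1
  shift
  missing=0
  for sub in "$@"; do
    if [ ! -d "$base/$sub" ]; then
      echo "MISSING directory: $base/$sub" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example call, using subdirectory names mentioned in this document:
#   check_baseline_dirs "$RTPWD" CICE_IC MOM6_IC FV3_input_data
```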

Baseline directory (RTPWD)
- Hera: /scratch1/NCEPDEV/nems/emc.nemspara/RT/FV3-MOM6-CICE5/develop-YYYYMMDD
- Orion: /work/noaa/stmp/jminsuk/RT/FV3-MOM6-CICE5/develop-20200504 (temporary)
Run directory root (RUNDIR_ROOT)
- Hera: /scratch1/NCEPDEV/stmp2/${USER}/S2S_RT/rt_$$
- Orion: /work/noaa/stmp/${USER}/stmp/${USER}/S2S_RT/rt_$$
- RUNDIR=${RUNDIR_ROOT}/${TEST_NAME}
New baseline directory (NEW_BASELINE)
- Hera: /scratch1/NCEPDEV/stmp4/${USER}/S2S_RT/REGRESSION_TEST_INTEL
- Orion: /work/noaa/stmp/${USER}/stmp/${USER}/S2S_RT/REGRESSION_TEST_INTEL

compile.sh is triggered by a COMPILE row in rt.conf with the specified <appBuilder-name>
As in NEMSCompsetRun, the build is done using GNUmakefile in ../NEMS/
compile.sh is a simple wrapper around GNUmakefile to interface with rt.sh
- $ ./compile.sh coupledFV3_CCPP_MOM6_CICE
- make app=coupledFV3_CCPP_MOM6_CICE build
If you prefer to build the executable separately (i.e., without rt.sh), place a copy in ufs-s2s-model/tests
- $ cp ../NEMS/exe/NEMS.x fcst_0.exe
- $ cp ../NEMS/src/conf/modules.nems modules.fcst_0
If you want to reuse your executable, keep a copy under a different name

If your code changes are not expected to change simulation results, run the full regression tests afterward to demonstrate that the changes do not break anything
Currently, there are 14 standard regression tests on Hera and Orion
In the ufs-s2s-model/tests/ directory, use any one of the following:
- $ ./rt.sh -f >output 2>&1 &
- $ ./rt.sh -f -e (use ecFlow)
- $ ./rt.sh -fr (use Rocoto)
- $ ./rt.sh -fek (use ecFlow, keep run directory for post-run diagnosis)
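
Combined flags such as -fek are ordinary single-letter options of the kind getopts parses. The sketch below is an illustration only, not the option handling in rt.sh: the function name parse_rt_flags and the printed summary are hypothetical, and only the flags mentioned in this document are handled.

```shell
# Hypothetical sketch of single-letter option parsing with getopts.
# The body runs in a subshell so its variables do not leak to the caller.
parse_rt_flags() (
  OPTIND=1
  full=no keep=no create=no manager=none conf=rt.conf
  while getopts "cfkerl:" opt; do
    case "$opt" in
      c) create=yes ;;       # create a new baseline
      f) full=yes ;;         # run the full test suite
      k) keep=yes ;;         # keep the run directory
      e) manager=ecflow ;;   # use ecFlow
      r) manager=rocoto ;;   # use Rocoto
      l) conf=$OPTARG ;;     # use the given configuration file
      *) exit 2 ;;           # unknown option
    esac
  done
  echo "full=$full keep=$keep create=$create manager=$manager conf=$conf"
)

parse_rt_flags -fek
# prints: full=yes keep=yes create=no manager=ecflow conf=rt.conf
```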

To run a single test, create a file, say my_test.conf, with a single COMPILE and a single RUN
- $ cp rt.conf my_test.conf
- $ vi my_test.conf
- $ ./rt.sh -l my_test.conf
Or back up the original rt.conf file and edit rt.conf in place
- $ cp rt.conf rt.conf.orig
- $ vi rt.conf
- $ ./rt.sh -f

Create a new baseline when your code changes are expected to change simulation results (e.g., a physics change), so the results cannot be compared against existing baseline results
You still need RTPWD as it contains the simulation input data
./rt.sh -c -f OR ./rt.sh -c -l my_test.conf
- rt.sh will copy input data from RTPWD to NEW_BASELINE
- Simulation results will be copied from RUNDIR to NEW_BASELINE
For a warm start (i.e., a run that requires mediator files generated by a cold run):
- Run the corresponding cold run -- this will generate NEW_BASELINE/MEDIATOR_*/
- Use a new directory for NEW_BASELINE
- Change RTPWD to old NEW_BASELINE directory, which contains input and MEDIATOR_*
- ./rt.sh -c
Manually move your NEW_BASELINE to emc.nemspara

Configuration files (select, or copy and modify):
- rt.conf
- tests/<test-name>
- tests/fv3_conf/
- ../parm/input.*.nml.IN
- ../parm/nems.configure.*.IN
- ../parm/model_configure.IN
- ../parm/ice_in_template
- ../parm/MOM_input_template
./rt.sh -c -l my_test.conf
- Will not compare with existing baselines
If your case requires new input data not in RTPWD, set RTPWD to your local directory

To run with a prebuilt executable, remove the COMPILE row from rt.conf
$ cp ../NEMS/exe/NEMS.x fcst_0.exe
$ cp ../NEMS/src/conf/modules.nems modules.fcst_0
- This module file needs to be identical to the one you used for build
$ ./rt.sh -f
This approach does not work with workflow managers because RUN depends on COMPILE
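
Since the staged module file has to match the one used for the build, a byte-wise comparison before launching rt.sh is a cheap safeguard. This is a minimal sketch, and the function name same_modules is hypothetical.

```shell
# Hypothetical helper: succeed only if the staged module file is
# byte-identical to the module file that was used for the build.
same_modules() {
  # $1: module file used for the build, $2: staged copy
  cmp -s "$1" "$2"
}

# Example call, using the paths from the steps above:
#   same_modules ../NEMS/src/conf/modules.nems modules.fcst_0 \
#     || echo "module files differ" >&2
```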

Summary files
- Hera: RegressionTests_hera.intel.log, Compile_hera.intel.log
- Orion: RegressionTests_orion.intel.log, Compile_orion.intel.log
- MISSING file, MISSING baseline, OK, NOT OK...
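
A summary log can be scanned for problems with grep. The sketch below assumes that failed comparisons appear on lines containing "NOT OK" or "MISSING", following the status words listed above. The function name count_failures is hypothetical, and the exact layout of the log lines is not assumed.

```shell
# Hypothetical helper: count the lines of a summary log that report a failed
# or missing comparison. Assumes those lines contain "NOT OK" or "MISSING".
count_failures() {
  # $1: path to a regression-test summary log
  # grep -c exits non-zero when the count is 0, so the status is neutralized.
  grep -c -E 'NOT OK|MISSING' "$1" || true
}

# Example call, using a file name from this document:
#   count_failures RegressionTests_hera.intel.log
```
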
output: stdout and stderr of rt.sh when it is run as ./rt.sh >output 2>&1 &
Log files in log_hera.intel/ and log_orion.intel/
- compile_*.log: output of compile.sh and GNUmakefile
- run_*.log: output of run_test.sh
Run directory RUNDIR_ROOT/
- .log: output of rt_fv3.sh. If Rocoto is used, it also contains err and out from the sbatch job
- subdir: contains all files necessary for simulation, e.g., sbatch job_card
- QUEUE is set to batch in rt.sh