Automatic Splitting: TaskWorker

Matthias Wolf edited this page Feb 7, 2018 · 3 revisions

Naming Conventions

  1. In the probe stage, jobs have ids of the form 0-[1-9][0-9]* and are submitted by a DAG named RunJobs.dag.
  2. In the processing stage, jobs have ids of the form [1-9][0-9]* and are contained in RunJobs0.dag.
  3. Finally, for every tail stage (numbered n, n>0), jobs have ids of the form n-[1-9][0-9]* and are contained in RunJobsn.dag.
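The naming scheme above can be turned into a small classifier. This is only a sketch, not code from the TaskWorker: the function name classify_job_id is hypothetical, and it assumes that the numeric parts of an id are positive integers without leading zeros, so that single-digit ids such as 1 or 0-3 are accepted.

```python
import re

# Patterns follow the naming conventions listed above. Assumption: the
# numeric parts are positive integers without leading zeros.
PROBE = re.compile(r"0-[1-9][0-9]*")
PROCESSING = re.compile(r"[1-9][0-9]*")
TAIL = re.compile(r"([1-9][0-9]*)-[1-9][0-9]*")


def classify_job_id(job_id):
    """Return (stage, dag_name) for a job id (hypothetical helper)."""
    if PROBE.fullmatch(job_id):
        return ("probe", "RunJobs.dag")
    if PROCESSING.fullmatch(job_id):
        return ("processing", "RunJobs0.dag")
    match = TAIL.fullmatch(job_id)
    if match:
        # The leading number n of a tail job id selects RunJobsn.dag.
        return ("tail", "RunJobs{}.dag".format(match.group(1)))
    raise ValueError("unrecognized job id: {!r}".format(job_id))
```

For example, classify_job_id("2-5") identifies a job of the second tail stage, contained in RunJobs2.dag.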

Configuration Parameters for the TaskWorker

The code uses the following parameters for automatic splitting, listed here with their default values:

config.TaskWorker.minAutomaticRuntimeMins = 180
config.TaskWorker.numAutomaticProbes = 5
config.TaskWorker.minAutomaticTailSize = 100
config.TaskWorker.minAutomaticTailTriggers = [50, 80, 100]

The minimum runtime for a job is 180 minutes (3 hours), as set by minAutomaticRuntimeMins. The number of splitting probe jobs is controlled by numAutomaticProbes. For tail jobs, if fewer than minAutomaticTailSize processing jobs are present, a single tail DAG is created. Otherwise, minAutomaticTailTriggers lists the percentages of completed processing jobs at which a tail DAG is created. With the defaults above, the first tail DAG starts when 50% of the processing jobs have completed, the next at 80%, and the final one at 100%.
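To make the trigger rule concrete, the following sketch converts the percentages into counts of completed processing jobs. It is an illustration rather than the TaskWorker implementation: the function name tail_trigger_counts is hypothetical, percentages are rounded up to whole jobs, and the single tail DAG of a small task is assumed to be created once all processing jobs have completed, which the text above does not specify.

```python
def tail_trigger_counts(n_processing, min_tail_size=100, triggers=(50, 80, 100)):
    """Return the completed-job counts at which a tail DAG would be created.

    Hypothetical helper; the defaults mirror minAutomaticTailSize and
    minAutomaticTailTriggers.
    """
    if n_processing < min_tail_size:
        # Small task: a single tail DAG. Assumption: it is created after
        # every processing job has completed.
        return [n_processing]
    # Round each percentage up to a whole number of jobs (integer ceiling).
    return [(n_processing * pct + 99) // 100 for pct in triggers]
```

Under these assumptions, a task with 200 processing jobs would create tail DAGs after 100, 160, and 200 completed jobs, while a task with 40 processing jobs would create a single tail DAG.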

The runtime of the probe and tail jobs can be controlled via:

config.TaskWorker.automaticProbeRuntimeMins = 15
config.TaskWorker.automaticTailRuntimeMinimumMins = 45
config.TaskWorker.automaticTailRuntimeFraction = 0.2

By default, probes run for 15 minutes. Tails run for either 20% of the user-set processing runtime or 45 minutes, whichever is longer.
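The tail runtime rule amounts to taking a maximum. A minimal sketch follows, assuming the fraction is applied directly to the processing runtime in minutes; the function name tail_runtime_mins is hypothetical and not taken from the TaskWorker code.

```python
def tail_runtime_mins(processing_runtime_mins, fraction=0.2, minimum_mins=45):
    """Return the tail job runtime in minutes (hypothetical helper).

    The defaults mirror automaticTailRuntimeFraction and
    automaticTailRuntimeMinimumMins.
    """
    # Whichever is longer: the fraction of the processing runtime or the floor.
    return max(fraction * processing_runtime_mins, minimum_mins)
```

With the defaults, the 45-minute floor applies to any processing runtime below 225 minutes (45 / 0.2). A processing runtime of 180 minutes therefore yields 45-minute tails, since 20% of it is only 36 minutes, while 600 minutes yields tails of about 120 minutes.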

To keep the generated jobs from failing due to excessive disk usage, a cap of 5 GB per job is in place and can be configured:

config.TaskWorker.automaticOutputSizeMaximum = 5 * 1000**3