
Automatic Splitting: User

Matthias Wolf edited this page Feb 7, 2018 · 2 revisions

Naming Conventions

  1. For the probe stage, probe jobs have a job id of the form 0-[1-9]+. These jobs cannot be resubmitted, and the task will fail if none of the probe jobs completes successfully.

  2. In the processing stage, jobs have ids of the form [1-9][0-9]*. These jobs cannot be resubmitted either. They have a limited runtime, as specified in the configuration, after which they stop processing input data; any unprocessed input data is dealt with by the tail jobs.

  3. Finally, there are several tail stages (numbered n, with n > 0), with job ids of the form n-[1-9][0-9]*. Each stage is triggered at a certain task completion level:

    • for small tasks (fewer than 100 jobs), a single tail stage is started once all jobs have completed (successfully or not)
    • for larger tasks, a first tail stage collects the remaining input data from the first 50% of completed jobs, followed by a stage that processes leftover data when 80% of the jobs have completed, and finally a stage that collects any remaining input data at 100% job completion.

    These tail jobs deal with input data that has not been processed by the regular processing jobs. They can be resubmitted if they fail.
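The completion thresholds described above can be sketched as follows. This is an illustrative sketch only; the function name and structure are not part of CRAB:

```python
def tail_stage_thresholds(total_jobs):
    """Return the task completion fractions at which tail stages are
    triggered, per the rules above (illustrative sketch, not CRAB code)."""
    if total_jobs < 100:
        # Small tasks: a single tail stage once every job has finished.
        return [1.0]
    # Larger tasks: tail stages at 50%, 80%, and 100% completion.
    return [0.5, 0.8, 1.0]
```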

User-side Configuration Parameters

This is an example client configuration to use the automatic splitting feature:

from CRABClient.UserUtilities import config
config = config()

config.section_("General")
config.General.requestName = 'taskname2'
config.General.instance = 'preprod'
config.General.activity = 'analysistest'

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'step2_L1REPACK_HLT.py'

config.section_("Data")
config.Data.inputDataset = '/JetHT/Run2017B-v1/RAW'
config.Data.splitting = 'Automatic'
config.Data.unitsPerJob = 600

config.section_("Site")
config.Site.storageSite = 'T2_IT_Legnaro'

Note the parameter config.Data.splitting = 'Automatic'. The optional parameter config.Data.unitsPerJob = 600 specifies the target runtime of each job in minutes; in this case 600 minutes, i.e. 10 hours.

FAQ

Configuration

What should I do if I get the following error:

Minimum runtime requirement for automatic splitting is 600 seconds.

Please change config.Data.unitsPerJob in your client configuration. The parameter represents the number of seconds the jobs will run and needs to be at least 600 (seconds).

Processing

Why can't I resubmit failed jobs?

The data from failed jobs are reprocessed by tail jobs. For bookkeeping reasons, it is not possible to resubmit any job whose id consists of a number only. Tail jobs (e.g., jobs with ids such as 1-3 or 3-2) can be resubmitted.
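This bookkeeping rule can be sketched as a simple check on the job id. The function below is illustrative only and not part of CRAB:

```python
import re

def is_resubmittable(job_id):
    """A job can be resubmitted only if it is a tail job, i.e. its id
    has the form n-m with n > 0 (e.g. '1-3' or '3-2'). Probe jobs
    ('0-1', ...) and processing jobs ('12', ...) cannot."""
    return re.fullmatch(r"[1-9][0-9]*-[1-9][0-9]*", job_id) is not None
```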

How can I limit how much of a dataset is processed?

Currently, the most unambiguous way to do so is to specify the desired luminosity sections with a lumi mask, or to use the runRange parameter.
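For example, either of the following Data-section parameters can be added to the client configuration. The file name and run numbers below are placeholders, not recommendations:

```python
# Process only the luminosity sections listed in a JSON lumi mask
# (the file name here is a placeholder):
config.Data.lumiMask = 'my_lumi_mask_JSON.txt'

# Or restrict processing to a range of runs (placeholder values):
config.Data.runRange = '297046-299329'
```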