Timing Measurements

Paul Nilsson edited this page Jun 30, 2022 · 2 revisions

Pilot Timing

The pilot sends a timing string to the server during the final job update, using the following condensed format:

pilotTiming = time_getjob | time_stagein | time_payload | time_stageout | time_initial_setup | time_payload_setup

(was: pilotTiming = time_getjob | time_stagein | time_payload | time_stageout | time_total_setup)

where

  • time_getjob: time for the getJob curl operation to finish.
  • time_stagein: time for the entire stage-in to complete, including replica lookup. Note: the pilot cannot measure the time spent on direct I/O, as this operation is handled by the transform.
  • time_payload: time for payload execution. Note: this includes any pre- or post-processing.
  • time_stageout: time for stage-out to complete, including log transfer.
  • time_initial_setup: time from pilot startup to the getJob operation. During this time the pilot downloads queue data, checks the proxy lifetime, etc.
  • time_payload_setup: time from before to after the payload setup. If a time/date string in the format '%H:%M:%S %Y/%m/%d' is present at the beginning of payload.stdout, the pilot uses it at the end of the payload to improve the setup time measurement.
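As an illustration of the time/date string mentioned for time_payload_setup, the following minimal sketch checks whether a line of payload stdout begins with a '%H:%M:%S %Y/%m/%d' stamp. The function name parse_leading_timestamp is hypothetical and not taken from the pilot source; how the pilot then uses the stamp to refine the setup time is not shown here.

```python
from datetime import datetime

# Format quoted in the text above; a stamp such as '12:34:56 2022/06/30'
# occupies exactly 19 characters.
STAMP_FORMAT = '%H:%M:%S %Y/%m/%d'
STAMP_LENGTH = 19


def parse_leading_timestamp(first_line):
    """Return a datetime if the line starts with a stamp, otherwise None.

    Hypothetical helper for illustration only.
    """
    candidate = first_line[:STAMP_LENGTH]
    try:
        return datetime.strptime(candidate, STAMP_FORMAT)
    except ValueError:
        # No recognisable stamp at the beginning of the line
        return None


stamp = parse_leading_timestamp("12:34:56 2022/06/30 starting payload setup")
missing = parse_leading_timestamp("starting payload setup")
```

A caller would typically apply this to the first line of payload.stdout and fall back to the pilot's own measurement when None is returned.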

(As of June 2022, time_total_setup is still being reported as well; it equals time_initial_setup + time_payload_setup.)
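On the receiving side, a pilotTiming string could be unpacked along these lines. This is a sketch under assumptions: the values are taken to be integer seconds separated by '|', and the function name parse_pilot_timing is hypothetical. Only the field order comes from the format described above.

```python
# Field order as given in the pilotTiming format above
FIELDS = (
    "time_getjob",
    "time_stagein",
    "time_payload",
    "time_stageout",
    "time_initial_setup",
    "time_payload_setup",
)


def parse_pilot_timing(timing):
    """Split a pilotTiming string into a dict keyed by field name.

    Assumes integer values in seconds (an assumption, not confirmed
    by the text). Whitespace around the separators is tolerated.
    """
    values = [int(part.strip()) for part in timing.split("|")]
    if len(values) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(values)}"
        )
    result = dict(zip(FIELDS, values))
    # Total setup time is the sum of the two setup components
    result["time_total_setup"] = (
        result["time_initial_setup"] + result["time_payload_setup"]
    )
    return result


timing = parse_pilot_timing("2|30|3600|45|10|20")
```

Raising on an unexpected field count makes it easy to spot strings still in the older five-field format.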

CPU Consumption

The Pilot reports CPU timing information on every server update. The measurements (system+user time for all child processes) are taken approximately once a minute while the payload is running (using /proc/pid/stat), with a final measurement taken immediately after the payload has finished (using os.times()).

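Given an initial measurement t0 = os.times(), the user and system times are calculated as follows (elements 2 and 3 of the os.times() result are the user and system times of child processes):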

  • t1 = os.times()
  • user_time = t1[2] - t0[2]
  • system_time = t1[3] - t0[3]
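The calculation above can be tried out with a short script. This is a minimal sketch, not the pilot's own code: the helper name children_cpu_time and the throwaway subprocess are assumptions made for illustration. Note that os.times() only accounts for children that have terminated and been waited for, and that the child fields are reported as zero on Windows.

```python
import os
import subprocess
import sys


def children_cpu_time(t0, t1):
    """Return (user_time, system_time) consumed by child processes
    between two os.times() snapshots. Hypothetical helper."""
    user_time = t1[2] - t0[2]    # children_user
    system_time = t1[3] - t0[3]  # children_system
    return user_time, system_time


t0 = os.times()
# Stand-in for the payload: a child process that burns a little CPU
subprocess.run(
    [sys.executable, "-c", "sum(i * i for i in range(10**6))"],
    check=True,
)
t1 = os.times()

user_time, system_time = children_cpu_time(t0, t1)
print(f"user={user_time:.2f}s system={system_time:.2f}s")
```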

The instant CPU timing calculation extracts the system+user time from /proc/pid/stat for a given pid, converting clock ticks to seconds with the SC_CLK_TCK value (looked up via os.sysconf_names['SC_CLK_TCK']), and loops over the stat files of all child processes.
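A reading of /proc/pid/stat along these lines might look as follows. This is a sketch only: the function names are hypothetical, and the list of child pids is assumed to be supplied by the caller, since how the pilot discovers child processes is not described here. In the Linux stat file, utime and stime are fields 14 and 15, expressed in clock ticks.

```python
import os


def cpu_seconds_from_stat(stat_text, ticks_per_second):
    """Extract user+system CPU time in seconds from the contents
    of a /proc/<pid>/stat file. Hypothetical helper."""
    # Field 2 (the command name) is wrapped in parentheses and may
    # contain spaces, so split after the last ')' first.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12]
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / ticks_per_second


def total_cpu_seconds(pids):
    """Sum user+system CPU time over the given pids (Linux only)."""
    ticks = os.sysconf(os.sysconf_names["SC_CLK_TCK"])
    total = 0.0
    for pid in pids:
        try:
            with open(f"/proc/{pid}/stat") as stat_file:
                total += cpu_seconds_from_stat(stat_file.read(), ticks)
        except FileNotFoundError:
            # The process ended between discovery and reading
            continue
    return total


# Made-up stat line: utime=250 ticks, stime=50 ticks
sample = "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 1 0"
seconds = cpu_seconds_from_stat(sample, 100)
```

Skipping pids whose stat file has disappeared keeps the periodic measurement from failing when a short-lived child exits mid-loop.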