
Timing Measurements

Paul Nilsson edited this page Apr 25, 2022 · 2 revisions

Pilot Timing

The pilot sends a timing string to the server during the final job update, in the following condensed format:

pilotTiming = time_getjob | time_stagein | time_payload | time_stageout | time_total_setup

where

  • time_getjob: time for the getJob curl operation to finish.
  • time_stagein: time for the entire stage-in to complete, including replica lookup. Note: the pilot cannot measure the time for direct I/O, as this operation is handled by the transform.
  • time_payload: time for payload execution. Note: this includes any pre- or post-processing.
  • time_stageout: time for stage-out to complete, including log transfer.
  • time_total_setup: the total setup time, measured from pilot startup to the getJob operation. During this time the pilot downloads queue data, checks the proxy lifetime, etc.
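As an illustration, the string could be built and parsed as follows. This is a minimal sketch, not the pilot's own code: the PilotTiming class is hypothetical, and it assumes each field is a whole number of seconds and that the fields are joined by "|" in the order listed above (the spaces in the format line are treated as optional).

```python
from dataclasses import dataclass

# Field order as given in the pilotTiming format line.
FIELDS = ("time_getjob", "time_stagein", "time_payload", "time_stageout", "time_total_setup")


@dataclass
class PilotTiming:
    """Hypothetical container for the five pilot timing values (assumed integer seconds)."""
    time_getjob: int
    time_stagein: int
    time_payload: int
    time_stageout: int
    time_total_setup: int

    def to_string(self) -> str:
        # Join the values with '|' in the documented order.
        return "|".join(str(getattr(self, name)) for name in FIELDS)

    @classmethod
    def from_string(cls, text: str) -> "PilotTiming":
        # Split on '|'; int() tolerates surrounding whitespace such as " 120 ".
        parts = text.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
        return cls(*(int(p) for p in parts))


timing = PilotTiming(3, 120, 3600, 45, 30)
print(timing.to_string())  # 3|120|3600|45|30
```

Using a fixed field tuple keeps the serialization order in one place, so the builder and the parser cannot drift apart.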

CPU Consumption

The pilot reports CPU timing information on every server update. The measurements (system + user time for all child processes) are taken approximately once a minute while the payload is running (using /proc/pid/stat), and a final measurement is made immediately after the payload has finished (using os.times()).

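Given an initial measurement t0 = os.times(), the user and system time is calculated as follows (indices 2 and 3 of the os.times() result are the children's user and system times):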

  • t1 = os.times()
  • user_time = t1[2] - t0[2]
  • system_time = t1[3] - t0[3]
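The t0/t1 calculation above can be sketched like this. The run_and_measure helper and the example payload command are hypothetical; the sketch assumes the payload is started with subprocess.run, which waits for the child, since os.times() only accumulates CPU time for children that have terminated and been waited for. On Windows the children fields are always 0.

```python
import os
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd as a child process and return (user_time, system_time) in seconds.

    Hypothetical helper illustrating the t0/t1 scheme described above.
    """
    t0 = os.times()
    subprocess.run(cmd, check=True)  # blocks until the child exits and is reaped
    t1 = os.times()
    # Index 2 is children_user, index 3 is children_system.
    user_time = t1[2] - t0[2]
    system_time = t1[3] - t0[3]
    return user_time, system_time


if __name__ == "__main__":
    # Hypothetical payload: a short CPU-bound Python loop.
    user, system = run_and_measure([sys.executable, "-c", "sum(range(10**6))"])
    print(f"user={user:.2f}s system={system:.2f}s")
```

The named attributes t1.children_user and t1.children_system are equivalent to the indexed access and may read more clearly.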

The instantaneous CPU time calculation extracts the system + user time from /proc/pid/stat for a given pid (converting clock ticks to seconds with os.sysconf(os.sysconf_names['SC_CLK_TCK'])) and loops over the stat files of all child processes.
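A Linux-only sketch of this instantaneous measurement is shown below. It is not the pilot's implementation: the function names are hypothetical, and it assumes child processes are found by scanning every /proc/&lt;pid&gt;/stat for its parent pid and that all descendants (not only direct children) should be counted. Fields 14 (utime) and 15 (stime) of the stat file are in clock ticks.

```python
import os

# Clock ticks per second, used to convert utime/stime from ticks to seconds.
CLK_TCK = os.sysconf(os.sysconf_names["SC_CLK_TCK"])


def parse_stat(line):
    """Return (ppid, utime_ticks, stime_ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is in parentheses and may contain spaces or ')', so split on
    # the last ')'. The remaining fields start at field 3 (state).
    rest = line.rsplit(")", 1)[1].split()
    ppid = int(rest[1])    # field 4
    utime = int(rest[11])  # field 14
    stime = int(rest[12])  # field 15
    return ppid, utime, stime


def read_all_stats():
    """Map pid -> (ppid, utime, stime) for every readable process in /proc."""
    stats = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stats[int(entry)] = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file is unreadable
    return stats


def instant_cpu_time(pid):
    """Sum user + system CPU seconds of pid and all of its descendants."""
    stats = read_all_stats()
    children = {}
    for child, (ppid, _, _) in stats.items():
        children.setdefault(ppid, []).append(child)
    total_ticks = 0
    seen = set()
    stack = [pid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against pid reuse during the scan
        seen.add(current)
        if current in stats:
            _, utime, stime = stats[current]
            total_ticks += utime + stime
        stack.extend(children.get(current, []))
    return total_ticks / CLK_TCK
```

Splitting on the last ")" rather than on whitespace matters because a process name such as "(my prog)" would otherwise shift every later field.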