Stefano Belforte edited this page Jun 24, 2024 · 8 revisions

Goals:

  • prevent one user (or a few users) from monopolizing resources
  • give new users a chance to get some work through, even if slowly, when resources are saturated

Design

  1. introduce a new task status before NEW: WAITING

  2. crab submit puts tasks in WAITING

  3. Option 1

    a new TW-like component handles tasks in WAITING, moving them to NEW "a bit at a time"
    pros: no changes to current TW
    cons: a new service to start/manage

  4. Option 2

    in TW MasterWorker introduce a selectWork step before lockWork which moves tasks from WAITING to NEW
    pros: only run TW, like now, all code stays together
    cons: TW becomes more complex, bugs in new code may make everything crash

Stefano favours Option 2

Implementation

skeleton (likely suited for a Summer Student)

  1. modify list of task statuses in code and documentation
  2. have an initial transparent selectWork which finds all tasks in WAITING and moves them to NEW. Reuse code (20 lines!) from lockWork, but make it call an (initially trivial) external scheduling method. Insert it at the beginning of this loop https://github.com/dmwm/CRABServer/blob/f88a9b98f80f9a38e3180ed0a5b2b193a75ee5c0/src/python/TaskWorker/MasterWorker.py#L417-L419 but it should be possible to have it run at a lower frequency than the loop itself
  3. modify handling of submission in REST to put tasks in WAITING. Change https://github.com/dmwm/CRABServer/blob/f88a9b98f80f9a38e3180ed0a5b2b193a75ee5c0/src/python/CRABInterface/DataWorkflow.py#L178
  4. modify crab status to properly inform user when task is in WAITING
  5. add a waiting reason in the message field in the task table
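A minimal sketch of the transparent selectWork step from item 2, to make the intended shape concrete. It is not the TaskWorker code: tasks are plain in-memory dicts here, whereas the real step would presumably go through the same REST/DB calls that lockWork uses. The names `SelectWork`, `schedule_all`, `min_interval` and `clock` are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical minimal task record: {"name": ..., "user": ..., "status": ...}
Task = Dict[str, str]


def schedule_all(waiting: List[Task]) -> List[Task]:
    """Initially trivial scheduling method: release every waiting task."""
    return list(waiting)


class SelectWork:
    """Moves tasks from WAITING to NEW, at most once every min_interval seconds."""

    def __init__(
        self,
        scheduler: Callable[[List[Task]], List[Task]] = schedule_all,
        min_interval: float = 0.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.scheduler = scheduler        # external, replaceable policy
        self.min_interval = min_interval  # lets the step run less often than the main loop
        self.clock = clock                # injectable so the rate limit can be tested
        self._last_run: Optional[float] = None

    def __call__(self, tasks: List[Task]) -> int:
        """Return the number of tasks moved to NEW in this call."""
        now = self.clock()
        if self._last_run is not None and now - self._last_run < self.min_interval:
            return 0  # too soon since the last run: leave the queue untouched
        self._last_run = now
        waiting = [t for t in tasks if t["status"] == "WAITING"]
        selected = self.scheduler(waiting)
        for task in selected:
            task["status"] = "NEW"
        return len(selected)
```

In this sketch the main loop would call the `SelectWork` instance at the top of every iteration, before lockWork. Keeping the policy in a separate callable means a smarter scheduler can be swapped in later without touching the step itself.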

muscle

  1. improve selectWork by adding a simple algorithm which e.g. picks tasks in round robin among all users (possibly achievable for a Summer Student)
  2. improve further by adding knowledge of how many resources a user is using now, plus some fair-share algorithm. For this we need to differentiate tasks in progress vs. completed (i.e. DAGMan running or not) in the Task table
  3. add pruning of the task queue: users able to kill tasks, us/TW able to "give it a cut"
  4. report queue status back to user and possibly refuse further submissions
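The round-robin idea in item 1 could look roughly like the sketch below. It assumes the waiting list arrives ordered by submission time and that each task record carries a `user` key. The function name `round_robin` and the `max_tasks` cap are hypothetical, not part of CRABServer. It does not attempt the fair-share step of item 2, which would also need each user's current resource usage.

```python
from collections import OrderedDict, deque
from typing import Deque, Dict, List

# Hypothetical minimal task record: {"name": ..., "user": ...}
Task = Dict[str, str]


def round_robin(waiting: List[Task], max_tasks: int) -> List[Task]:
    """Pick up to max_tasks tasks, taking one per user in turn."""
    # Per-user FIFO queues, with users kept in order of first appearance.
    queues: "OrderedDict[str, Deque[Task]]" = OrderedDict()
    for task in waiting:
        queues.setdefault(task["user"], deque()).append(task)

    selected: List[Task] = []
    while queues and len(selected) < max_tasks:
        for user in list(queues):  # copy the keys: queues is modified in the loop
            if len(selected) >= max_tasks:
                break
            selected.append(queues[user].popleft())
            if not queues[user]:
                del queues[user]  # this user has nothing left waiting
    return selected
```

For example, with three waiting tasks from user `a` followed by one each from `b` and `c`, a cap of four selects a's first task, then b's, then c's, then a's second. A single user with a long queue therefore cannot block the others.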