
Action chains implementation

Johannes Hahn edited this page Sep 14, 2022 · 1 revision

This page describes the implementation of action chains for regular Salt minions and for SSH minions.

Overview

The action chain implementation for Salt uses the same database tables used by the traditional clients.

   +------------------------------+                   +----------------------------------------+ 
   |                              |                   |                                        | 
   |        rhnActionChain        |                   |         rhnActionChainEntry            | 
   |                              |                   |                                        | 
   --------------------------------                   ------------------------------------------ 
   |                              |                   |                                        | 
   | id                           |                   |  actionchain_id -> rhnActionChain(id)  | 
   | label                        | 1               * |  action_id -> rhnAction(id)            | 
   | user_id -> web_contact(id)   <--------------------  server_id -> rhnServer(id)            | 
   | created                      |                   |  sort_order                            | 
   | modified                     |                   |  created                               | 
   |                              |                   |  modified                              | 
   |                              |                   |                                        | 
   |                              |                   |                                        | 
   |                              |                   |                                        | 
   |                              |                   |                                        | 
   +------------------------------+                   +----------------------------------------+ 

An action chain has one or more entries. Each entry points to an Action and to a Server entity. One Action is created per server, even if the actions are added from SSM and target multiple servers. An Action has no corresponding ServerAction until it is executed; until then, the target server for the action is stored only in the ActionChainEntry.

The entries are ordered according to the field sort_order.
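The relationship between entries, servers and ordering can be illustrated with a minimal Python sketch. It is not the actual (Java) implementation: the class below only mirrors the rhnActionChainEntry columns from the diagram, the helper `actions_per_server` is hypothetical, and the sample ids are made up. It also assumes that actions added together from SSM share the same sort_order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionChainEntry:
    # Field names mirror the rhnActionChainEntry columns in the diagram.
    actionchain_id: int
    action_id: int
    server_id: int
    sort_order: int


def actions_per_server(entries):
    """Group action ids by target server, ordered by sort_order (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.sort_order):
        grouped[entry.server_id].append(entry.action_id)
    return dict(grouped)


# Two actions added from SSM for two servers: one Action per server each time.
entries = [
    ActionChainEntry(45, action_id=13, server_id=1000, sort_order=1),
    ActionChainEntry(45, action_id=11, server_id=1000, sort_order=0),
    ActionChainEntry(45, action_id=14, server_id=1001, sort_order=1),
    ActionChainEntry(45, action_id=12, server_id=1001, sort_order=0),
]
print(actions_per_server(entries))  # {1000: [11, 13], 1001: [12, 14]}
```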

When the action chain is executed by Taskomatic:

  • ServerActions are created to store the result of the execution
  • for each minion an .sls file is generated that contains states for each action in the chain. This .sls file is then applied to the target minion by a custom module mgractionchains. Under the hood this module does a state.sls <generated sls>.
  • the action chain is deleted from the database

Action chain execution

The steps are the following:

  1. The user creates the action chain by adding one or more actions to it. ActionChainEntry objects are stored in the db, one for each target server. No ServerActions are created yet.
  2. The user schedules the execution at a certain date and time.
  3. Taskomatic executes the action chain at the scheduled time by invoking MinionActionChainExecutor:
    1. For each minion, ServerAction and the LocalCall objects are created.

    2. The LocalCall objects are converted into SaltState objects. Reboot actions are converted into SaltSystemReboot objects, while all other actions are converted into SaltModuleRun objects.

    3. For salt-ssh minions, additional files are included in the state tarball that gets copied over to the SSH minion.

      By default, salt-ssh tries to figure out which files to include in the state tarball besides the .sls to be applied. However, Uyuni typically uses mgrcompat.module_run + state.apply to apply the .sls file that corresponds to a particular action type. E.g.:

      mgr_actionchain_131_action_1_chunk_1:
        mgrcompat.module_run:
        -   name: state.apply
        -   mods: remotecommands
      . . .  

      A state applied this way can't be inspected by Salt for things like includes, salt://... URLs, etc. Therefore, any additional files referenced by the state applied via mgrcompat.module_run + state.apply need to be included explicitly.

    4. The SaltState objects are rendered into .sls files. The resulting .sls files contain states for each Action, with one state per Action.

      If one of the actions is a reboot action, or a package upgrade action that touches the salt-minion package, the resulting .sls is split into two or more chunks, depending on the number of reboot/package upgrade actions.

      The resulting files are written to /srv/susemanager/salt/actionchains/. The filename has the format actionchain_<CHAIN_ID>_<MACHINE_ID>_<CHUNK>.sls

      Example 1 An action on two minions is added from SSM to an action chain. The resulting files are:

      /srv/susemanager/salt/actionchains/actionchain_45_df00a3b56f3aa159746b8c835eaaeede_1.sls
      /srv/susemanager/salt/actionchains/actionchain_45_dec27e1bc3cffa7749c965c15eaca15c_1.sls
      

      Example 2 A simple action chain:

      • Action 1: Run script
      • Action 2: Apply highstate

      is rendered to:

      mgr_actionchain_131_action_1_chunk_1:
        mgrcompat.module_run:
        -   name: state.apply
        -   mods: remotecommands
        -   kwargs:
                pillar:
                    mgr_remote_cmd_runas: joe
                    mgr_remote_cmd_script: salt://scripts/script_1.sh
      mgr_actionchain_131_action_2_chunk_1:
        mgrcompat.module_run:
        -   name: state.apply
        -   require:
            -   mgrcompat: mgr_actionchain_131_action_1_chunk_1

      Example 3 An action chain containing a reboot action:

      • Action 1: Run script
      • Action 2: Reboot
      • Action 3: Apply highstate

      is rendered to:

      Chunk 1 (actionchain_45_df00a3b56f3aa159746b8c835eaaeede_1.sls):

      mgr_actionchain_131_action_1_chunk_1:
        mgrcompat.module_run:
        -   name: state.apply
        -   mods: remotecommands
        -   kwargs:
                pillar:
                    mgr_remote_cmd_runas: foobar
                    mgr_remote_cmd_script: salt://scripts/script_1.sh
      mgr_actionchain_131_action_2_chunk_1:
        mgrcompat.module_run:
        -   name: system.reboot
        -   at_time: 1
        -   require:
            -   mgrcompat: mgr_actionchain_131_action_1_chunk_1
      schedule_next_chunk:
        mgrcompat.module_run:
        -   name: mgractionchains.next
        -   actionchain_id: 131
        -   chunk: 2
        -   next_action_id: 3
        -   require:
            -   mgrcompat: mgr_actionchain_131_action_2_chunk_1

      Chunk 2 (actionchain_45_df00a3b56f3aa159746b8c835eaaeede_2.sls):

      mgr_actionchain_131_action_3_chunk_2:
        mgrcompat.module_run:
        -   name: state.apply

    5. Split the minions into regular minions and salt-ssh minions.

    6. Invoke Salt to execute the action chain by calling the custom module mgractionchains.start. This will calculate the name of the chunk based on the action chain id and the machine id of the minion and will do a state.sls <CHUNK> to apply the file generated in step 4.

      For regular minions the execution happens asynchronously.

      For salt-ssh minions the execution is synchronous.

    7. The action chain is deleted from the database.

      For regular minions this happens once all the Salt calls are made. There is no waiting for the Salt jobs to return because regular minions operate in an asynchronous manner.

      For SSH minions the delete happens once all the salt-ssh calls complete and the response is returned. This is because the salt-api works synchronously for SSH minions.
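The chunk splitting and file naming from step 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the real Java code: the function names, the `(action_id, kind)` tuples and the kind labels `"reboot"` and `"salt_upgrade"` are hypothetical. The sketch also assumes that a chunk ends right after each interrupting action and that no empty trailing chunk is produced when the interrupting action is the last one.

```python
# Hypothetical labels for actions that interrupt the minion.
INTERRUPTING = {"reboot", "salt_upgrade"}


def split_into_chunks(actions):
    """Split an ordered list of (action_id, kind) tuples into chunks.

    A chunk is closed right after each interrupting action.
    """
    chunks, current = [], []
    for action in actions:
        current.append(action)
        if action[1] in INTERRUPTING:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def chunk_filename(chain_id, machine_id, chunk):
    """Build the file name actionchain_<CHAIN_ID>_<MACHINE_ID>_<CHUNK>.sls."""
    return f"actionchain_{chain_id}_{machine_id}_{chunk}.sls"


# Example 3 from above: run script, reboot, apply highstate.
chain = [(1, "script"), (2, "reboot"), (3, "highstate")]
chunks = split_into_chunks(chain)
names = [
    chunk_filename(45, "df00a3b56f3aa159746b8c835eaaeede", number)
    for number, _ in enumerate(chunks, start=1)
]
print(chunks)
print(names)
```

With the chain from Example 3, the first chunk holds the script and the reboot, the second chunk holds the highstate, and the two names match the chunk files listed in that example.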

Resuming execution after reboot or salt-minion package upgrade

As mentioned above, the generated .sls file is split into multiple chunks in case the action chain contains reboot actions or upgrade actions that affect the salt-minion package (for regular minions).

The resume mechanism is different for regular minions and for SSH minions.

Regular minions:

  1. The first chunk is executed/applied by mgractionchains.start.

  2. If there are multiple chunks, the next chunk to be applied will be saved in a local file on the minion (in /etc/salt/minion.d/_mgractionchains.conf). This is done by adding a call to mgractionchains.next in the .sls file of the chunk as the last state:

    ...
    schedule_next_chunk:
        mgrcompat.module_run:
        -   name: mgractionchains.next
        -   actionchain_id: <CHAIN_ID>
        -   chunk: <NEXT_CHUNK>
        -   next_action_id: <FIRST_ACTION_ID_IN_THE_NEXT_CHUNK>
        -   require:
            -   mgrcompat: <LAST_ACTION_IN_THIS_CHUNK>

  3. After a reboot or salt-minion package upgrade, the minion service is started and the minion/start event is fired. This triggers a reactor on the master which calls the mgractionchains.resume module on the minion.

    The mgractionchains.resume module reads the file /etc/salt/minion.d/_mgractionchains.conf to get the <NEXT_CHUNK>, deletes the file and then does a state.sls <NEXT_CHUNK>.

    The reactor is configured in /etc/salt/master.d/susemanager.conf:

    ...
    reactor:
     - 'salt/minion/*/start':
         - /usr/share/susemanager/reactor/resume_action_chain.sls
    ...     

    If the next chunk again contains a reboot, steps 2 and 3 are repeated.
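The save-then-resume handshake for regular minions can be illustrated with a small, self-contained Python sketch. It does not reproduce the real mgractionchains module: the functions `save_next_chunk` and `resume`, the JSON file format and the `fake_apply` callback are assumptions made for illustration, and a temporary directory stands in for /etc/salt/minion.d/.

```python
import json
import os
import tempfile


def save_next_chunk(conf_path, actionchain_id, chunk, next_action_id):
    """Persist which chunk to apply once the minion is back (role of mgractionchains.next)."""
    with open(conf_path, "w") as fh:
        json.dump(
            {
                "actionchain_id": actionchain_id,
                "chunk": chunk,
                "next_action_id": next_action_id,
            },
            fh,
        )


def resume(conf_path, apply_chunk):
    """Read the pending chunk, delete the file, then apply the chunk."""
    if not os.path.exists(conf_path):
        return None  # nothing to resume
    with open(conf_path) as fh:
        pending = json.load(fh)
    os.remove(conf_path)
    return apply_chunk(pending["actionchain_id"], pending["chunk"])


def fake_apply(actionchain_id, chunk):
    # Stand-in for "state.sls <NEXT_CHUNK>".
    return f"applied chunk {chunk} of action chain {actionchain_id}"


with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "_mgractionchains.conf")
    save_next_chunk(conf, actionchain_id=131, chunk=2, next_action_id=3)
    first = resume(conf, fake_apply)    # applies chunk 2
    second = resume(conf, fake_apply)   # file is gone, nothing to do
    conf_removed = not os.path.exists(conf)
```

In this sketch, deleting the file before applying the chunk means a repeated minion/start event cannot apply the same chunk twice.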

SSH minions:

The generated .sls is split into chunks only if there's a reboot action present in the chain. Upgrading the salt-minion package doesn't affect an SSH minion so there's no need to split.

  1. Check first if there are any pending action chains to be resumed by calling mgractionchains.get_pending_resume.
  2. If there's no pending action chain to be resumed, execute the first chunk by calling mgractionchains.start synchronously. This may trigger a reboot. If there's already an action chain to be resumed on a minion, set the ServerActions of that minion to failed to avoid concurrent execution.
  3. Handle the execution result and update the corresponding ServerActions. The reboot action is set to STATUS_PICKED_UP.
  4. The SSHPushWorkerSalt will check periodically for SSH minions that have any of these:
    • reboot actions older than 4 minutes
    • queued actions that have as prerequisite a completed reboot action.
  5. Call mgractionchains.get_pending_resume on each of the minions found in the previous step to find out which action chain and chunk to resume.
  6. Call actionchains.resumessh synchronously and handle the results, updating the corresponding ServerActions.
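The selection criteria from step 4 could look roughly like the following Python sketch. It is only an illustration: the dictionary layout, the status strings and the function `needs_resume_check` are hypothetical, and the sketch assumes that "older than 4 minutes" is measured from the pickup time of a reboot action in the picked-up state.

```python
from datetime import datetime, timedelta

REBOOT_GRACE = timedelta(minutes=4)


def needs_resume_check(server_actions, now):
    """Return True if an SSH minion matches one of the two criteria of step 4."""
    by_id = {a["id"]: a for a in server_actions}
    for action in server_actions:
        # Criterion 1: a reboot action picked up more than 4 minutes ago (assumption).
        if (
            action["type"] == "reboot"
            and action["status"] == "PICKED_UP"
            and now - action["pickup_time"] > REBOOT_GRACE
        ):
            return True
        # Criterion 2: a queued action whose prerequisite is a completed reboot.
        prereq = by_id.get(action.get("prerequisite"))
        if (
            action["status"] == "QUEUED"
            and prereq is not None
            and prereq["type"] == "reboot"
            and prereq["status"] == "COMPLETED"
        ):
            return True
    return False


now = datetime(2022, 9, 14, 12, 0, 0)
old_reboot = [
    {"id": 2, "type": "reboot", "status": "PICKED_UP",
     "pickup_time": now - timedelta(minutes=5)},
]
fresh_reboot = [
    {"id": 2, "type": "reboot", "status": "PICKED_UP",
     "pickup_time": now - timedelta(minutes=1)},
]
queued_after_reboot = [
    {"id": 2, "type": "reboot", "status": "COMPLETED", "pickup_time": now},
    {"id": 3, "type": "highstate", "status": "QUEUED", "prerequisite": 2},
]
```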

Independent of steps 4-6, the SSHPushWorkerSalt always updates system information like the kernel version and the uptime. It also sets reboot actions to STATUS_COMPLETED if any of the following holds:

  • the ServerAction is in STATUS_PICKED_UP and the boot time is after the action pickup time
  • the ServerAction is in STATUS_PICKED_UP but the pickup time is missing and the boot time is after the scheduled time of the action (Action.earliestAction)
  • the action is in STATUS_QUEUED and the boot time is after Action.earliestAction
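The three conditions above can be condensed into one decision function. The following Python sketch is an illustration only: the function name, the parameter names and the plain status strings are hypothetical, and the real logic lives in the Java SSHPushWorkerSalt.

```python
from datetime import datetime


def reboot_completed(status, boot_time, pickup_time, earliest_action):
    """Decide whether a reboot action can be set to STATUS_COMPLETED."""
    if status == "PICKED_UP":
        # Fall back to the scheduled time when the pickup time is missing.
        reference = pickup_time if pickup_time is not None else earliest_action
        return boot_time > reference
    if status == "QUEUED":
        return boot_time > earliest_action
    return False


scheduled = datetime(2022, 9, 14, 12, 0)
picked_up = datetime(2022, 9, 14, 12, 1)
booted = datetime(2022, 9, 14, 12, 3)

print(reboot_completed("PICKED_UP", booted, picked_up, scheduled))  # True
print(reboot_completed("PICKED_UP", booted, None, scheduled))       # True
print(reboot_completed("QUEUED", scheduled, None, booted))          # False
```

In every branch the system is treated as rebooted only if its boot time is later than the best available timestamp for the action.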

Cleanup

For regular minions, the generated .sls files are removed after the job return event is processed.

For SSH minions, the generated .sls files are removed after the synchronous calls finish and the result is processed.

Possible improvements

  • Asynchronous execution for SSH minions. One option would be to create a new job type that executes mgractionchains.start and to schedule a job execution for each SSH minion instead of making a synchronous call to the salt-api. A similar approach is used for executing regular actions on SSH minions: instead of making a synchronous call, an ssh-minion-action-executor job is scheduled for each SSH minion, and the number of parallel jobs is controlled with the taskomatic.com.redhat.rhn.taskomatic.task.SSHMinionActionExecutor.parallel_threads parameter.

  • Improve error handling. See issue https://github.com/SUSE/spacewalk/issues/12826.

    One solution would be to keep the action chain in the DB but add a flag to the rhnActionChain table and set this flag to true for the action chains that have already been started. This way only the action chains not yet scheduled can be shown in the UI while allowing for better error handling.

  • Rename the SSHPushWorkerSalt to something more appropriate (see issue https://github.com/SUSE/spacewalk/issues/12914).
