Pre-requisites
I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
I have searched existing issues and could not find a match for this bug.
What happened? What did you expect to happen?
Stopping a workflow while one of its steps is in the ErrImagePull/ErrImageNeverPull state causes a DAG-format exit handler to get stuck. The exit-handler step may or may not be invoked (that is, its main container may or may not run), but in the end the exit handler is left in either the Pending or the Running state. Pod garbage collection does not happen either, because the workflow does not seem to finish executing, even though its state is reported as Failed.
Using a plain template or a steps-format template as the exit handler does not seem to trigger this issue; the problem is reproducible only when a DAG-format exit handler is used. Note that similar symptoms have occasionally been observed with a DAG exit handler when stopping a workflow whose steps were running normally with container images available: stopping can also leave the exit handler stuck in the Running state. That case has not been consistently reproducible with a test workflow, but the symptoms are similar. I am reporting it here in the hope that a fix for this problem also resolves those cases.
The attached minimal workflow demonstrates the problem with a DAG exit-handler template. The "say-hello" step invokes a template with a non-existent image tag parameter, and the template sets imagePullPolicy: Never so that the main container ends up in the ErrImageNeverPull status. Stopping the workflow in this state produces the situation described above. Having the "alpine:latest" image in the container cache should be sufficient to run this workflow.
The example workflow also includes options for invoking exit handlers of other kinds, which show that both a plain template and a steps template work as the exit handler in the described situation. In the real-world use case, the DAG format is needed because of the complexity of the exit-handler logic.
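For readers without access to the attachment, the following is a minimal sketch of a workflow along the lines described above. It is not the reporter's actual file. The template names (main, hello, exit-handler-dag, exit-handler-plain, echo), the parameter name image-tag, its value, and the use of a steps template for the entrypoint are all assumptions made for illustration. Only the "say-hello" step name, imagePullPolicy: Never, the alpine image, and the DAG-format exit handler come from the description.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-exit-handler-stuck-   # hypothetical name
spec:
  entrypoint: main
  # Switch to exit-handler-plain to compare against a plain-template exit handler.
  onExit: exit-handler-dag
  arguments:
    parameters:
      - name: image-tag
        value: does-not-exist             # assumed to be absent from the node's image cache
  templates:
    - name: main
      steps:
        - - name: say-hello
            template: hello
            arguments:
              parameters:
                - name: image-tag
                  value: "{{workflow.parameters.image-tag}}"

    # With imagePullPolicy: Never and a tag that is not cached,
    # the main container is expected to stay in ErrImageNeverPull.
    - name: hello
      inputs:
        parameters:
          - name: image-tag
      container:
        image: "alpine:{{inputs.parameters.image-tag}}"
        imagePullPolicy: Never
        command: [echo, hello]

    # DAG-format exit handler: two tasks, the second depending on the first.
    - name: exit-handler-dag
      dag:
        tasks:
          - name: cleanup
            template: echo
          - name: notify
            dependencies: [cleanup]
            template: echo

    # Plain-template exit handler, for comparison.
    - name: exit-handler-plain
      container:
        image: alpine:latest
        imagePullPolicy: IfNotPresent
        command: [echo, "plain exit handler"]

    - name: echo
      container:
        image: alpine:latest
        imagePullPolicy: IfNotPresent
        command: [echo, "exit handler task"]
```

Under these assumptions, the reproduction would be to submit the workflow (for example with argo submit), wait until the say-hello pod reports ErrImageNeverPull, and then stop the workflow with argo stop, after which the exit-handler nodes can be inspected.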
The expectation is that when a user stops a workflow, the DAG exit handler runs and completes successfully. This should happen regardless of the state the workflow steps are in at that moment.
The workflow instance data shows exit-handler steps in the Failed state and the last step stuck in Running, whereas all exit-handler steps are expected to complete with the Succeeded status.
agilgur5 changed the title from "Execution of DAG-format exit-handler is stuck when workflow with image pull failure is stopped." to "DAG exit-handler stuck when workflow with image pull failure is stopped." on Jul 29, 2024
Version(s)
v3.5.8
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container