depends_on_past is ignored when upstream is skipped #26460
-
Hmmm, I think everything is as expected.
All is fine and reflects the current state of the Airflow implementation, so this is not a bug. What you would likely want instead is a new feature, for example "depend_on_all_past_not_running" or similar, which would check whether ANY past task instance is not yet completed (that is, still queued or running). But I think this one is not needed. It would be quite a bit more complex to implement, even if it might be a bit more "predictable" in terms of execution sequence.
Probably a much better and simpler option, and one you can implement even now, is to assign a separate pool with size = 1 to your task. This makes sure that only one instance of the task can run at a time, which is PROBABLY what you really want to achieve. There is no guarantee on the sequence of execution: if you trigger 5 of those instances one after the other, they will execute in a "random" sequence. In most cases of scheduled runs, however, they should run in the proper sequence, because the past task will already be running by the time the future task is scheduled.
Converting to a discussion.
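For illustration, here is a toy model in plain Python (not Airflow code) of what a pool with a single slot gives you: never more than one instance running at a time, but no promise about which queued instance starts first. The `Pool` class, the `simulate` function and the random pick are assumptions made up for this sketch; in a real deployment you would create the pool in Airflow and pass its name through the operator's `pool` argument.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy stand-in for an Airflow pool: a fixed number of slots."""
    slots: int
    running: list = field(default_factory=list)

    def has_free_slot(self) -> bool:
        return len(self.running) < self.slots


def simulate(run_ids, pool, seed=0):
    """Start queued instances without ever exceeding pool.slots.

    Returns (start_order, peak_concurrency).
    """
    rng = random.Random(seed)
    queued = list(run_ids)
    start_order = []
    peak = 0
    while queued or pool.running:
        # Fill free slots. The pick is arbitrary on purpose: the pool
        # limits concurrency, it does not impose an ordering.
        while queued and pool.has_free_slot():
            chosen = queued.pop(rng.randrange(len(queued)))
            pool.running.append(chosen)
            start_order.append(chosen)
        peak = max(peak, len(pool.running))
        # The oldest running instance finishes on each tick.
        pool.running.pop(0)
    return start_order, peak


order, peak = simulate(["09:00", "10:00", "11:00", "12:00", "13:00"], Pool(slots=1))
print("start order:", order)
print("peak concurrency:", peak)
```

With `slots=1` the peak concurrency stays at one whatever the start order turns out to be, which matches the trade-off described above: mutual exclusion without strict sequencing.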
-
I just created a new draft PR which adds a new conf for task skipping: when it is false and a task is about to be skipped, the task stays in the None state, waiting for its past dependencies to be met. In this case there is no need to check all the past dependencies, because each one ensures that the previous one is met. I haven't added unit tests yet; I would prefer to validate the concept with you before adding them. Can you please check it?
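To make the difference concrete, here is a small toy model in plain Python. It is only a sketch of the idea and not the code from the PR: the function names and the `wait_for_past` flag are hypothetical, and the flag is not the name of the proposed conf. It follows T2 (with depends_on_past=True) over three hourly runs where T1 is skipped at 10:00 while T2 of the 09:00 run is still running.

```python
from typing import List, Optional

# States in which a previous T2 instance counts as "done" for depends_on_past.
DONE_STATES = {"success", "skipped"}


def next_t2_state(t1_state: str, prev_t2_state: Optional[str],
                  wait_for_past: bool) -> Optional[str]:
    """Return the state T2 moves to, or None if it has to keep waiting.

    wait_for_past=False models the behaviour reported in the issue;
    wait_for_past=True models the behaviour the proposal asks for.
    """
    past_ok = prev_t2_state in DONE_STATES
    if t1_state == "skipped":
        if wait_for_past and not past_ok:
            return None  # stay in the None state until the past run is done
        return "skipped"  # skip is propagated immediately
    if t1_state == "success":
        # depends_on_past is honoured when T2 would actually run
        return "running" if past_ok else None
    return None


def propagate(t1_states: List[str], first_t2_state: str,
              wait_for_past: bool) -> List[Optional[str]]:
    """Compute T2's state per run, given T1's state per run."""
    t2_states: List[Optional[str]] = [first_t2_state]
    for t1_state in t1_states[1:]:
        t2_states.append(next_t2_state(t1_state, t2_states[-1], wait_for_past))
    return t2_states


# Runs at 09:00, 10:00, 11:00; T2 of 09:00 is still running.
t1 = ["success", "skipped", "success"]
reported = propagate(t1, "running", wait_for_past=False)
requested = propagate(t1, "running", wait_for_past=True)
print("reported :", reported)
print("requested:", requested)
```

In the first case T2 of 11:00 starts while T2 of 09:00 is still running, because the skipped 10:00 instance satisfies depends_on_past. In the second case both later instances stay in the None state until the 09:00 instance finishes.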
-
Apache Airflow version
2.3.4
What happened
If we have an hourly DAG with two tasks, T1 and T2 (T1 >> T2), and T2 has depends_on_past=True, then when T1 is skipped in the 10:00 run, T2 is immediately marked as skipped too, without waiting for the same task in the previous run (09:00). In the next run (11:00), T2 can then be executed, with T2 of its previous run marked as skipped.
What you think should happen instead
If the task has depends_on_past=True, we should not change its state until the same task in the previous run is marked as succeeded or skipped.
How to reproduce
Here is a simple dag which can help to reproduce the problem:
Then using Airflow CLI:
Operating System
Debian GNU/Linux
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
In some cases max_active_runs can help to avoid this problem, but we have DAGs with tasks that do not depend on past runs, which we prefer to run as soon as possible while keeping the DAG run active, waiting for the tasks that do depend on past runs.
Are you willing to submit PR?
Code of Conduct