[CI] azure runners are flaky — remove them? #4689
Replies: 9 comments 1 reply
-
The Azure pipeline failures are very recent, and in general Azure is less flaky than GH Actions. As far as I can tell, the RDKit failures are a real issue that needs to be addressed; it's a Windows (or PyPI) thing, not an Azure thing.
I'm not aware of any such policy. This seems like a brand new thing? If Azure is failing so much that folks are doing this, why hasn't it been reported until now? It feels a bit much that we've jumped straight to "let's remove it" rather than "we'll fix it".
-
This isn't unique to Azure Pipelines; it happens with GH Actions too.
-
Every single PR that I have recently glanced over was ultimately merged while Azure was red. Basically, if Azure failed with something clearly unrelated to the content of the PR, we have been merging anyway. The GH runners for Linux and macOS were green, and that was "good enough". "Someone" probably should have raised issues for the Windows failures, but I haven't so far, because it wasn't clear to me that these were Windows-specific issues. On that note, would it make sense to drop Azure and just use GitHub Actions for Windows too, since this would simplify our CI?
This is true. It would be very nice if PR #4584 improved this situation; I think you wanted to offer some input there?
-
I opened #4687 for the RDKit test failure.
-
I don't think I fully understand the rationale or strategy behind our CI as a whole, i.e., at the big-picture level, what we decided were the priorities to cover and how we decided to implement that strategy. I started https://github.com/MDAnalysis/mdanalysis/wiki/CI-strategy with some notes, but I am sure I am missing important details; everyone is welcome to edit.
-
This is indeed on my to-do list, but it's currently deprioritised; I'm happy to revise this as necessary.
-
This is another to-do item that I just haven't had the time to deal with. I'm happy to have a call at some point, but unfortunately I won't have time to prioritise this any time soon.
-
No, dumping Azure means extra work to move what those pipelines do (which is not the same as what our GH Actions pipelines do, even if you ignore the OS differences) over to GH Actions, while retaining exactly the same issues. Long term that might be a strategy, but if we're looking to reduce our workload, this is not the answer. The provider isn't the issue; it's what we are covering.
-
This seems to be the crux of the issue: if folks don't want to communicate issues, then we can't prioritise and/or deal with them.
-
Even when all the GH Actions runners succeed, many or all of the Azure runners fail in some way. I have seen failures related to RDKit and to timeouts with multiprocessing.
Do we know why the Azure runners seem to fail most of the time?
At the moment we seem to have decided to ignore them and rely purely on the GH ones. Unless we figure out why the Azure runners are failing and work on fixing those issues, we might as well disable them and not bother, because right now they don't serve their purpose of guiding decisions during PR review.