POD NodeSelector is not always consistent with their MPIJob node selector #3400
Comments
I think this happens because the launcher job does not get the NodeSelector that Kueue adds to the PodSet, so only the worker replicas get the correct NodeSelector.
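To make the suspected mismatch concrete, here is a minimal sketch of a drift check between the MPIJob's launcher template and the launcher Job that already exists. It works on plain dicts shaped like the Kubernetes objects. The function name, the sample objects, and the `cpu-pool` value are illustrative assumptions and are not taken from Kueue or mpi-operator code.

```python
# Hypothetical helper: reports which nodeSelector entries from the MPIJob's
# launcher template are absent (or different) on the existing launcher Job.
def launcher_selector_drift(mpijob, launcher_job):
    launcher_tpl = mpijob["spec"]["mpiReplicaSpecs"]["Launcher"]["template"]
    want = launcher_tpl["spec"].get("nodeSelector") or {}
    have = launcher_job["spec"]["template"]["spec"].get("nodeSelector") or {}
    return {k: v for k, v in want.items() if have.get(k) != v}


# Assumed sample data: the MPIJob template carries the flavor's node label,
# while the launcher Job was created without it.
selector = {"cloud.google.com/gke-nodepool": "cpu-pool"}  # value is made up
mpijob = {
    "spec": {
        "mpiReplicaSpecs": {
            "Launcher": {"template": {"spec": {"nodeSelector": dict(selector)}}},
            "Worker": {"template": {"spec": {"nodeSelector": dict(selector)}}},
        }
    }
}
stale_launcher_job = {"spec": {"template": {"spec": {}}}}
fixed_launcher_job = {"spec": {"template": {"spec": {"nodeSelector": dict(selector)}}}}

print(launcher_selector_drift(mpijob, stale_launcher_job))
print(launcher_selector_drift(mpijob, fixed_launcher_job))
```

A non-empty result means the launcher Job is out of sync with the MPIJob template, which is the symptom described in this comment.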
@GonzaloSaez thank you for the report. This clearly looks like a bug. Does it happen only when you use AdmissionChecks, or is it independent of that? Also let us know if you suspect where the bug is, and feel free to propose a fix. cc @tenzen-y @mbobrovskyi
@mimowo I think kubeflow/mpi-operator#670 should fix it. That said, there are more open questions about NodeSelectors changing after a job is suspended (i.e. the need to change more internals of the MPIOperator launcher job in case the NodeSelector or other details change). Given that this seems related more to mpi-operator than to Kueue, should we consider closing this?
I see, thanks for explaining and driving the fix. I think we may actually consider e2e tests for mpi-job in Kueue to cover such critical aspects of the integration. WDYT @tenzen-y ?
FYI, for some more context, this also has a Slack discussion: https://kubernetes.slack.com/archives/C032ZE66A2X/p1730369507818399
What happened:
We are launching MPIJobs using a `LocalQueue` with Kueue (in particular `cpu-local-queue` from the YAML found at the end of the issue). The `ClusterQueue`'s associated `ResourceFlavor` uses the appropriate `nodeLabels` to target a specific GKE node pool. We are not setting the MPIJob `NodeSelector` when launching it. When launching the job, Kueue sets the correct `NodeSelector` on the MPIJob. However, the pods' `NodeSelector` is empty. Note that we are not setting the suspend field in the MPIJob; we let Kueue do it for us.
What you expected to happen:
The MPIJob pods should have the same NodeSelector as the MPIJob. This is also documented in https://kueue.sigs.k8s.io/docs/concepts/resource_flavor/
Environment:
GKE 1.30 + kueue 0.8.1 + waitForPodsReady=true. These are the kueue resources
This can be replicated with the MPIOperator example. The launcher does not have NodeSelector set but the workers do have it.
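When reproducing this, a quick way to see which pods are affected is to compare each pod's `nodeSelector` with the labels expected from the ResourceFlavor. The sketch below runs on plain dicts shaped like Pod objects. The pod names, the `cpu-pool` value, and the helper name are hypothetical and serve only to illustrate the check.

```python
# Hypothetical helper: returns the names of pods whose nodeSelector does not
# contain every expected key/value pair.
def pods_with_wrong_selector(pods, expected):
    bad = []
    for pod in pods:
        selector = pod["spec"].get("nodeSelector") or {}
        if any(selector.get(k) != v for k, v in expected.items()):
            bad.append(pod["metadata"]["name"])
    return bad


# Assumed sample data mirroring the report: workers have the selector,
# the launcher does not.
expected = {"cloud.google.com/gke-nodepool": "cpu-pool"}  # value is made up
pods = [
    {"metadata": {"name": "pi-launcher-abcde"}, "spec": {}},
    {"metadata": {"name": "pi-worker-0"}, "spec": {"nodeSelector": dict(expected)}},
    {"metadata": {"name": "pi-worker-1"}, "spec": {"nodeSelector": dict(expected)}},
]

print(pods_with_wrong_selector(pods, expected))
```

With the sample data, only the launcher pod is reported, matching the behavior described above.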