Pod rejection errors and unexpected MIG slice counts #96

Closed
tardieu opened this issue Sep 13, 2024 · 9 comments
@tardieu
Contributor

tardieu commented Sep 13, 2024

When following the Kueue+InstaSlice demo #92, I expect 7 pods to run to completion with no more than 3 pods running at a time due to the configuration of the cluster queue. While I observed no violation of the max concurrency, in many runs I observed two unexpected outcomes:

  1. One or several pods report the status OutOfnvidia.com/mig-1g.5gb with the message Pod was rejected: Node didn't have enough resource: nvidia.com/mig-1g.5gb, requested: 1, used: 0, capacity: 0
  2. The count of nvidia.com/mig-1g.5gb resources in the node capacity no longer matches the count of org.instaslice/* resources in the node capacity. More specifically, the former is lower than the latter and lower than the number of slices required to run the pending, ungated pods (a quick check is sketched below).

This is running kind 1.31 on Rancher Desktop 1.15.1 on macOS Sonoma. The demo scenario relies on fake GPUs and InstaSlice emulator mode.
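For reference, here is one way to eyeball the mismatch from the node status. The jq query is only an illustrative sketch; the resource names are the ones that appear in the capacity output further down and the node name is the kind default.

# Compare the advertised MIG slice count against the number of org.instaslice/* entries.
kubectl get node kind-control-plane -o json | jq '{
  mig_slices: (.status.capacity["nvidia.com/mig-1g.5gb"] // "0"),
  instaslice_entries: (.status.capacity | keys | map(select(startswith("org.instaslice/"))) | length)
}'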

@tardieu
Contributor Author

tardieu commented Sep 13, 2024

I encounter the same node capacity mismatch issue (nvidia.com/mig-1g.5gb count vs org.instaslice/* count) on kind 1.30.4:

kind create cluster --image kindest/node:v1.30.4@sha256:976ea815844d5fa93be213437e3ff5754cd599b040946b5cca43ca45c2047114

@tardieu
Contributor Author

tardieu commented Sep 13, 2024

My suspicion at this point is that, in this scenario, InstaSlice may destroy the slice intended for a pod right after ungating the pod, leading to the three scenarios I have observed:

  1. The pod was scheduled and started by the kubelet before the deletion. It appears to run successfully, but only because the untimely deletion of the slice does not affect the sleep command running in the pod.
  2. The pod remains pending because the slice was deleted before the scheduler had a chance to schedule the pod.
  3. The pod was scheduled, but the kubelet reports an error because the slice was gone between the scheduler binding the pod and the kubelet admitting it.

In these emulated scenarios, the creation/deletion of the slice boils down to the addition and removal of one unit of the MIG resource in the node capacity.
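To illustrate what that amounts to (this is not InstaSlice's actual implementation, just the kind of node-status mutation the emulator effectively performs, assuming a kubectl recent enough to support --subresource):

# "Create" a slice: add one unit of the extended resource to the node status.
kubectl patch node kind-control-plane --subresource=status --type=json \
  -p='[{"op":"add","path":"/status/capacity/nvidia.com~1mig-1g.5gb","value":"1"}]'
# "Delete" it again: drop the count back down.
kubectl patch node kind-control-plane --subresource=status --type=json \
  -p='[{"op":"replace","path":"/status/capacity/nvidia.com~1mig-1g.5gb","value":"0"}]'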

@tardieu
Contributor Author

tardieu commented Sep 13, 2024

Example of scenario 2:

tardieu@indigo:instaslice-operator$ kubectl get node kind-control-plane -o json | jq .status.capacity; kubectl get pods
{
  "cpu": "8",
  "ephemeral-storage": "102625208Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "hugepages-32Mi": "0",
  "hugepages-64Ki": "0",
  "memory": "16351912Ki",
  "nvidia.com/accelerator-memory": "80Gi",
  "nvidia.com/mig-1g.5gb": "0",
  "org.instaslice/a5870bf6-1307-4ca9-b36d-d8f5937042c3": "1",
  "pods": "110"
}
NAME   READY   STATUS      RESTARTS   AGE
p1     0/1     Pending     0          21m
p2     0/1     Completed   0          21m
p3     0/1     Completed   0          21m
p4     0/1     Completed   0          21m
p5     0/1     Completed   0          21m
p6     0/1     Completed   0          21m
p7     0/1     Completed   0          21m

It should not be possible for a pod to be pending, i.e., already ungated by InstaSlice, while the count of MIG slices is 0.
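A quick way to check that invariant (illustrative jq only; the resource and node names are the ones from the demo):

# Ungated Pending pods requesting the MIG resource...
kubectl get pods -o json | jq '[.items[]
  | select(.status.phase == "Pending")
  | select((.spec.schedulingGates // []) | length == 0)
  | select(any(.spec.containers[]; .resources.requests["nvidia.com/mig-1g.5gb"] != null))] | length'
# ...should never outnumber the advertised slices:
kubectl get node kind-control-plane -o json | jq -r '.status.capacity["nvidia.com/mig-1g.5gb"]'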

@tardieu
Contributor Author

tardieu commented Sep 13, 2024

Example of scenario 3:

tardieu@indigo:instaslice-operator$ kubectl get node kind-control-plane -o json | jq .status.capacity; kubectl get pods
{
  "cpu": "8",
  "ephemeral-storage": "102625208Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "hugepages-32Mi": "0",
  "hugepages-64Ki": "0",
  "memory": "16351912Ki",
  "nvidia.com/accelerator-memory": "80Gi",
  "nvidia.com/mig-1g.5gb": "2",
  "org.instaslice/171f3a10-1663-4cc9-992e-a9141734ea21": "1",
  "org.instaslice/17b63565-c7a8-489d-8491-d9a165ad4745": "1",
  "org.instaslice/8e2a5159-ae95-4eb5-8898-02081732f1b0": "1",
  "org.instaslice/8f9ddcf2-1caf-4979-b051-477c88a3149a": "1",
  "pods": "110"
}
NAME   READY   STATUS                       RESTARTS   AGE
p1     0/1     SchedulingGated              0          2m33s
p2     0/1     Completed                    0          2m33s
p3     0/1     Completed                    0          2m33s
p4     0/1     SchedulingGated              0          2m33s
p5     0/1     OutOfnvidia.com/mig-1g.5gb   0          2m33s
p6     0/1     Completed                    0          2m33s
p7     1/1     Running                      0          2m33s
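For what it's worth, the kubelet's full rejection reason and message for a pod like p5 can be pulled from the pod status (jsonpath sketch):

kubectl get pod p5 -o jsonpath='{.status.reason}: {.status.message}{"\n"}'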

@harche
Contributor

harche commented Sep 16, 2024

/cc @sairameshv

@asm582
Contributor

asm582 commented Sep 17, 2024

Thanks for this issue. Scenario 2 can be reproduced and we have a PR for it: #99. Scenario 3 could result from a dangling org.instaslice resource on the node, which causes the emulator to perform excessive deletes.
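If it helps with debugging, any dangling org.instaslice entries can be listed straight from the node capacity (illustrative jq; the UUIDs differ per run):

kubectl get node kind-control-plane -o json \
  | jq '.status.capacity | with_entries(select(.key | startswith("org.instaslice/")))'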

@asm582
Contributor

asm582 commented Sep 23, 2024

Solved by #121

@harche
Contributor

harche commented Sep 23, 2024

Solved by #121

/assign @asm582

@asm582
Contributor

asm582 commented Oct 8, 2024

With the new design changes, I am not sure this issue is still valid.

@asm582 asm582 closed this as completed Oct 11, 2024