Liveness Probe on CSI SMB pod was restarting multiple times on Windows Server 2022 Node #846

Closed
Ponmuthu-hub opened this issue Sep 12, 2024 · 5 comments


Ponmuthu-hub commented Sep 12, 2024

What happened:

The smb container in the csi-smb-node-win pod kept restarting on a Windows Server 2022 node because its liveness probe kept failing. The pod events show the following errors:
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  BackOff    10m (x183 over 97m)   kubelet  Back-off restarting failed container smb in pod csi-smb-node-win-78wz7_kube-system(de7b8343-5faa-4fbf-8a19-54f23a6cb28b)
  Warning  Unhealthy  55s (x119 over 123m)  kubelet  Liveness probe failed: Get "http://10.42.113.131:29643/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
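
The Unhealthy event above is an HTTP GET against /healthz that did not return headers before the client timeout. One way to check this independently of kubelet is to issue the same request by hand from the node and time it. The following is a minimal sketch, not part of the driver or of kubelet; the function name probe_healthz is hypothetical, and the URL and timeout in the commented call are taken from the events and probe settings in this report.

```python
import time
import urllib.error
import urllib.request


def probe_healthz(url: str, timeout_s: float) -> tuple[bool, str]:
    """Issue one HTTP GET with a timeout and report whether it succeeded."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            elapsed = time.monotonic() - start
            # An httpGet probe counts any status from 200 to 399 as success.
            ok = 200 <= resp.status < 400
            return ok, f"HTTP {resp.status} after {elapsed:.2f}s"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        elapsed = time.monotonic() - start
        return False, f"HTTP {exc.code} after {elapsed:.2f}s"
    except OSError as exc:
        # Covers connection refused, unreachable hosts, and timeouts.
        elapsed = time.monotonic() - start
        return False, f"failed after {elapsed:.2f}s: {exc}"


# Example call (pod IP and port from the events above, 15s probe timeout):
# print(probe_healthz("http://10.42.113.131:29643/healthz", timeout_s=15))
```

If the manual request also hangs until the timeout, the problem is more likely in reaching the pod's port than in the probe settings themselves.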

Anything else we need to know?:
The same deployment ran on Windows Server 2019 without any issues. The error occurs only on the Windows Server 2022 node.

Environment:
  • CSI Driver version:
    image: registry.k8s.io/sig-storage/smbplugin:v1.16.0
    image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.12.0
  • Kubernetes version (use kubectl version): v1.29.5+rke2r1
  • OS: Windows Server 2022 Standard (latest)
  • Tools: CSI Proxy (latest version)
  • Logs:
kubectl logs csi-smb-node-win-hvv6v -n kube-system
Defaulted container "liveness-probe" out of: liveness-probe, node-driver-registrar, smb
W0912 06:25:58.847329    6324 connection.go:173] Still connecting to unix://C:\\csi\\csi.sock
W0912 06:26:08.847382    6324 connection.go:173] Still connecting to unix://C:\\csi\\csi.sock
I0912 06:26:14.681069    6324 main.go:149] calling CSI driver to discover driver name
I0912 06:26:14.699665    6324 main.go:155] CSI driver name: "smb.csi.k8s.io"
I0912 06:26:14.700754    6324 main.go:183] ServeMux listening at "0.0.0.0:29643"
  • Others:
    ~$ kubectl describe pod csi-smb-node-win-78wz7 -n kube-system
Name:                 csi-smb-node-win-78wz7
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      csi-smb-node-sa
Node:                 edgewinworker1/172.17.1.102
Start Time:           Wed, 11 Sep 2024 09:13:46 +0000
Labels:               app=csi-smb-node-win
                      controller-revision-hash=56d85c5977
                      doesnotcontainpersonalinformation=true
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: 61e31d25a730c095c461bc97e063010f528bf4a0fb77370f8b65382a8ca5f736
                      cni.projectcalico.org/podIP: 10.42.113.131/32
                      cni.projectcalico.org/podIPs: 10.42.113.131/32
Status:               Running
IP:                   10.42.113.131
IPs:
  IP:           10.42.113.131
Controlled By:  DaemonSet/csi-smb-node-win
Containers:
  liveness-probe:
    Container ID:  containerd://68bf8f8921dc19847bec6d9dd1b787e2d3fd1d56306287d6ab6e44a92920b970
    Image:         registry.k8s.io/sig-storage/livenessprobe:v2.14.0
    Image ID:      registry.k8s.io/sig-storage/livenessprobe@sha256:33692aed26aaf105b4d6e66280cceca9e0463f500c81b5d8c955428a75438f32
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(CSI_ENDPOINT)
      --probe-timeout=3s
      --health-port=29643
      --v=2
    State:          Running
      Started:      Wed, 11 Sep 2024 09:14:14 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  512Mi
    Requests:
      cpu:     50m
      memory:  100Mi
    Environment:
      CSI_ENDPOINT:  unix://C:\\csi\\csi.sock
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k7ng8 (ro)
      C:\csi from plugin-dir (rw)
  node-driver-registrar:
    Container ID:  containerd://2038cfc8b3d6d0aee932fa5700c67dd0119b947b588a191c129055e694071b07
    Image:         registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.12.0
    Image ID:      registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:0d23a6fd60c421054deec5e6d0405dc3498095a5a597e175236c0692f4adee0f
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=2
      --csi-address=$(CSI_ENDPOINT)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Wed, 11 Sep 2024 09:14:16 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  250Mi
    Requests:
      cpu:     10m
      memory:  40Mi
    Liveness:  exec [/csi-node-driver-registrar.exe --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH) --mode=kubelet-registration-probe] delay=60s timeout=30s period=10s #success=1 #failure=3
    Environment:
      CSI_ENDPOINT:          unix://C:\\csi\\csi.sock
      DRIVER_REG_SOCK_PATH:  C:\\var\\lib\\kubelet\\plugins\\smb.csi.k8s.io\\csi.sock
      KUBE_NODE_NAME:         (v1:spec.nodeName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k7ng8 (ro)
      C:\csi from plugin-dir (rw)
      C:\registration from registration-dir (rw)
      C:\var\lib\kubelet from kubelet-dir (rw)
  smb:
    Container ID:  containerd://9f473765e2a29a1e33c84d7bd09c6e81b53528ca64455ab8287d31422fbea65d
    Image:         registry.k8s.io/sig-storage/smbplugin:v1.16.0
    Image ID:      registry.k8s.io/sig-storage/smbplugin@sha256:1ec16928aa355e3dafdc84be2acf88d7d9124816e4cd580411536eae064f1d37
    Port:          29643/TCP
    Host Port:     0/TCP
    Args:
      --v=5
      --endpoint=$(CSI_ENDPOINT)
      --nodeid=$(KUBE_NODE_NAME)
      --metrics-address=0.0.0.0:29645
    State:          Running
      Started:      Wed, 11 Sep 2024 11:17:05 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    -1073741510
      Started:      Wed, 11 Sep 2024 11:13:35 +0000
      Finished:     Wed, 11 Sep 2024 11:17:03 +0000
    Ready:          True
    Restart Count:  24
    Limits:
      memory:  400Mi
    Requests:
      cpu:     10m
      memory:  40Mi
    Liveness:  http-get http://:healthz/healthz delay=60s timeout=15s period=30s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:    unix://C:\\csi\\csi.sock
      KUBE_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k7ng8 (ro)
      C:\csi from plugin-dir (rw)
      C:\var\lib\kubelet from kubelet-dir (rw)
      \\.\pipe\csi-proxy-filesystem-v1 from csi-proxy-fs-pipe-v1 (rw)
      \\.\pipe\csi-proxy-filesystem-v1beta1 from csi-proxy-fs-pipe-v1beta1 (rw)
      \\.\pipe\csi-proxy-smb-v1 from csi-proxy-smb-pipe-v1 (rw)
      \\.\pipe\csi-proxy-smb-v1beta1 from csi-proxy-smb-pipe-v1beta1 (rw)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  csi-proxy-fs-pipe-v1:
    Type:          HostPath (bare host directory volume)
    Path:          \\.\pipe\csi-proxy-filesystem-v1
    HostPathType:
  csi-proxy-smb-pipe-v1:
    Type:          HostPath (bare host directory volume)
    Path:          \\.\pipe\csi-proxy-smb-v1
    HostPathType:
  csi-proxy-fs-pipe-v1beta1:
    Type:          HostPath (bare host directory volume)
    Path:          \\.\pipe\csi-proxy-filesystem-v1beta1
    HostPathType:
  csi-proxy-smb-pipe-v1beta1:
    Type:          HostPath (bare host directory volume)
    Path:          \\.\pipe\csi-proxy-smb-v1beta1
    HostPathType:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          C:\var\lib\kubelet\plugins_registry\
    HostPathType:  Directory
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          C:\var\lib\kubelet\
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          C:\var\lib\kubelet\plugins\smb.csi.k8s.io\
    HostPathType:  DirectoryOrCreate
  kube-api-access-k7ng8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/os:NoSchedule op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  BackOff    10m (x183 over 97m)   kubelet  Back-off restarting failed container smb in pod csi-smb-node-win-78wz7_kube-system(de7b8343-5faa-4fbf-8a19-54f23a6cb28b)
  Warning  Unhealthy  55s (x119 over 123m)  kubelet  Liveness probe failed: Get "http://10.42.113.131:29643/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
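
The smb container's Last State in the describe output above reports Exit Code -1073741510. Windows exit codes of this size are usually NTSTATUS values printed as signed 32-bit integers, so reading the number as unsigned makes it easier to look up. The sketch below does that conversion; the helper names and the small lookup table are hypothetical and only for illustration.

```python
# A few well-known NTSTATUS values, for illustration only (not exhaustive).
KNOWN_NTSTATUS = {
    0xC000013A: "STATUS_CONTROL_C_EXIT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_ntstatus(exit_code: int) -> str:
    """Render a signed 32-bit exit code as an unsigned 8-digit hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe_exit_code(exit_code: int) -> str:
    """Return the hex form plus a symbolic name when one is known."""
    value = exit_code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(value, "unknown")
    return f"{to_ntstatus(exit_code)} ({name})"


print(describe_exit_code(-1073741510))
```

The value decodes to 0xC000013A, STATUS_CONTROL_C_EXIT, which would be consistent with the container being terminated from outside (for example by kubelet after repeated probe failures) rather than crashing on its own.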
Ponmuthu-hub changed the title from "Liveness Probe on CSI SMB pod for Windows 2022 server restarting multiple times" to "Liveness Probe on CSI SMB pod was restarting multiple times on Windows Server 2022 Node" on Sep 12, 2024
@andyzhangx
Member

The smb container is failing. Can you increase the liveness probe values in the smb container config? They are currently: Liveness: http-get http://:healthz/healthz delay=60s timeout=15s period=30s #success=1 #failure=5

@Ponmuthu-hub
Author

@andyzhangx I increased the values to Liveness: http-get http://:healthz/healthz delay=120s timeout=60s period=60s #success=1 #failure=5, but it did not help. I'm getting the same error.
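
To see what the two probe configurations above change, it helps to estimate how long a container whose probe always times out survives before kubelet restarts it. The sketch below is a rough approximation under stated assumptions: the first probe runs at the initial delay, later probes run once per period, every probe fails by timing out, and the timeout is no longer than the period. It ignores kubelet's start jitter and the time needed to stop the container. The function name is hypothetical.

```python
def seconds_until_restart(delay: int, period: int, timeout: int,
                          failure_threshold: int) -> int:
    """Approximate seconds from container start to the final failed probe.

    Assumes every probe times out and timeout <= period, so the last of
    failure_threshold consecutive probes starts (failure_threshold - 1)
    periods after the first one and fails timeout seconds later.
    """
    return delay + (failure_threshold - 1) * period + timeout


# Original settings: delay=60s timeout=15s period=30s #failure=5
print(seconds_until_restart(60, 30, 15, 5))    # 195
# Increased settings: delay=120s timeout=60s period=60s #failure=5
print(seconds_until_restart(120, 60, 60, 5))   # 420
```

The estimate of 195 seconds for the original settings is close to the previous smb container run in the describe output (started 11:13:35, finished 11:17:03, about 208 seconds). Under this model, raising the values only delays the restart; it does not help if the probe request never completes.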

@andyzhangx
Member

@Ponmuthu-hub What are the CPU core count and memory size of your Windows node? We have found that the liveness probe can also fail when the Windows node is under heavy load.

@Ponmuthu-hub
Author

@andyzhangx The node has 4 CPU cores and 8 GB of memory.

@Ponmuthu-hub
Author

The issue is fixed. The cause was the one described in microsoft/Windows-Containers#516.
