
[BUG][OpenSearch Helm 2.0.1] FailedScheduling : N pod has unbound immediate PersistentVolumeClaims #558

Open
YeonghyeonKO opened this issue Jul 15, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@YeonghyeonKO
Describe the bug

NAME                                                   READY   STATUS            RESTARTS   AGE
pod/test-opensearch-helm-dashboards-7f498c4684-lld2g   1/1     Running           0          12m
pod/test-opensearch-helm-master-0                      0/1     PodInitializing   0          12m
pod/test-opensearch-helm-master-1                      0/1     PodInitializing   0          12m
pod/test-opensearch-helm-master-2                      0/1     PodInitializing   0          12m

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/test-opensearch-helm-dashboards        ClusterIP   172.31.193.248   <none>        5601/TCP            12m
service/test-opensearch-helm-master            ClusterIP   172.31.113.251   <none>        9200/TCP,9300/TCP   12m
service/test-opensearch-helm-master-headless   ClusterIP   None             <none>        9200/TCP,9300/TCP   12m

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/test-opensearch-helm-dashboards   1/1     1            1           12m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/test-opensearch-helm-dashboards-7f498c4684   1         1         1       12m

NAME                                           READY   AGE
statefulset.apps/test-opensearch-helm-master   0/3     12m

As you can see above, the OpenSearch master pods don't start.
I suspect the number 27 is significant, because the Kubernetes cluster has exactly 27 nodes.
Each k8s worker node still has enough CPU and memory available.
The events from each pod (e.g. pod/test-opensearch-helm-master-0) look like this:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12m   default-scheduler  0/27 nodes are available: 27 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  12m   default-scheduler  0/27 nodes are available: 27 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         12m   default-scheduler  Successfully assigned test-opensearch-helm/test-opensearch-helm-master-2 to ick8ssrep01w003
  Normal   Pulling           12m   kubelet            Pulling image "docker-repo.xxx.com/hcp-docker/busybox:latest"
  Normal   Pulled            12m   kubelet            Successfully pulled image "docker-repo.xxx.com/hcp-docker/busybox:latest" in 136.527586ms
  Normal   Created           12m   kubelet            Created container fsgroup-volume
  Normal   Started           12m   kubelet            Started container fsgroup-volume
  Normal   Pulled            12m   kubelet            Container image "docker-repo.xxx.com/hcp-docker/opensearchproject/opensearch:2.0.1" already present on machine
  Normal   Created           12m   kubelet            Created container opensearch
  Normal   Started           12m   kubelet            Started container opensearch
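The `FailedScheduling` message above points at the PersistentVolumeClaims rather than at node resources, so the claims' binding status can be checked directly. The commands below are a diagnostic sketch; the claim name is an assumption derived from the StatefulSet name shown above, and `kubectl` must be pointed at the affected cluster.

```shell
# List the PVCs created by the StatefulSet's volumeClaimTemplate;
# a Pending status here confirms the scheduler message.
kubectl get pvc -n test-opensearch-helm

# Show why a specific claim is not binding (e.g. no matching PV,
# or the storage class's provisioner is not responding).
# The claim name below is assumed from the StatefulSet/pod names.
kubectl describe pvc test-opensearch-helm-master-test-opensearch-helm-master-0 \
  -n test-opensearch-helm

# Check that the storage class from values.yaml exists and see how it
# provisions and binds volumes (provisioner, volumeBindingMode).
kubectl get storageclass sc-nfs-app-retain -o yaml
```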

To Reproduce
Steps to reproduce the behavior:
I ran OpenSearch and OpenSearch Dashboards using the Helm chart (v2.1.0).

/test-opensearch-helm/namespaces.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test-opensearch-helm
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: xxx-anyuid-hostpath-clusterrole-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: xxx-anyuid-hostpath-psp-clusterrole
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:test-opensearch-helm
/test-opensearch-helm/kustomization.yaml
namespace: test-opensearch-helm
bases:
# - ../../../base/common
- ./opensearch/common
- ./opensearch/master
- ./opensearch-dashboards
resources:
- namespaces.yaml
/test-opensearch-helm/opensearch/master/kustomization.yaml
helmGlobals:
  chartHome: ../../../../../base/opensearch/charts

helmCharts:
- name: opensearch-2.1.0
  version: 2.1.0
  releaseName: test-opensearch-helm
  namespace: test-opensearch-helm
  valuesFile: values.yaml
  # includeCRDs: true
/test-opensearch-helm/opensearch/master/values.yaml
---
clusterName: "test-opensearch-helm"
nodeGroup: "master"

# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: "test-opensearch-helm-master"

# OpenSearch roles that will be applied to this nodeGroup
# These will be set as environment variable "node.roles". E.g. node.roles=master,ingest,data,remote_cluster_client
roles:
  - master
  - ingest
  - data
  - remote_cluster_client
  # - ml

replicas: 3

majorVersion: "2"

global:
  # Set if you want to change the default docker registry, e.g. a private one.
  dockerRegistry: ""

# Allows you to add any config files in {{ .Values.opensearchHome }}/config
opensearchHome: /usr/share/opensearch
# such as opensearch.yml and log4j2.properties
config:
  # Values must be YAML literal style scalar / YAML multiline string.
  # <filename>: |
  #   <formatted-value(s)>
  opensearch.yml: |
    cluster.name: test-opensearch-helm

    # Bind to all interfaces because we don't know what IP address Docker will assign to us.
    network.host: 0.0.0.0

    plugins:
      security:
        ssl:
          transport:
            pemcert_filepath: /usr/share/opensearch/config/certs/opens.pem
            pemkey_filepath: /usr/share/opensearch/config/certs/opens-key.pem
            pemtrustedcas_filepath: /usr/share/opensearch/config/certs/root-ca.pem
            enforce_hostname_verification: false
          http:
            enabled: false
            pemcert_filepath: /usr/share/opensearch/config/certs/opens.pem
            pemkey_filepath: /usr/share/opensearch/config/certs/opens-key.pem
            pemtrustedcas_filepath: /usr/share/opensearch/config/certs/root-ca.pem
        allow_unsafe_democertificates: true
        allow_default_init_securityindex: true
        authcz:
          admin_dn:
            - CN=kirk,OU=client,O=client,L=test,C=de
        audit.type: internal_opensearch
        enable_snapshot_restore_privilege: true
        check_snapshot_restore_write_privileges: true
        restapi:
          roles_enabled: ["all_access", "security_rest_api_access"]
        system_indices:
          enabled: true
          indices:
            [
              ".opendistro-alerting-config",
              ".opendistro-alerting-alert*",
              ".opendistro-anomaly-results*",
              ".opendistro-anomaly-detector*",
              ".opendistro-anomaly-checkpoints",
              ".opendistro-anomaly-detection-state",
              ".opendistro-reports-*",
              ".opendistro-notifications-*",
              ".opendistro-notebooks",
              ".opendistro-asynchronous-search-response*",
            ]

# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs:
  - name: OPENSEARCH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: opens-credentials
        key: password
  - name: OPENSEARCH_USERNAME
    valueFrom:
      secretKeyRef:
        name: opens-credentials
        key: username
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"

# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
#     name: env-secret
# - configMapRef:
#     name: config-map

# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts:
  - name: opensearch-cert
    secretName: opensearch-cert
    path: /usr/share/opensearch/config/certs
    defaultMode: 0755

hostAliases: []
# - ip: "127.0.0.1"
#   hostnames:
#   - "foo.local"
#   - "bar.local"

image:
  repository: "docker-repo.xxx.com/hcp-docker/opensearchproject/opensearch"
  # override image tag, which is .Chart.AppVersion by default
  tag: "2.0.1"
  pullPolicy: "IfNotPresent"

podAnnotations: {}
  # iam.amazonaws.com/role: es-cluster

# additionals labels
labels: {}

opensearchJavaOpts: "-Djava.net.preferIPv4Stack=true -Xms8g -Xmx8g -XX:+UnlockDiagnosticVMOptions -Xlog:gc+heap+coops=info"

resources:
  requests:
    cpu: "0.1"
    memory: "16Gi"
  limits:
    cpu: "4"
    memory: "16Gi"

initResources:
  limits:
    cpu: "200m"
    memory: "50Mi"
  requests:
    cpu: "200m"
    memory: "50Mi"

sidecarResources: {}

networkHost: "0.0.0.0"

rbac:
  create: true
  serviceAccountAnnotations: {}
  serviceAccountName: ""

podSecurityPolicy:
  create: true
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim
      - emptyDir

persistence:
  enabled: true
  # Set to false to disable the `fsgroup-volume` initContainer that will update permissions on the persistent disk.
  enableInitChown: true
  # override image, which is busybox by default
  image: "docker-repo.xxx.com/hcp-docker/busybox"
  # override image tag, which is latest by default
  # imageTag:
  labels:
    # Add default labels for the volumeClaimTemplate of the StatefulSet
    enabled: false
  # OpenSearch Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  #   set, choosing the default provisioner.  (gp2 on AWS, standard on
  #   GKE, AWS & OpenStack)
  #
  storageClass: "sc-nfs-app-retain"
  accessModes:
    - ReadWriteOnce
  size: 50Gi
  annotations: {}

extraVolumes: []
  # - name: extras
  #   emptyDir: {}

extraVolumeMounts: []
  # - name: extras
  #   mountPath: /usr/share/extras
  #   readOnly: true

extraContainers: []
  # - name: do-something
  #   image: busybox
  #   command: ['do', 'something']

extraInitContainers: []
  # - name: do-somethings
  #   image: busybox
  #   command: ['do', 'something']

# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""

# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"

# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "soft"

# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}

# This is the pod topology spread constraints
# https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
topologySpreadConstraints: []

# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"

# The environment variables injected by service links are not used, but can lead to slow OpenSearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true

protocol: http
httpPort: 9200
transportPort: 9300

service:
  labels: {}
  labelsHeadless: {}
  headless:
    annotations: {}
  type: ClusterIP
  nodePort: ""
  annotations: {}
  httpPortName: http
  transportPortName: transport
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  externalTrafficPolicy: ""

updateStrategy: RollingUpdate

# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  # readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

securityConfig:
  enabled: true
  path: "/usr/share/opensearch/plugins/opensearch-security/securityconfig"
  actionGroupsSecret:
  configSecret:
  internalUsersSecret:
  rolesSecret:
  rolesMappingSecret:
  tenantsSecret:
  # The following option simplifies securityConfig by using a single secret and
  # specifying the config files as keys in the secret instead of creating
  # different secrets for each config file.
  # Note that this is an alternative to the individual secret configuration
  # above and shouldn't be used if the above secrets are used.
  config:
    # There are multiple ways to define the configuration here:
    # * If you define anything under data, the chart will automatically create
    #   a secret and mount it.
    # * If you define securityConfigSecret, the chart will assume this secret is
    #   created externally and mount it.
    # * It is an error to define both data and securityConfigSecret.
    securityConfigSecret: ""
    dataComplete: true
    data: {}
      # config.yml: |-
      # internal_users.yml: |-
      # roles.yml: |-
      # roles_mapping.yml: |-
      # action_groups.yml: |-
      # tenants.yml: |-

# How long to wait for opensearch to stop gracefully
terminationGracePeriod: 120

sysctlVmMaxMapCount: 262144

readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 60
  successThreshold: 3
  timeoutSeconds: 60

## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""

imagePullSecrets: []
nodeSelector:
  worker: "true"
tolerations: []

# Enabling this will publicly expose your OpenSearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
  enabled: true
  # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
  # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
  ingressClassName: nginx

  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  path: /
  hosts:
    - test-opensearch-helm.srep01.xxx.com
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

nameOverride: ""
fullnameOverride: ""

masterTerminationFix: false

lifecycle:
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
  # postStart:
  #   exec:
  #     command:
  #       - bash
  #       - -c
  #       - |
  #         #!/bin/bash
  #         # Add a template to adjust number of shards/replicas1
  #         TEMPLATE_NAME=my_template
  #         INDEX_PATTERN="logstash-*"
  #         SHARD_COUNT=8
  #         REPLICA_COUNT=1
  #         ES_URL=http://localhost:9200
  #         while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
  #         curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
  postStart:
    exec:
      command:
        - bash
        - -c
        - |
          #!/bin/bash
          # Add a template to adjust number of shards/replicas1
          ES_URL=http://admin:admin12~!@localhost:9200
          while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done

          # _index_template logs-template-app
          curl -XPUT "$ES_URL/_index_template/logs-template-app" -H 'Content-Type: application/json' \
            -d '{
                  "index_patterns": [
                    "app_*",
                    "sys_*"
                  ],
                  "data_stream": {
                    "timestamp_field": {
                      "name": "logTime"
                    }
                  },
                  "priority": 200,
                  "template": {
                    "settings": {
                      "number_of_shards": 1,
                      "number_of_replicas": 1
                    }
                  }
                }'

          # _index_policy logs-policy-app
          curl -XDELETE "$ES_URL/_plugins/_ism/policies/logs-policy-app"
          curl -XPUT "$ES_URL/_plugins/_ism/policies/logs-policy-app" -H 'Content-Type: application/json' \
            -d '
              {
                "policy" : {
                  "description" : "A app log of the policy",
                  "default_state" : "hot",
                  "states" : [
                    {
                      "name" : "hot",
                      "actions" : [
                        {
                          "retry" : {
                            "count" : 3,
                            "backoff" : "exponential",
                            "delay" : "1m"
                          },
                          "rollover" : {
                            "min_index_age" : "3m"
                          }
                        }
                      ],
                      "transitions" : [
                        {
                          "state_name" : "warm",
                          "conditions" : {
                            "min_index_age" : "3m"
                          }
                        }
                      ]
                    },
                    {
                      "name" : "warm",
                      "actions" : [
                        {
                          "retry" : {
                            "count" : 3,
                            "backoff" : "exponential",
                            "delay" : "1m"
                          },
                          "read_only" : { }
                        }
                      ],
                      "transitions" : [
                        {
                          "state_name" : "delete",
                          "conditions" : {
                            "min_rollover_age" : "3m"
                          }
                        }
                      ]
                    },
                    {
                      "name" : "delete",
                      "actions" : [
                        {
                          "retry" : {
                            "count" : 3,
                            "backoff" : "exponential",
                            "delay" : "1m"
                          },
                          "delete" : { }
                        }
                      ],
                      "transitions" : [ ]
                    }
                  ],
                  "ism_template" : [
                    {
                      "index_patterns" : [
                        "app_*",
                        "sys_*"
                      ],
                      "priority" : 0
                    }
                  ]
                }
              }
              '

keystore: []
# To add secrets to the keystore:
#  - secretName: opensearch-encryption-key

networkPolicy:
  create: false
  ## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
  ## In order for a Pod to access OpenSearch, it needs to have the following label:
  ## {{ template "uname" . }}-client: "true"
  ## Example for default configuration to access HTTP port:
  ## opensearch-master-http-client: "true"
  ## Example for default configuration to access transport port:
  ## opensearch-master-transport-client: "true"

  http:
    enabled: false

# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""

## Set optimal sysctl's. This requires privilege. Can be disabled if
## the system has already been preconfigured. (Ex: https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html)
## Also see: https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
sysctl:
  enabled: false

## Enable to add 3rd Party / Custom plugins not offered in the default OpenSearch image.
plugins:
  enabled: false
  installList: []
  # - example-fake-plugin

# -- Array of extra K8s manifests to deploy
extraObjects: []

Host/Environment (please complete the following information):

  • Helm Version: 2.1.0
  • Kubernetes Version: 1.20.7


@YeonghyeonKO YeonghyeonKO added bug Something isn't working untriaged Issues that have not yet been triaged labels Jul 15, 2024
@prudhvigodithi prudhvigodithi removed the untriaged Issues that have not yet been triaged label Jul 18, 2024
@Divyaasm
Contributor

With the above error, are you able to start the cluster using OpenSearch 2.0.1?

@YeonghyeonKO
Author

@Divyaasm Hi,
I deployed with the OpenSearch Helm chart (version 2.1.0) and the OpenSearch image itself at version 2.0.1, as below:

image:
  repository: "docker-repo.xxx.com/hcp-docker/opensearchproject/opensearch"
  # override image tag, which is .Chart.AppVersion by default
  tag: "2.0.1"
  pullPolicy: "IfNotPresent"

@YeonghyeonKO
Author

When I first wrote this issue, the Kubernetes cluster had 27 nodes.

 Warning  FailedScheduling  12m   default-scheduler  0/27 nodes are available: 27 pod has unbound immediate PersistentVolumeClaims.

Two days ago, a new worker node was added to the K8s cluster, and the log changed accordingly:

 Warning  FailedScheduling  9m   default-scheduler  0/28 nodes are available: 28 pod has unbound immediate PersistentVolumeClaims.
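For context, "unbound immediate PersistentVolumeClaims" means the claims use a storage class with `volumeBindingMode: Immediate` and nothing bound them: either the class's provisioner never created a volume, or (for static provisioning) no matching PersistentVolume exists. If `sc-nfs-app-retain` is statically provisioned, a PV like the hypothetical sketch below would need to exist for each replica; the `server` and `path` values are placeholders, not values from this cluster.

```yaml
# Hypothetical sketch: a statically provisioned NFS PV that could satisfy
# one of the chart's claims (storageClassName, size, and accessModes must
# match the persistence settings in values.yaml).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: opensearch-master-pv-0
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: sc-nfs-app-retain
  nfs:
    server: nfs.example.com        # placeholder
    path: /exports/opensearch/0    # placeholder
```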
