Support for OKD 3.10, first run fails, second run works #64

Closed
arashkaffamanesh opened this issue Oct 21, 2018 · 3 comments
@arashkaffamanesh

Some minor changes need to be made to support OKD 3.10. The main changes in inventory.template.cfg are:

openshift_release=v3.10

# Changed for OpenShift 3.10 (filename not needed)
# https://bugzilla.redhat.com/show_bug.cgi?id=1565447

# openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# Define node groups
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/infra=true']}, {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true']}]

# host group for nodes, includes region info
[nodes]
${master_hostname} openshift_hostname=${master_hostname} openshift_node_group_name='node-config-master' openshift_schedulable=true
${node1_hostname} openshift_hostname=${node1_hostname} openshift_node_group_name='node-config-compute'
${node2_hostname} openshift_hostname=${node2_hostname} openshift_node_group_name='node-config-compute'
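
After a successful install, the labels applied by these node groups can be verified (a quick check, not part of the template):

oc get nodes --show-labels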

In install-from-bastion.sh, set the branch to release-3.10:

git clone -b release-3.10 https://github.com/openshift/openshift-ansible

After the first run, the following failure summary is shown; the second run then succeeds:

TASK [openshift_storage_glusterfs : load kernel modules] ***********************
fatal: [ip-10-0-1-154.eu-central-1.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
fatal: [ip-10-0-1-29.eu-central-1.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
fatal: [ip-10-0-1-123.eu-central-1.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}

RUNNING HANDLER [openshift_node : reload systemd units] ************************
	to retry, use: --limit @/home/ec2-user/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP *********************************************************************
ip-10-0-1-123.eu-central-1.compute.internal : ok=103  changed=51   unreachable=0    failed=1
ip-10-0-1-154.eu-central-1.compute.internal : ok=128  changed=51   unreachable=0    failed=1
ip-10-0-1-29.eu-central-1.compute.internal : ok=103  changed=51   unreachable=0    failed=1
localhost                  : ok=12   changed=0    unreachable=0    failed=0


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:00:17)
Health Check                : Complete (0:00:38)
Node Bootstrap Preparation  : In Progress (0:02:18)
	This phase can be restarted by running: playbooks/openshift-node/bootstrap.yml

Failure summary:

  1. Hosts:    ip-10-0-1-123.eu-central-1.compute.internal, ip-10-0-1-154.eu-central-1.compute.internal, ip-10-0-1-29.eu-central-1.compute.internal
     Play:     Configure nodes
     Task:     load kernel modules
     Message:  Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See "systemctl status systemd-modules-load.service" and "journalctl -xe" for details.

make: *** [openshift] Error 2
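
For reference, the failing module can be identified on a node with the commands from the error message (a diagnostic sketch; the exact module varies, though the dm_* modules loaded by the glusterfs role are the likely suspects):

# On one of the failing nodes:
systemctl status systemd-modules-load.service
journalctl -u systemd-modules-load.service --no-pager | tail -n 20

# List the modules openshift-ansible asked the node to load:
cat /etc/modules-load.d/*.conf

# Try loading a suspect module by hand, e.g.:
modprobe dm_thin_pool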

Can anyone confirm this behaviour on their side?

It seems this was already reported in issue #40.

@arashkaffamanesh
Author

OK, it seems there are other problems too: docker-registry-1-deploy and router-1-deploy stay Pending:

[ec2-user@ip-10-0-1-154 ~]$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1       Pending   0          13m
registry-console-1-vq7w8   1/1       Running   1          13m
router-1-deploy            0/1       Pending   0          14m
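
The scheduling reason shows up in the pod events (standard oc commands, output omitted here):

oc describe pod docker-registry-1-deploy
oc describe pod router-1-deploy
# For Pending deploy pods this typically reports FailedScheduling
# because no node matches the pod's node selector.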

@arashkaffamanesh
Author

The reason the docker registry and router are Pending is missing infra nodes:
https://docs.openshift.com/container-platform/3.10/install/configuring_inventory_file.html

If there is not a node in the [nodes] section that matches the selector settings,
the default router and registry will be deployed as failed with Pending status.
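
One way to satisfy the selector with this three-node layout is to let the master carry the infra role as well. A sketch against the [nodes] section above, using the node-config-master-infra group that openshift-ansible 3.10 ships by default (if openshift_node_groups is overridden as shown earlier, it needs a matching entry carrying both the master and infra labels):

[nodes]
${master_hostname} openshift_hostname=${master_hostname} openshift_node_group_name='node-config-master-infra' openshift_schedulable=true
${node1_hostname} openshift_hostname=${node1_hostname} openshift_node_group_name='node-config-compute'
${node2_hostname} openshift_hostname=${node2_hostname} openshift_node_group_name='node-config-compute'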

@dwmkerr
Owner

dwmkerr commented Oct 23, 2018

Hey @arashkaffamanesh - I got it working!

The key was to update the AMIs to RHEL 7.5 (apparently 7.4 upwards will do). This fixes the kernel module issue. I also updated the code to tag the master node as an infra node (thanks for your tips on this one!)
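
For reference, a suitable RHEL 7.5 AMI can be located with the AWS CLI (a lookup sketch; 309956199498 is Red Hat's AWS account ID, and the region should match the cluster, eu-central-1 in the logs above):

aws ec2 describe-images \
  --owners 309956199498 \
  --filters "Name=name,Values=RHEL-7.5*" "Name=architecture,Values=x86_64" \
  --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
  --region eu-central-1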
