
feat: overhaul vars #178

Closed · wants to merge 47 commits

Conversation

jacobemery
Collaborator

@jacobemery jacobemery commented Jul 10, 2023

OK, I've resolved the merge conflicts and synced the 'development' branch with 'main'. Requesting re-review!

  • Long overdue complete overhaul of how variables are structured and used.

  • Variables structure better aligned with Ansible best practices, allowing for greater flexibility, simplicity and scaling.

  • Restructured the documentation for better organization and added pictures.

  • Created setup scripts to install required software (like Ansible itself) - addresses "Required Community collections should be installed automatically by playbook0" #161

  • Re-worked many playbooks as a result of variables overhaul leading to various optimizations, including:

    • Ability to allocate VM guests to specific hosts, using their 'guests' list in inventory
    • Ability to add/delete as many compute nodes as you want
    • Enabled Ansible Vault to protect sensitive variables
    • Ability to create as many LPARs as you want
    • Ability to attach as many Storage Groups and Network Interface Cards to those LPARs as you want
    • Ability to have as many hypervisors hosting VMs as you want
    • Separation of OpenVPN from high availability ("OPEN VPN" #118)
    • Fixed proxy vars indentation ("Proxy settings" #122)
    • Clarified virtual network definitions in the docs and created a variable for the interface, env.bridge_name ("Documentation / naming unclear." #163)
    • Documented the storage pool_path ("Storage Pool_path is not documented" #155)
    • The pull secret no longer has to be single-quoted.
    • Removed references to kvm_host in variables, to leave the door open for z/VM or other hypervisors in the future.
    • And more...
  • My apologies that this is such a huge PR; the variables overhaul required almost everything to change: not just references to variables, but how they are used, loops, etc.

  • @veera-damisetti I wasn't 100% sure about a lot of the HyperShift changes I made, but I believe create_inventory_setup_hypershift is no longer required because of the inventory change, and the creation of secrets.yaml is no longer required because of Ansible Vault. But please double-check me here, as I'm much less familiar with your work. I'm sure you'll have lots of comments, which would be great!

  • I'm opening this on the 'development' branch because I'm sure changes will need to be made, and I'd love as many eyes on this as possible so we can get as many of the variables right as possible, all in one breaking change, as a 2.0.
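As an illustration of the 'guests' allocation mentioned above, here is a sketch of what an inventory entry might look like (host and guest names are hypothetical; the real template in the repo may differ):

```yaml
# inventories/default/inventory.yaml (illustrative sketch, not the repo template)
all:
  children:
    hypervisors:
      hosts:
        hypervisor_1:
          ansible_host: 192.168.10.1
          # VM guests allocated to this specific hypervisor
          guests: [bootstrap, master_1, master_2]
        hypervisor_2:
          ansible_host: 192.168.10.2
          guests: [master_3, compute_1, compute_2]
```

Because the guests are listed per host rather than in a shared dictionary, adding another hypervisor or rebalancing guests is a local edit to one host entry.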

cc @AmadeusPodvratnik
cc @smolin-de
cc @ryoung1us
cc @irmsan

Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
The previous inventory system relied on a .yaml inventory that was templated out to the 'real' .ini inventory. This led to redundancies, confusion, and errors.

The old inventory system also relied on dictionaries to provide structure, which inadvertently made it impossible to substitute one variable for an individual host in a group.

By adhering to Ansible's best practices for inventories, the new inventory enables easier use of group_vars and host_vars, leading to many optimizations - most importantly, better scaling and simplicity.
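For readers unfamiliar with the layout this enables, a sketch of the standard Ansible directory convention follows (file names other than inventory.yaml are illustrative):

```yaml
# Standard Ansible inventory layout (illustrative sketch):
#   inventories/default/
#   ├── inventory.yaml            # hosts list, doubles as the variables input
#   ├── group_vars/
#   │   ├── all/vault.yaml        # Vault-encrypted secrets, loaded automatically
#   │   └── hypervisors.yaml      # variables shared by every host in the group
#   └── host_vars/
#       └── hypervisor_1.yaml     # overrides for a single host in the group
#
# host_vars lets one host override a group variable, which the old
# dictionary-based structure could not express.
```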

Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
Changed the method for establishing SSH connections to use the connect_hosts playbook, which allows for any number of hypervisors.

Moved libvirt setup tasks to their own role.

Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
This also avoids dropping down to the 'shell' module.

Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
This was referenced Jul 10, 2023
Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
@veera-damisetti
Collaborator

@jacobemery can you add to the docs how to generate an inventory for matching hosts?
(Considering the case where the user wants to install HyperShift as a day-2 activity, with a management cluster already installed.)

Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
…ft-Provisioning into overhaul_vars

Signed-off-by: Jacob Emery <jacob.emery@ibm.com>
@jacobemery
Collaborator Author

@jacobemery can you add to the docs how to generate an inventory for matching hosts? (Considering the case where the user wants to install HyperShift as a day-2 activity, with a management cluster already installed.)

We chatted offline about this; it's not required, as inventory.yaml now acts as both the variables input and the Ansible inventory hosts list.

@smolin-de
Contributor

@jacobemery
Hi Jacob,
some of my previously accepted fixes and improvements are no longer included. Please add them again,
because the OCP multi-arch install will be broken if this PR "feat: overhaul vars" is merged without my patches.

  • The variable "_vm_console" is not used anymore. It has been replaced with the hardcoded value "console=ttysclp0". The console will not be displayed on x86 systems.
  • The interface ":enc1:" specified during creation of VMs does not always exist on x86 systems. It will fail on an x86 system.
  • Why are you now using --location "{{ pool_path }}" to create nodes? Our current code uses --location {{ rhcos_download_url }}. Now we will need to copy the images for all users on each hypervisor used. I do not like this change.
  • The parameter --os-variant (or --osinfo) is not always defined. This is required for newer versions of virt-install; your code will fail e.g. on RHEL9.
  • There are probably more issues. Please test it on an x86 system as well.

RHEL9 support is still not included. If we want to support RHEL8 and RHEL9, we probably need additional changes to the code structure.
We also need z/VM support very soon. Should we not include RHEL9 support in your PR, and also define the required z/VM variables, to avoid another breaking change in the near future?

The entire configuration will change, which will make our users unhappy.
Will you provide detailed instructions on how to convert an existing "all.yaml" file to the new config?
Or, even better, will you provide a tool that does the config conversion automatically for the user?

@AmadeusPodvratnik
Collaborator

Hi @jacobemery,

thanks in advance for the huge change.
Besides the comments Klaus provided, here are my two cents:
inventories/default/group_vars/all/vault.yaml.template is hypervisor-specific, and I'm not sure we should keep it alone in group_vars. I would suggest renaming it and moving it up, like inventory.yaml.template. The same for the matching vault file.

Only a minor point, given the amount of changes:

  • { role: robertdebock.epel, tags: openvpn, when: setup_openvpn } <-------- I believe that if the tags are specified at the task level (tags: services, section_2, openvpn), then there is no need to specify them again, e.g. on the roles.
    • { role: robertdebock.openvpn, tags: openvpn, when: setup_openvpn } <---
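For context on the tags question: Ansible role-level tags are inherited by every task inside the role, so the two placements overlap when filtering with --tags. A sketch (role entries copied from the comment above; play name illustrative):

```yaml
# Sketch of tag inheritance in Ansible (play name is hypothetical)
- hosts: bastion
  roles:
    # Tagging the role applies 'openvpn' to every task the role contains...
    - { role: robertdebock.epel, tags: openvpn, when: setup_openvpn }
    - { role: robertdebock.openvpn, tags: openvpn, when: setup_openvpn }
  # ...so if the tasks inside already carry `tags: [services, section_2, openvpn]`,
  # the role-level tag is redundant for `--tags openvpn` filtering, though harmless.
```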

Were NAT support and the root user tested?

@jacobemery
Collaborator Author

inventories/default/group_vars/all/vault.yaml.template is hypervisor-specific, and I'm not sure we should keep it alone in group_vars. I would suggest renaming it and moving it up, like inventory.yaml.template. The same for the matching vault file.

The vault file is in group_vars/all so that it is loaded automatically; otherwise its variables won't be included without specifying the path every time in the playbooks. Open to changing it, though.
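To illustrate the auto-loading behavior: Ansible applies variables from group_vars/all/ to every host without any explicit vars_files reference, so a vault file there can be consumed directly (the variable name below is hypothetical, patterned after the error messages later in this thread):

```yaml
# inventories/default/group_vars/all/vault.yaml
# (encrypted with `ansible-vault encrypt`; loaded automatically for all hosts)
vault_hypervisor_1_pass: changeme   # hypothetical secret variable

# Referenced from inventory or host_vars with no explicit include, e.g.:
# ansible_become_pass: "{{ vault_hypervisor_1_pass }}"
```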

@jacobemery
Collaborator Author

Were NAT support and the root user tested?

It was not. Are you able to test it out? It would be helpful to know exactly what is required for your environment.

@AmadeusPodvratnik
Collaborator

I tried to install with NAT on my KVM env. Issues I have seen so far:

pexpect could not be installed ("msg": "Failed to import the required Python library (pexpect) on MacBook-Pro-von-Amadeus.local's Python /opt/homebrew/opt/python@3.10/bin/python3.10. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"). I needed to install it manually by running "pip3 install pexpect".

Error tolerance should be improved, e.g. in 3_setup_hypervisors:
TASK [Check if all hosts from inventory are reachable. Disregard an error here, it just triggers tasks that copy your SSH key to unreachable host(s).] ****************************************************************************************************************************************************************
fatal: [hypervisor_2]: FAILED! => {"msg": "The field 'become_pass' has an invalid value, which includes an undefined variable. The error was: 'vault_hypervisor_2_pass' is undefined. 'vault_hypervisor_2_pass' is undefined. 'vault_hypervisor_2_pass' is undefined. 'vault_hypervisor_2_pass' is undefined"}
fatal: [hypervisor_3]: FAILED! => {"msg": "The field 'become_pass' has an invalid value, which includes an undefined variable. The error was: 'vault_hypervisor_3_pass' is undefined. 'vault_hypervisor_3_pass' is undefined. 'vault_hypervisor_3_pass' is undefined. 'vault_hypervisor_3_pass' is undefined"}
This might irritate users. I needed to comment out the hypervisor_2 section in the inventory.yaml file. It would be better to have a working default and comment everything else out; I would suggest starting with a single hypervisor.
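One possible way to soften that failure mode (a sketch, not part of the PR; the task name is taken from the error output above) would be to mark the reachability check as non-fatal so undefined vault variables on unconfigured hypervisors don't abort the whole play:

```yaml
# Sketch: tolerate unconfigured or unreachable hosts in the reachability
# check instead of failing fatally (task shape is illustrative)
- name: Check if all hosts from inventory are reachable
  ansible.builtin.ping:
  ignore_unreachable: true   # unreachable hosts don't end the play
  ignore_errors: true        # task failures (e.g. undefined vars) don't either
```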

Attaching the subscription should not be done automatically (use a flag/role, e.g. attach_subscription).
In the setup bastion playbook there is a when statement on rh_name; a flag is better in my view.

"jumphost" was not picked up and needed to be configured manually (config file).

Typo (missing comma) in the inventory.yaml file; it should be:
pkgs: [ haproxy, httpd, bind, bind-utils, expect, firewalld, mod_ssl, python3, python3-pip, python3-policycoreutils, rsync, curl, jq, mc, net-tools, wget, network-scripts ]

Failure in setup_bastion:
File "/opt/homebrew/lib/python3.11/site-packages/ansible/template/__init__.py", line 1035, in do_template
raise AnsibleUndefinedVariable(e, orig_exc=e)
ansible.errors.AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'net_type'. 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'net_type'
fatal: [bastion]: FAILED! => {
"changed": false,
"msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'net_type'. 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'net_type'"
The var is not part of the inventory.yaml file.

@AmadeusPodvratnik
Collaborator

Installation hangs after 3 masters and 1 bootstrap node were created:

FAILED - RETRYING: [bastion]: Wait for first node-bootstrapper request, should be started within 6 min (retry every 30s)...To watch progress, SSH to root@bastion, SSH to core@bootstrap-ip and run 'journalctl -b -f -u release-image.service -u bootkube.service' (15 retries left).Result was: {
"attempts": 46,
"changed": true,
"cmd": "/usr/sbin/oc get csr | grep ':node-bootstrapper'",
"delta": "0:00:50.138316",
"end": "2023-08-17 17:40:17.025823",
"invocation": {
"module_args": {
"_raw_params": "/usr/sbin/oc get csr | grep ':node-bootstrapper'",
"_uses_shell": true,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true
}
},
"msg": "non-zero return code",
"rc": 1,
"retries": 61,
"start": "2023-08-17 17:39:26.887507",
"stderr": "E0817 17:39:36.956203 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF\nE0817 17:39:46.971992 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF\nE0817 17:39:56.990624 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF\nE0817 17:40:07.007125 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF\nE0817 17:40:17.023091 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF\nUnable to connect to the server: EOF",
"stderr_lines": [
"E0817 17:39:36.956203 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF",
"E0817 17:39:46.971992 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF",
"E0817 17:39:56.990624 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF",
"E0817 17:40:07.007125 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF",
"E0817 17:40:17.023091 43879 memcache.go:238] couldn't get current server API group list: Get "https://api.a314lp34-pod.lnxero1.boe:6443/api?timeout=32s\": EOF",
"Unable to connect to the server: EOF"
],
"stdout": "",
"stdout_lines": []
}

nslookup api-int.a314lp34-pod.lnxero1.boe
Server: 192.168.122.2
Address: 192.168.122.2#53

api-int.a314lp34-pod.lnxero1.boe canonical name = bastion.lnxero1.boe.
Name: bastion.lnxero1.boe
Address: 192.168.122.2

[root@bastion ~]# curl https://api.a314lp34-pod.lnxero1.boe:6443/api
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.a314lp34-pod.lnxero1.boe:6443
[root@bastion ~]# curl https://api.a314lp34-pod.lnxero1.boe
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.a314lp34-pod.lnxero1.boe:443
[root@bastion ~]# ping api.a314lp34-pod.lnxero1.boe
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.a314lp34-pod.lnxero1.boe:443
[root@bastion ~]# ping api.a314lp34-pod.lnxero1.boe
PING bastion.lnxero1.boe (192.168.122.2) 56(84) bytes of data.
64 bytes from api-int.a314lp34-pod.lnxero1.boe (192.168.122.2): icmp_seq=1 ttl=0 time=0.019 ms
64 bytes from api-int.a314lp34-pod.lnxero1.boe (192.168.122.2): icmp_seq=2 ttl=0 time=0.096 ms

@jacobemery jacobemery deleted the branch IBM:development December 20, 2023 21:21
@jacobemery jacobemery closed this Dec 20, 2023