From 27ec8ca8485a099b6fdad94b82ed3ef389676041 Mon Sep 17 00:00:00 2001
From: Dmitry Tantsur
Date: Thu, 25 Jul 2024 17:28:26 +0200
Subject: [PATCH] Ironic Standalone Operator

This design proposal discusses ironic-standalone-operator: provides a
motivation for its creation, describes its goals and outlines the
current design and features, as well as the plans for the nearest
future.

Signed-off-by: Dmitry Tantsur
---
 design/ironic-standalone-operator.md | 333 +++++++++++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 design/ironic-standalone-operator.md

diff --git a/design/ironic-standalone-operator.md b/design/ironic-standalone-operator.md
new file mode 100644
index 00000000..d4f32f7a
--- /dev/null
+++ b/design/ironic-standalone-operator.md
@@ -0,0 +1,333 @@
+
+
+# Ironic Standalone Operator
+
+## Status
+
+implementable
+
+## Summary
+
+This proposal discusses the [ironic-standalone-operator][ir-op] project, which
+was written by me (Dmitry Tantsur) using inspiration from OpenShift's
+[cluster-baremetal-operator][cbo]. The project is already under the Metal3
+umbrella, so this document serves to describe its design goals, future plans
+and the rough API shape.
+
+[ir-op]: https://github.com/metal3-io/ironic-standalone-operator
+[cbo]: https://github.com/openshift/cluster-baremetal-operator
+
+## Motivation
+
+Ironic is not trivial to install and operate. Although we provide
+[ironic-deployment scripts][ironic-deployment] as part of BMO, there are still
+a lot of moving parts where things can go wrong. Configuring Ironic through
+environment variables is error-prone and complicates upgrades. The operator
+pattern is standard and ubiquitous in the Kubernetes world to manage complex
+software. Metal3 should use it as well.
+
+[ironic-deployment]: https://github.com/metal3-io/baremetal-operator/tree/main/ironic-deployment
+
+### Goals
+
+- Provide a recommended way to install Ironic and its satellite services for
+  use with Metal3.
+- Make it easy to install and manage Ironic for Metal3 newcomers.
+- Provide a Kubernetes operator that can also be used outside of Metal3.
+- Pave the way for highly-available Ironic installations.
+
+### Non-Goals
+
+Explicitly not planned:
+
+- Tailor the new operator to use cases outside of Metal3 except for the most
+  minor things.
+- Support versions of ironic-image predating this proposal (e.g. the ones
+  containing ironic-inspector).
+- Deprecate or discourage alternative ways to install Ironic for Metal3, or get
+  rid of ironic-image/mariadb-image.
+- Install BMO, IPAM or CAPM3 via the same operator.
+
+May happen in the future but not as part of this proposal:
+
+- Radically change the installed architecture. For example, we have bold plans
+  to look into dropping the host networking requirement.
+- Stabilize the API design.
+
+## Proposal
+
+### User Stories
+
+As an administrator, I want to be able to install Ironic in a way that is
+suitable for Metal3 by installing an operator and creating custom Kubernetes
+resources.
+
+## Design Details
+
+This proposal adds a new project under the Metal3 umbrella:
+ironic-standalone-operator. It is a Kubernetes operator that exposes a few
+Custom Resources and manages an Ironic installation.
+
+### Naming
+
+The project's name was heavily discussed. The initial name was
+straightforward: ironic-operator. However, it was quickly found to conflict
+with a few existing projects, including a pretty active one developed by Red
+Hat as part of its OpenStack offering.
+
+Another candidate was metal3-ironic-operator. The arguments against it were
+inconsistency with other Metal3 projects (we don't call BMO
+metal3-baremetal-operator even though baremetal-operator is pretty generic) and
+the desire to make the new operator usable outside of pure Metal3.
+
+The argument against using the word "standalone" was that this word is
+overloaded in the OpenStack context and may be unclear to people without this
+context. A poll among contributors showed that the intention of the word is at
+least more or less clear to us, and that it clearly conveys the difference from
+Red Hat's OpenStack operator.
+
+A few code names were also discussed but ruled out because of potential user
+confusion and possible trademark issues.
+
+Note that there is no established acronym for ironic-standalone-operator like
+we have for baremetal-operator (BMO) or CAPM3. Using ISO is definitely going to
+be confusing. This document will be referring to ironic-standalone-operator or
+just "the operator" in cases where it does not cause confusion with a human
+operator.
+
+### Implementation Details/Notes/Constraints
+
+This section describes the current state of the project. It is not an attempt
+to fix the details forever. I'm using it to give the reader a clearer idea of
+what the operator currently does.
+
+#### Current architecture
+
+The operator has two controllers: one for MariaDB and one for Ironic plus its
+auxiliary containers.
+
+The MariaDB controller, also referred to as the *database controller* in this
+context, starts a MariaDB instance in a *deployment* using
+[mariadb-image][mariadb-image]. As with Metal3 now, MariaDB is optional: if it
+is not configured, SQLite is used instead.
+
+The Ironic controller starts and manages the following components:
+
+- Ironic itself
+- HTTPD for serving images and iPXE scripts
+- Dnsmasq for DHCP and TFTP
+- Ramdisk logs publisher
+- IPA downloader
+
+All these components are used in the same way as in a traditional Metal3
+installation. Note that the fate of the IPA downloader is under discussion:
+there is a strong desire to make it optional and maybe replace it with a
+different method of delivering IPA images.
+
+Unlike the current Metal3, the operator requires authentication and will create
+secrets with random credentials when a user does not provide them. We're
+considering doing the same with TLS, but it requires [figuring out CA
+integration][issue4].
+
+[mariadb-image]: https://github.com/metal3-io/mariadb-image/
+[issue4]: https://github.com/metal3-io/ironic-standalone-operator/issues/4
+
+#### HA architecture
+
+The *non-HA* architecture is the architecture that Metal3 uses now. All Ironic
+components are run in a *deployment*.
+
+The *HA* architecture is a new concept in ironic-standalone-operator. It
+involves running a copy of Ironic and HTTPD per control plane node (so, 3
+copies in most cases). This has two benefits:
+
+1. Ironic can be updated in a rolling fashion without an interruption in the
+   service.
+2. Due to the way Ironic is designed, each replica will handle its share
+   (1/3 in most cases) of nodes (active/active architecture, not
+   active/backup).
+
+MariaDB is not going to be run in an HA fashion. The mid-term plans include
+looking into using a persistent volume for it instead.
+
+When the HA architecture is enabled via a flag on the `Ironic` resource, all
+Ironic components (except for MariaDB and dnsmasq) will be installed in a
+*DaemonSet* instead of a *Deployment*.
+
+##### Dnsmasq, iPXE and provisioning network
+
+Dnsmasq is also not going to be run with more than 1 replica. It's not
+impossible to run several DHCP servers on the same network, but it's harder to
+configure and to debug. In the future, we might look into some sort of a
+managed DHCP offering, e.g. [Kea][kea].
+
+Using a provisioning network will require having a provisioning IP for each
+control plane node instead of the single one used by the non-HA architecture.
+
+Using iPXE in the HA configuration poses one more problem. Our (static) DHCP
+configuration must point each host at its iPXE configuration script. However,
+dnsmasq does not know which host belongs to which Ironic instance. To tackle
+this limitation, a new [boot configuration API][boot config] has been proposed
+(but not yet implemented) in Ironic. It will allow our DHCP configuration to
+always point at the same Ironic instance for iPXE configuration, and Ironic
+itself will do the required routing.
+
+[boot config]: https://specs.openstack.org/openstack/ironic-specs/specs/approved/boot-config-api.html
+
+##### JSON RPC
+
+Ironic itself is clustered software. Each instance, as noted above, will
+handle its share of all nodes. When an instance crashes, the remaining
+instances will take over its responsibilities. You can hit the API on any
+instance for any node, and the request will be forwarded to the right instance.
+
+To achieve that, Ironic supports JSON RPC. Metal3 currently does not use it,
+and it still will not be used in the non-HA case. For JSON RPC to be usable,
+each Ironic instance must register its RPC access IP or hostname in the
+database.
+
+When TLS support is enabled, the RPC communication must be secured by
+TLS as well. This may pose a problem since each Ironic instance needs a TLS
+certificate that is valid for its RPC access IP or hostname. This problem has
+been extensively discussed in the [initial HA proposal][issue3], and here is
+the proposed solution, at least for the MVP case:
+
+The Ironic controller will generate a self-signed CA and pass its public and
+private parts into each Ironic container. Each Ironic container will generate
+its own private key and certificate and sign the certificate with this CA. The
+CA will be trusted **only** for the RPC purpose, removing the possibility of
+abuse. To reduce the number of code paths, this process will happen
+unconditionally, even when TLS for Ironic itself is not enabled.
+
+[kea]: https://www.isc.org/kea/
+[issue3]: https://github.com/metal3-io/ironic-standalone-operator/issues/3
+
+#### Architecture FAQ
+
+Q: Why cannot we split dnsmasq into a separate deployment in the non-HA
+architecture?
+A: That may require having more than one IP address on the provisioning
+network: for dnsmasq and for httpd/ironic. This is a new operational
+requirement that I'd like to avoid at this stage.
+
+Q: Why does the same dnsmasq limitation not affect the HA architecture?
+A: The HA architecture is completely new here, so we can introduce new
+requirements without regression in the operational experience.
+
+Q: Why use *DaemonSets* if *StatefulSets* provide us with an easier way to
+address separate Ironic instances?
+A: While we're relying on host networking, making several Ironic instances
+co-exist on the same Kubernetes node is too complex.
+
+Q: Why aren't we using NodePort services?
+A: The fact that they provide a random port is a roadblock for production
+deployments since many of them require opening a predictable port in the
+firewall configuration. If we use a pre-defined port, it may cause conflicts
+with other NodePort services or even end up outside of the allowed range.
+
+#### Current API design
+
+Currently, the API consists of two main objects: `IronicDatabase` and `Ironic`.
+
+The `IronicDatabase` object is very simple:
+
+- `credentialsRef` - a reference to a secret with credentials (generated if
+  missing)
+- `image` - container image to use
+- `tlsRef` - a reference to a TLS secret to use for the service
+
+The `Ironic` object is much more complex and should probably be split into more
+custom resources as we polish its internal architecture. Currently, it uses
+nested structures to logically group fields. Here are the most important fields
+(omitting various fine-tuning for brevity):
+
+- `credentialsRef` - a reference to a secret with credentials (generated if
+  missing)
+- `databaseRef` - a reference to an `IronicDatabase` object (if needed)
+- `distributed` - a boolean flag that enables the HA architecture
+- `images` - a nested structure with images for all components, as well as IPA
+  Downloader source links
+- `networking` - a nested structure that defines networking (see below)
+- `nodeSelector` - a selector for nodes to run Ironic on
+- `tlsRef` - a reference to a TLS secret to use for the service
+
+The `networking` sub-structure deserves separate consideration:
+
+- `apiPort`, `imageServerPort`, `imageServerTLSPort` - overrides for the
+  listening ports of the services
+- `bindInterface` - a boolean flag that makes Ironic listen only on the
+  provisioning interface
+- `dhcp` - another nested structure with DHCP parameters (see below)
+- `externalIP` - IP address through which nodes deployed over virtual media
+  access Ironic and HTTPD
+- `interface`, `ipAddress`, `macAddresses` - various ways to specify the
+  provisioning interface
+
+Finally, the `dhcp` sub-sub-structure contains the following fields:
+
+- `networkCIDR` - CIDR of the provisioning network (required)
+- `rangeBegin`, `rangeEnd` - the DHCP range (derived from `networkCIDR` if
+  missing)
+- `dnsAddress`, `serveDNS` - two mutually exclusive ways to optionally provide
+  DNS to hosts: either a fixed address or dnsmasq itself
+- `hosts`, `ignore` - fine tuning for specific hosts
+- `gatewayAddress` - IP address of the default gateway (if necessary)
+
+Providing a non-nil `dhcp` value enables dnsmasq.
+
+### Risks and Mitigations
+
+Our reliance on host networking means that it's not trivial to have several
+Ironic installations on the same cluster. Each would need to use different
+ports to avoid conflicts. Even without host networking, having several dnsmasq
+instances on the same network is not going to work without some sort of
+coordination between them.
+
+### Work Items
+
+General enablement:
+
+- Add the operator to the development environment.
+- Add an optional flag either to metal3-dev-env or to BMO e2e tests (TBD) that
+  uses ironic-standalone-operator instead of the Kustomize scripts in BMO.
+- Create and run CI jobs (integration or e2e - depending on the previous work
+  item) on the operator.
+
+HA:
+
+- Implement the boot configuration API in Ironic (dependency).
+- Start generating a private CA for JSON RPC.
+- Enable the HA architecture.
+
+### Dependencies
+
+None for the core operator.
+
+The HA approach will require the [boot configuration API][boot config].
+
+### Test Plan
+
+The new operator will become the primary way to install Ironic. As such, it
+will be tested in various CI jobs.
+
+### Upgrade / Downgrade Strategy
+
+TODO
+
+### Version Skew Strategy
+
+TODO
+
+## Drawbacks
+
+- One more project for the small Metal3 team to maintain.
+
+## Alternatives
+
+- Keep using Kustomize YAML files to install Ironic. This approach has already
+  proven to be error-prone and confusing, especially for new users.
+
+## References