Split controller and node binaries #95
base: main
Conversation
Can one of the admins verify this patch?
add to whitelist
retest this please
@jarrpa can you give some more context on why we need two separate binaries?
After taking a quick look, I think this is going to need a design doc and significant discussion. Perhaps I'm misunderstanding things, but this seems like more than "split controller and node binaries".
@Madhu-1 This allows logic isolation of the different roles, and is a precursor to bringing in different types of Controllers, such as for heketi and gluster-block.

@JohnStrunk Yes. The reason for the cache is that, when not using default REST parameters, there is no way to communicate with the target Gluster cluster from a DeleteVolume request. This is the result of me accidentally squashing two patches together, because I did the REST changes first and the split second. I'll try to untangle them after the fact.
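The cache described above can be sketched roughly like this: a map from volume ID to the REST connection parameters recorded at CreateVolume time, so that a later DeleteVolume (which only receives a volume ID) can find the right cluster to talk to. All type, field, and function names below are illustrative, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// connConfig holds the non-default REST parameters needed to reach a
// particular Gluster cluster (hypothetical fields for illustration).
type connConfig struct {
	URL    string
	User   string
	Secret string
}

// volumeCache remembers, per volume ID, which cluster the volume was
// created against, so DeleteVolume can route its REST call correctly.
type volumeCache struct {
	mu    sync.RWMutex
	conns map[string]connConfig
}

func newVolumeCache() *volumeCache {
	return &volumeCache{conns: make(map[string]connConfig)}
}

// put records the REST config used when a volume is created.
func (c *volumeCache) put(volID string, cfg connConfig) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.conns[volID] = cfg
}

// get looks up the REST config for a volume ID at delete time.
func (c *volumeCache) get(volID string) (connConfig, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	cfg, ok := c.conns[volID]
	return cfg, ok
}

func main() {
	cache := newVolumeCache()
	cache.put("vol-1", connConfig{URL: "http://gd2.example:24007", User: "admin"})
	if cfg, ok := cache.get("vol-1"); ok {
		fmt.Println(cfg.URL)
	}
}
```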
Signed-off-by: Jose A. Rivera <jarrpa@redhat.com>
@JohnStrunk Patches untangled, PTAL
I'll also note that I do have a local version of
I'm still not understanding the motivation. Now each type of CSI driver has multiple containers that must be built.
That's not true. Each CSI driver just needs to build its own Controller container; we can keep one Node container compatible with multiple Controllers. This is what I do in my WIP code for having a heketi Controller driver, a gd2 Controller driver, and a GlusterFS Node driver that works with both. This would then allow you to, for example, update the Controller containers without touching the Node container.
It should be noted, the way I designed my multi-Controller implementation was that both Controller pods listened as the
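The multi-Controller design described above hinges on every binary advertising the same CSI driver name, so that Kubernetes routes node-side calls for volumes provisioned by either Controller to the one shared Node plugin. A minimal sketch of that idea, with an illustrative driver name and version (not the project's actual identifiers):

```go
package main

import "fmt"

// driverName is shared by the heketi Controller, the gd2 Controller,
// and the single Node plugin. The value here is illustrative.
const driverName = "org.gluster.glusterfs"

// pluginInfo mirrors the shape of a CSI GetPluginInfo response.
type pluginInfo struct {
	Name    string
	Version string
}

// getPluginInfo returns the same driver name regardless of which role
// (controller variant or node) the binary is serving, which is what
// lets one Node container back multiple Controller containers.
func getPluginInfo(role string) pluginInfo {
	return pluginInfo{Name: driverName, Version: "0.0.9"}
}

func main() {
	same := getPluginInfo("controller").Name == getPluginInfo("node").Name
	fmt.Println(same)
}
```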
Only having a Controller container may work for Heketi vs. GD2 with gluster-fuse, but I'm skeptical that a common Node container works in all cases. Again, what does this get us? My point is that a combined binary and single-container model doesn't provide any less code sharing while keeping things simple: for Heketi, use image 1 everywhere; for GD2, use image 2 everywhere. Later, when you want to use gluster-block, that's image 3; something else like loopback is image 4; and so on. Having to track which Controller container I can use with which Node container on top of that sounds like it's asking for errors in deployment, so if it doesn't get us something really tangible, why do it?
There was already talk of having a separate driver for gluster-block, at the very least, under the name of something like

I personally see no tangible benefit in keeping things in a single binary, so it seems only natural to split the binaries at logical boundaries, for crash isolation at the very least.
Yes, each driver should have its own binary and container. That binary supports the entire set of supported CSI APIs for that driver. This patch has been about potentially having multiple binaries and containers per driver by splitting the CSI API implementation across binaries and containers.

Today, we have a single binary that is deployed in the provisioner, attacher, and node roles (all as separate pods). Only a portion of the API is used in each of those roles, but the unified binary keeps it all together as a single unit that is built, tested, and deployed as a consistent version. Requiring an admin to specify different images for each is simply unnecessary, and it increases the likelihood of error. It's also a burden on QE and the maintainers to ensure all images are built and in sync.
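The unified-binary model argued for here can be sketched as a single entry point that registers only the CSI services relevant to the role a given pod is deployed in. The flag, role names, and service names below are illustrative, not the project's actual interface:

```go
package main

import (
	"flag"
	"fmt"
)

// pickServices returns which CSI services one unified binary would
// register for a given deployment role. Every role serves Identity;
// only a subset of the full API is exercised per role, but the code
// all ships in one binary/container (hypothetical role names).
func pickServices(role string) []string {
	switch role {
	case "controller":
		return []string{"Identity", "Controller"}
	case "node":
		return []string{"Identity", "Node"}
	default:
		// Default: serve everything, e.g. for local testing.
		return []string{"Identity", "Controller", "Node"}
	}
}

func main() {
	role := flag.String("role", "all", "CSI role for this instance")
	flag.Parse()
	fmt.Println(pickServices(*role))
}
```

The same image is then referenced by the provisioner, attacher, and node pods, with only the role flag differing, which is what keeps the built/tested/deployed versions consistent.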
These are non-existent burdens if our automation is at least half-way decent. You don't have to make the admin remember different versions; just tag all the images with the same version if you want. And again, this all becomes moot with an Operator to handle the versioning for you. If all else fails, just build all container images every time one of them changes and version all of them, with a stated policy that we only support deployments with matching tags between driver images.

The primary reason for all this is that I don't want to have two identical Node drivers running per node in the case where a cluster is running both heketi- and gd2-based Controller drivers. The only way to accomplish that with a single binary would be to have one driver that handles both heketi and gd2 APIs dynamically, which I believe we also wanted to avoid.

Given all this, I still see no good reason not to do this, especially given the benefit of keeping as much logic separated as possible.
@JohnStrunk any update on this PR?
I have nothing to add to my previous comments.
@JohnStrunk If you don't have anything else to add, that would indicate that you do not refute the final points I made. Is that the case, and if so, may this PR go through?
In the interest of moving some things along, I created #110 to merge the work of at least splitting the driver into different libraries.
This PR splits the controller and node CSI roles into two separate binaries. Driver tests are not currently implemented, as the CSI sanity test suite does not yet support split driver roles (PR submitted for that: kubernetes-csi/csi-test#135).
Signed-off-by: Jose A. Rivera jarrpa@redhat.com