Benthos-Captain, the Kubernetes Operator #519

Open
mfamador opened this issue Oct 9, 2020 · 9 comments
Labels
enhancement feedback wanted We want premium quality opinions on this issue

Comments

@mfamador
Contributor

mfamador commented Oct 9, 2020

Hi!

I’ve been having this crazy idea of creating a benthos Kubernetes operator, but I'm unsure whether it would be worth pursuing. I would very much appreciate any thoughts you might have on this.
And I'd be more than happy to bootstrap a project if anyone thinks this might be of interest.

Summary

The general idea would be to have a controller/operator running inside a Kubernetes cluster that listens for the events triggered when a pipeline resource is created, changed or deleted, and orchestrates the corresponding benthos pipelines. Each pipeline would be described by a custom resource, backed by a custom resource definition (CRD), for example:

apiVersion: benthos.dev/v1
kind: Pipeline
metadata:
 name: foo-pipeline
 namespace: data
spec:
 workers: 3
 config:
  input:
   gcp_pubsub:
    project: foo
    subscription: bar
  pipeline:
   processors:
    - bloblang: |
        root.message = this
        root.meta.link_count = this.links.length()
        root.user.age = this.user.age.number()
  output:
   redis_streams:
    url: tcp://${REDIS_URL}:6379
    stream: baz
    max_in_flight: 20

❯ kubectl get pipelines --all-namespaces
NAMESPACE  NAME          WORKERS  STATUS
data       foo-pipeline  3        Running
data       bar-pipeline  2        Stopped
core       baz-pipeline  1        Running

To deploy a new pipeline, all we’d have to do is create a new Pipeline resource:

❯ kubectl apply -f foo-pipeline.yaml
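
As a rough illustration of the event handling described in this summary, here is a minimal Python sketch. It is not taken from any existing operator: `PipelineStore` and the returned action names are hypothetical, and the in-memory dictionary only stands in for whatever the operator would really manage in the cluster. The event types `ADDED`, `MODIFIED` and `DELETED` are the ones a Kubernetes watch reports.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineStore:
    """Hypothetical in-memory stand-in for the pipelines the operator manages."""

    running: dict = field(default_factory=dict)  # (namespace, name) -> spec

    def handle(self, event_type: str, resource: dict) -> str:
        """Apply one watch event and return the action that was taken."""
        meta = resource["metadata"]
        key = (meta["namespace"], meta["name"])

        if event_type == "DELETED":
            # The resource is gone, so drop whatever was running for it.
            self.running.pop(key, None)
            return "stopped"

        if event_type in ("ADDED", "MODIFIED"):
            spec = resource["spec"]
            if self.running.get(key) == spec:
                # Desired and actual state already match: nothing to do.
                return "unchanged"
            action = "updated" if key in self.running else "started"
            self.running[key] = spec
            return action

        raise ValueError(f"unknown event type: {event_type}")
```

Handling an event twice with the same spec is a no-op here, which is the property a reconcile loop would need, since the same resource can be delivered more than once.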

benthos-captain

I can imagine two ways of running the benthos-captain controller/operator:

option a:

The benthos containers run inside the controller instance(s) in streams mode. We'd have to figure out a way to distribute the workloads (benthos pipelines) evenly among the workers.

❯ kubectl get deployment --namespace benthos
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
benthos-captain  2/2     2            2           1d
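
One possible way to spread pipelines across controller instances, sketched here in Python, is rendezvous (highest-random-weight) hashing. This is my own choice of technique for illustration, not something proposed in this thread, and the function names and instance names are hypothetical. Each instance can compute the assignment on its own, and the split is only even on average, not exactly even.

```python
import hashlib


def owner(pipeline: str, instances: list[str]) -> str:
    """Pick the instance with the highest hash score for this pipeline."""
    if not instances:
        raise ValueError("no controller instances available")

    def score(instance: str) -> int:
        # Hash the (instance, pipeline) pair into a comparable integer.
        digest = hashlib.sha256(f"{instance}/{pipeline}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(instances, key=score)


def assign(pipelines: list[str], instances: list[str]) -> dict[str, list[str]]:
    """Group pipelines by the instance that owns them."""
    result: dict[str, list[str]] = {instance: [] for instance in instances}
    for pipeline in sorted(pipelines):
        result[owner(pipeline, instances)].append(pipeline)
    return result
```

When an instance goes away, only the pipelines it owned move to another instance; the rest keep their owner, which avoids restarting streams unnecessarily.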

option b:

The operator starts a new workload (a k8s Deployment) for each registered benthos pipeline.

❯ kubectl get deployment --namespace benthos
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
benthos-captain  2/2     2            2           1d
❯ kubectl get deployment --namespace data
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
foo-pipeline-benthos-captain  3/3     3            3           1d
bar-pipeline-benthos-captain  2/2     2            2           1d
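
To make option b a bit more concrete, here is a small Python sketch of how a Pipeline resource might be turned into a Deployment manifest. It only builds a dictionary and talks to no cluster. The naming scheme follows the listing above; the image tag, the `/config/benthos.yaml` path and the ConfigMap that would hold the rendered config are assumptions made for the example.

```python
# Assumption: any Benthos image would do; this tag is only a placeholder.
BENTHOS_IMAGE = "jeffail/benthos:latest"


def deployment_for(pipeline: dict) -> dict:
    """Build an apps/v1 Deployment manifest for one Pipeline resource."""
    meta = pipeline["metadata"]
    name = f"{meta['name']}-benthos-captain"
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": meta["namespace"], "labels": labels},
        "spec": {
            # One replica per requested worker.
            "replicas": pipeline["spec"]["workers"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "benthos",
                        "image": BENTHOS_IMAGE,
                        # Hypothetical mount path for the rendered spec.config.
                        "args": ["-c", "/config/benthos.yaml"],
                        "volumeMounts": [{"name": "config", "mountPath": "/config"}],
                    }],
                    # Hypothetical ConfigMap, named after the Deployment.
                    "volumes": [{"name": "config", "configMap": {"name": name}}],
                },
            },
        },
    }
```

For the `foo-pipeline` example this would yield a Deployment called `foo-pipeline-benthos-captain` in the `data` namespace with three replicas.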

install

To install the benthos-captain operator we could create a helm chart, for example:

❯ helm repo add benthos https://charts.benthos.dev
❯ helm upgrade -i benthos-captain benthos/benthos-captain \
--namespace benthos \
--set createCRD=true

blobctl

It could also be useful to create a CLI to communicate with the operator and perform some actions, e.g., pause or resume a pipeline.

❯ blobctl -n data pause foo-pipeline-benthos-captain

❯ blobctl resume --all
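
How pause and resume would actually work is still open. One possibility, sketched in Python below, is a flag on the resource that makes the operator run zero replicas without losing the configured worker count. The `spec.paused` field and both function names are hypothetical and not part of the CRD example above.

```python
def set_paused(pipeline: dict, paused: bool) -> dict:
    """Return a copy of the resource with the hypothetical spec.paused flag set."""
    return {**pipeline, "spec": {**pipeline["spec"], "paused": paused}}


def effective_replicas(pipeline: dict) -> int:
    """Replicas the operator would run: zero while paused, spec.workers otherwise."""
    spec = pipeline["spec"]
    return 0 if spec.get("paused", False) else spec["workers"]
```

Because `workers` is left untouched, resuming only needs to clear the flag.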


@Jeffail Jeffail added enhancement feedback wanted We want premium quality opinions on this issue labels Oct 9, 2020
@Jeffail
Collaborator

Jeffail commented Oct 9, 2020

😍 I love this. Personally I'd lean towards option B as it seems more true to the k8s ethos, but I'd love to hear from more k8s users about their preferences. If this were to become the recommended way to deploy Benthos on k8s then I would want to update the create, lint and test subcommands so that they can work with this CRD as an option.

@Jeffail Jeffail pinned this issue Oct 9, 2020
@Jeffail
Collaborator

Jeffail commented Oct 9, 2020

Relevant reading: https://thenewstack.io/kubernetes-when-to-use-and-when-to-avoid-the-operator-pattern/

@mfamador
Contributor Author

mfamador commented Oct 9, 2020

Very relevant indeed, and it makes me wonder what the advantages of having an operator would be.
Actually, I'm currently deploying my benthos pipelines using GitOps with Flux, Helm and Kustomize, which now seems very close to the outcome of option b, where k8s deployments running benthos containers are deployed.
Unless we want to issue operations to the pipelines (pause, resume, ...) and keep some state about them, there are probably not many advantages to having an operator, at least with option b.

@Jeffail
Collaborator

Jeffail commented Oct 9, 2020

Okay, let's limit this to the scope of running Benthos in streams mode. I'm in the process of defining a rough spec for expanding streams mode into more of a managed platform, so this could end up working nicely with that. Issue soon to come.

@mfamador
Contributor Author

mfamador commented Oct 9, 2020

I probably wouldn't exclude option B for now, either. I was thinking how much simpler it would be, when using GitOps, to create only a HelmRelease for benthos-captain and a simple Pipeline manifest for each of the pipelines, instead of having the burden of creating a HelmRelease for every pipeline.

@mfamador
Contributor Author

mfamador commented Oct 11, 2020

I've created a skeleton of a "working" benthos-captain in https://github.com/mfamador/benthos-captain. We can move it to another repo later, or create a new one, if you think that would be more convenient.
It was created mostly to better understand how to set up a new operator.
It's already reconciling benthos pipeline resources! All we have to do now is add ALL the rest :D

2020-10-11T16:18:29.822Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "benthos-captain.benthos.dev", "reconcilerKind": "Pipeline", "controller": "pipeline", "name": "pipeline-sample", "namespace": "default"}

@Jeffail Jeffail unpinned this issue Oct 15, 2020
@iluminae
Contributor

I made an operator for benthos pipelines some time last year and abandoned it before it hit production because of other priorities. This is great and I would love to use it. One feature we were trying to get into our pipeline operator came in two parts:

  1. horizontal pod autoscaler (easy, just make the object in k8s). There are also some amazing things you can do with HPA, like using the queue depth of your pubsub subscription as the driver for your HPA.
  2. scaling within the pipeline

The default (CPU, memory) autoscaler is only useful if your pipeline is capable of saturating the CPU (or memory) of your pod, so the idea was to scale the number of threads and brokered inputs and outputs until the pipeline does fill the pod, and only then add the HPA. This was a little more difficult and we never implemented it.

Awesome work, thanks!
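
For the queue-depth idea above, the replica count an HPA would aim for can be approximated in a few lines of Python. This is a simplified sketch of the documented HPA calculation for an average-value target (total metric divided by the target per pod, rounded up, then clamped); it ignores the tolerance and stabilization behaviour of the real autoscaler, and the function name and default bounds are assumptions made for the example.

```python
import math


def desired_replicas(queue_depth: int, target_per_pod: int,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replicas needed so each pod handles at most target_per_pod messages."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(queue_depth / target_per_pod)
    # Clamp to the configured bounds, as an HPA does with minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 100 undelivered messages per pod, a backlog of 250 messages would ask for 3 replicas, and an empty queue would fall back to the minimum.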

@girishramnani

girishramnani commented Nov 4, 2020

Wouldn't this overlap with Knative Eventing (https://github.com/knative/eventing) or Tekton Pipelines (https://github.com/tektoncd/pipeline)?

@mfamador
Contributor Author

mfamador commented Nov 4, 2020

I'm not familiar with those projects, but I don't think they would overlap with an Operator. But yes, there are many ways to deploy to Kubernetes, GitOps with Flux, for example. The idea behind this Operator would be to make deploying Benthos pipelines in Kubernetes easier, using simple custom resources for the pipeline definitions and letting the Operator do all the rest.
