Leader election in Kubernetes: A Camel Quarkus Master example

Suppose an application connects to an external service and receives information asynchronously via a TCP socket or websocket. The application receives the data, transforms its structure, and publishes it to an Apache Kafka topic. In this case, only a single connection can be active at a time, because a second active connection could publish duplicate data (see diagram below).

figure1

The quickest solution to this problem relies on a fundamental characteristic of Kubernetes. If the deployment is created with replicas set to 1, the controller will attempt to create a new pod whenever it detects that the existing pod is no longer running. However, real-world situations can be more complicated. Some applications need a long startup time, for example to warm their caches. When slow pod startup (minutes) is combined with a business requirement to minimize downtime, this default solution becomes unsuitable. In this situation, we want multiple instances up and ready to take over the consumption as quickly as possible.

This example shows you how to implement leader election in Kubernetes using Apache Camel.

Hot-warm with leader election

To run an application as hot-warm means to have multiple instances of the application running and ready to serve requests, but only one instance actually doing the work. Within Kubernetes, this means having multiple pods ready at all times, but only one pod active for a particular process. In this scenario, the pods negotiate among themselves which one is active.

Apache Camel has a component (called master) that is built exactly for this scenario. As the docs explain, the Camel-Master endpoint lets us ensure only a single consumer in a cluster consumes from a given endpoint, with automatic failover if that Java virtual machine (JVM) dies. To achieve this goal, the endpoint requires a shared resource and locking. The component has multiple implementations for the locking mechanism, including camel-kubernetes.

Within the component’s configuration, the developer provides a namespace to designate the shared resource. Among all processes that use the same namespace for locking, only one at a time can obtain the lock. The process that holds the lock is the leader, and its route runs. If it loses the lock for any reason, the component stops the route.

Start in the Development mode

$ mvn clean compile quarkus:dev

The above command compiles the project, starts the application and lets the Quarkus tooling watch for changes in your workspace. Any modifications in your project will automatically take effect in the running application.

Tip
Please refer to the Development mode section of Camel Quarkus User guide for more details.

Package the application for local playground

Once you are done developing, you may want to package and run the application.

Tip
Find more details about the JVM mode and Native mode in the Package and run section of the Camel Quarkus User guide.

JVM mode

$ mvn clean package

Native mode

Important
Native mode requires GraalVM and other tools to be installed. Please check the Prerequisites section of the Camel Quarkus User guide.

To prepare a native executable using GraalVM, run the following command:

$ mvn clean package -Pnative

Local Playground

The first example uses the traditional Camel route domain-specific language (DSL), where we incorporate the master component along with the timer component (see CamelRoute).

Although we will ultimately deploy this application out to a Kubernetes cluster, we want to demonstrate and test this functionality in our local environment. When doing so, we won’t have access to the Kubernetes leases, so we will implement the locking using the camel-file component.

If a second instance of the application is started locally while the first one is running, the newly started instance will not be able to obtain the lock and therefore will not run the timer component. If you kill the leader, the other instance will check the lock, see that it is free, obtain it, and start processing. Additionally, we can apply settings to designate how often the service checks the lock and tries to acquire it.
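The local locking behaviour described above can be illustrated with a plain-Java sketch. This is not the code of Camel's file-based cluster service; it only assumes that leadership corresponds to holding an exclusive lock on a shared file, taken here with `FileChannel.tryLock` from `java.nio`. The class name `FileLockLeader`, its methods, and the lock file name are hypothetical, and the sketch assumes a single candidate per process.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: an instance is the leader while it holds an exclusive lock on a shared file. */
final class FileLockLeader implements AutoCloseable {
    private final FileChannel channel;
    private FileLock lock; // null while this instance is on standby

    FileLockLeader(Path lockFile) throws IOException {
        // The file is created on first use; every candidate opens the same path.
        this.channel = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Makes one non-blocking attempt; returns true if this instance is the leader afterwards. */
    synchronized boolean tryAcquire() throws IOException {
        if (isLeader()) {
            return true;
        }
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null; // the lock is already held inside this same JVM
        }
        return lock != null;
    }

    synchronized boolean isLeader() {
        return lock != null && lock.isValid();
    }

    /** Gives up leadership so that a standby instance can take over. */
    synchronized void release() throws IOException {
        if (isLeader()) {
            lock.release();
        }
        lock = null;
    }

    @Override
    public synchronized void close() throws IOException {
        release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the shared lock file.
        Path lockFile = Path.of(System.getProperty("java.io.tmpdir"), "leader-election-sketch.lock");
        try (FileLockLeader candidate = new FileLockLeader(lockFile)) {
            System.out.println(candidate.tryAcquire() ? "leader" : "standby");
        }
    }
}
```

A standby instance would call `tryAcquire()` on a timer. Because the operating system releases a file lock when the process holding it exits, an attempt made after the leader has been killed can succeed.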

Run the packaged application in separate terminals.

JVM Mode:

$ java -jar target/quarkus-app/quarkus-run.jar

In case you built the application in Native mode:

./target/*-runner

In whichever instance starts first, you should see output similar to the following:

INFO  [clustered] (Camel (camel-quarkus-examples-cluster-leader-election) thread #3 - timer://clustered) Clustered route (timer) e54cc6a7-7b5f-4aa3-a9f8-4c31536c3b75 ...

In the other terminal, you should not see any Clustered route messages.

You can also stop the leading application process with CTRL+C. The instance in the other terminal will then take the lead, and you should start seeing Clustered route messages there.

Kubernetes

In a local environment, we used the FileLockClusterService. Now that we are ready to deploy this application on Kubernetes, we will switch the implementation from using files to using Kubernetes leases. To start, let’s take a look at the deployment manifest for the application.

Setup service account

In this deployment, we have two replicas, but we want only one pod to run the processes. Note the following about the configuration:

  • Notice the serviceAccountName in the deployment config. For this deployment, we need to set up a service account that specifically has access to the Kubernetes leases.

Before we deploy, we need to set up the service account and give it permissions to read and write to the lease objects:

kubectl apply -f src/main/kubernetes/service-account.yaml

Playground

Deploy

mvn clean package -DskipTests -Dquarkus.kubernetes.deploy=true -Dkubernetes -Dquarkus.camel.cluster.file.enabled=false -Dquarkus.camel.cluster.kubernetes.enabled=true

Once we deploy the application into Kubernetes, the application will use the KubernetesClusterService implementation of the CamelClusterService to perform the leadership elections. To do this, the service will periodically query the lease information and attempt to update the information if the last update has not been performed in the designated lease time. The configuration for the timing of the leader election activity is more detailed, which should be expected; we are no longer simply checking a file lock, but rather working in more of a heartbeat monitoring pattern.
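The heartbeat pattern described above can be sketched in plain Java. This is an in-memory illustration only, not the KubernetesClusterService: the real service keeps this state in a Kubernetes lease object, whereas the class `LeaseElection`, its method names, the pod names, and the 15-second lease duration below are assumptions made for the example.

```java
/** Hypothetical in-memory model of a lease that candidates periodically try to take or renew. */
final class LeaseElection {
    private final long leaseDurationMillis;
    private String holder;        // identity of the current leader, null if nobody holds the lease
    private long renewedAtMillis; // time of the holder's last successful update

    LeaseElection(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    /**
     * One election round for the given candidate at the given time.
     * The holder renews its lease; anyone else takes over only once the
     * last update is at least one lease duration old.
     */
    synchronized boolean tryAcquireOrRenew(String candidate, long nowMillis) {
        boolean expired = holder == null || nowMillis - renewedAtMillis >= leaseDurationMillis;
        if (expired || holder.equals(candidate)) {
            holder = candidate;
            renewedAtMillis = nowMillis;
            return true;
        }
        return false;
    }

    synchronized String holder() {
        return holder;
    }

    public static void main(String[] args) {
        LeaseElection lease = new LeaseElection(15_000);
        System.out.println(lease.tryAcquireOrRenew("pod-a", 0));      // true: lease was free
        System.out.println(lease.tryAcquireOrRenew("pod-b", 5_000));  // false: pod-a updated 5 s ago
        System.out.println(lease.tryAcquireOrRenew("pod-a", 10_000)); // true: pod-a renews
        // pod-a stops renewing here, for example because it was deleted
        System.out.println(lease.tryAcquireOrRenew("pod-b", 20_000)); // false: last update only 10 s old
        System.out.println(lease.tryAcquireOrRenew("pod-b", 25_000)); // true: lease expired, pod-b leads
    }
}
```

The sketch also shows why failover is not instantaneous in this pattern: a standby can take over only after the previous leader has missed renewals for a full lease duration.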

You should see two pods running:

camel-quarkus-examples-cluster-leader-election-5d46b7564c-jwbw2                  1/1     Running             0             17m
camel-quarkus-examples-cluster-leader-election-5d46b7564c-vhvxg                  1/1     Running             0             16m

Only one of them prints Clustered route messages. You can try to kill the pod that holds the lead (in this case, camel-quarkus-examples-cluster-leader-election-5d46b7564c-jwbw2).

kubectl delete pod camel-quarkus-examples-cluster-leader-election-5d46b7564c-jwbw2

A new leader election will then take place, and you should see output similar to the following:

kubectl logs camel-quarkus-examples-cluster-leader-election-5d46b7564c-vhvxg
...
INFO  [org.apa.cam.com.kub.clu.loc.KubernetesLeadershipController] (Camel (camel-quarkus-examples-cluster-leader-election) thread #1 - CamelKubernetesLeadershipController) Pod[camel-quarkus-examples-cluster-leader-election-5d46b7564c-vhvxg] Current pod is becoming the new leader now...
...
INFO  [clustered] (Camel (camel-quarkus-examples-cluster-leader-election) thread #4 - timer://clustered) Clustered route (timer) 9389bae5-7677-4679-90f7-77ce6b7e5fda ...
...

Feedback

Please report bugs and propose improvements via GitHub issues of the Camel Quarkus project.