Spark can run on clusters managed by Kubernetes. This feature makes use of the native Kubernetes scheduler that has been added to Spark.
The Kubernetes scheduler is currently experimental. In future versions, there may be behavioral changes around configuration, container images and entrypoints.
- Check access to the Kubernetes cluster.
kubectl cluster-info
Kubernetes master is running at https://192.168.7.70/clusters/c-8bhgq
KubeDNS is running at https://192.168.7.70/k8s/clusters/c-8bhgq/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
- Build the Apache Spark Docker image (e.g. Apache Spark 2.4.3).
# Download a pre-built Spark
cd examples/Spark/
wget http://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
# Extract Spark
tar xzf spark-2.4.3-bin-hadoop2.7.tgz
# Build Spark Docker image
cd spark-2.4.3-bin-hadoop2.7
./bin/docker-image-tool.sh -r dwarfcu -t 2.4.3 build
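Before pushing, you can confirm the image exists locally (dwarfcu is the example repository name used throughout this walkthrough):
# List the locally built Spark images
docker images dwarfcu/spark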
- Upload the Spark Docker image to a registry (e.g. Docker Hub).
docker login
docker push dwarfcu/spark:2.4.3
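Alternatively, the same helper script used for the build can push every image it produced in one go:
./bin/docker-image-tool.sh -r dwarfcu -t 2.4.3 push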
- Create Namespace, Service Account and Role Binding.
kubectl create -f 00-env.yaml
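The contents of 00-env.yaml are not reproduced here. A minimal sketch of the three resources it has to create could look like the following (the spark namespace and spark-sa service account names come from the commands in this walkthrough; the RoleBinding name spark-rb and the choice of the built-in edit ClusterRole are assumptions):
apiVersion: v1
kind: Namespace
metadata:
  name: spark
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-rb            # assumed name
  namespace: spark
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                # built-in ClusterRole; lets the service account create and delete pods
subjects:
- kind: ServiceAccount
  name: spark-sa
  namespace: spark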
- Submit Spark application (Pi example).
kubectl run spark --image=dwarfcu/spark:2.4.3 -it -n spark --serviceaccount=spark-sa --restart='Never' --rm=true \
-- /opt/spark/bin/spark-submit \
--name spark-pi \
--master k8s://https://192.168.7.70:6443 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.driver.pod.name=pi-driver-1 \
--conf spark.kubernetes.namespace=spark \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=dwarfcu/spark:2.4.3 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
--conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 10000
Notice that the example above specifies the jar with a local:// URI scheme. This URI points to the example jar that is already inside the Docker image, so nothing needs to be uploaded at submission time.
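While the application runs, you can watch the driver and executor pods being created and torn down:
kubectl get pods -n spark -w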
- Check logs.
kubectl logs -n spark pi-driver-1
...
Pi is roughly 3.141641147141641
...
- Delete pi-driver-1 pod.
kubectl delete pod -n spark pi-driver-1
- Delete Namespace, Service Account and Role Binding.
kubectl delete -f 00-env.yaml
- Compile a Spark (Java) project, e.g. WordCount.
cd wordCount
mvn clean package
cd ..
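The ConfigMap step below reads the build artifact from wordCount/target, so it is worth confirming the JAR is there first:
ls -lh wordCount/target/wordCount-1.0.jar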
- Create Namespace, Service Account and Role Binding.
kubectl create -f 00-env.yaml
- Create a ConfigMap from a binary file, i.e. the JAR file (note that ConfigMaps are capped at 1 MiB, so this approach only suits small JARs).
kubectl create configmap -n spark wordcount-jar --from-file=wordCount-1.0.jar.file=wordCount/target/wordCount-1.0.jar
# Check that the ConfigMap has been created correctly
kubectl get configmaps -n spark wordcount-jar -o yaml
apiVersion: v1
binaryData:
wordCount-1.0.jar.file: UEsDBAoACA...
...
- Create a pod that submits the Spark (Java) application (the ConfigMap is mounted into it as a volume); a sketch of what such a manifest might look like follows the command below.
kubectl create -f 20-spark-wordCount.yaml
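The contents of 20-spark-wordCount.yaml are not reproduced here. One arrangement that works on Spark 2.4 is to run spark-submit in client mode from a pod that mounts the wordcount-jar ConfigMap, so the pod itself becomes the driver; client mode additionally needs a headless Service so executors can reach the driver. The sketch below is an assumption about the manifest's shape, not its actual contents: the Service name, ports, mount path and the WordCount main class are all illustrative.
apiVersion: v1
kind: Service
metadata:
  name: wordcount-driver-svc   # assumed name; headless service for executor -> driver traffic
  namespace: spark
spec:
  clusterIP: None
  selector:
    app: wordcount-driver
  ports:
  - name: driver
    port: 7077
  - name: blockmanager
    port: 7079
---
apiVersion: v1
kind: Pod
metadata:
  name: wordcount-driver-1
  namespace: spark
  labels:
    app: wordcount-driver
spec:
  serviceAccountName: spark-sa
  restartPolicy: Never
  volumes:
  - name: wordcount-jar        # backed by the ConfigMap created in the previous step
    configMap:
      name: wordcount-jar
  containers:
  - name: driver
    image: dwarfcu/spark:2.4.3
    volumeMounts:
    - name: wordcount-jar
      mountPath: /opt/jars     # assumed mount path; the ConfigMap key becomes the file name
    command:
    - /opt/spark/bin/spark-submit
    - --name
    - wordcount
    - --master
    - k8s://https://192.168.7.70:6443
    - --deploy-mode
    - client
    - --class
    - WordCount                # assumed main class of the example project
    - --conf
    - spark.kubernetes.namespace=spark
    - --conf
    - spark.kubernetes.container.image=dwarfcu/spark:2.4.3
    - --conf
    - spark.executor.instances=3
    - --conf
    - spark.driver.host=wordcount-driver-svc.spark.svc.cluster.local
    - --conf
    - spark.driver.port=7077
    - --conf
    - spark.blockManager.port=7079
    - /opt/jars/wordCount-1.0.jar.file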
- Delete the environment.
kubectl delete pod -n spark wordcount-driver-1
kubectl delete -f 20-spark-wordCount.yaml
kubectl delete -f 00-env.yaml
Alternatively, you can delete just the spark namespace; every resource it contains is deleted along with it.
kubectl delete namespaces spark