This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Use a pre-installed Minikube instance for integration tests. #521

Open · wants to merge 11 commits into base: branch-2.2-kubernetes

Conversation

@mccheah commented Oct 6, 2017

This changes the integration test workflow to assume that Minikube is already installed on the testing machine. Previously, the integration tests downloaded Minikube during the build cycle and started/stopped the Minikube VM on every test execution, which meant that multiple integration tests could not run concurrently.

This commit allows multiple tests to share a single Minikube instance, and it requires users who run integration tests to have Minikube pre-installed. If the Minikube instance has enough resources, multiple tests can run against it at the same time. Each test needs its own set of Docker images, so the Docker image builder now tags images uniquely on every test execution.
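To make the per-execution tagging concrete, here is a minimal sketch (not the PR's exact code) of generating a unique Docker tag; the name dockerTag matches the snippets quoted later in this conversation:

import java.util.UUID

// A fresh UUID per test execution; stripping dashes keeps the tag compact.
// (Dashes are legal in Docker tags, so this is purely cosmetic.)
val dockerTag = UUID.randomUUID().toString.replaceAll("-", "")

// Every image built for this execution is then tagged as <name>:<dockerTag>,
// so concurrent runs against a shared Minikube never collide on 'latest'.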

@mccheah (Author) commented Oct 6, 2017

This build won't pass until Minikube is pre-installed on our Jenkins node. Conversely, if we set up Minikube on the Jenkins node, all outstanding PRs will need to rebase on top of this commit once it merges in order to pass their builds. I'm not sure what the best way forward is with regard to sequencing these changes.

This change also allows tests to run in the RISELab Jenkins environment.

@ash211 @ifilonenko for review, @ssuchter @shaneknapp for infrastructure considerations.

val originalDockerFileText = Files.readLines(
  DOCKER_BUILD_PATH.resolve(dockerFile).toFile, Charsets.UTF_8).asScala
val dockerFileTextWithProperBaseImage = originalDockerFileText.map(
  _.replace("FROM spark-base", s"FROM spark-base:$dockerTag"))
Member:
Is there any way we can create pointers to the Docker images themselves, or at least store their RMI, so we don't have to go into the contents of the Dockerfile?

mccheah (Author):

Not entirely sure - we're always building all of the Docker images from scratch, including the spark-base image with the given tag, and the child images need to be built with their base set to the appropriate tag.

One option is to push this to the Maven level instead: the pre-integration-test phase can copy the Dockerfiles from docker-minimal-bundle, but instead of keeping the tags as-is, Maven can set the tags accordingly. I'm not sure what the Maven primitives for doing this would be.

Member:

What kind of tags would you want to set? Something unique about the build?

mccheah (Author):

The docker image could be tagged with the git hash of the head commit of the repository. This should be unique per build.
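As a hedged sketch of this alternative (assuming the test harness may shell out to git; this is not code from the PR):

import scala.sys.process._

// Resolve the git hash of the repository's head commit; unique per build
// provided each CI build checks out a distinct commit.
val gitHashTag = Seq("git", "rev-parse", "HEAD").!!.trim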

Member:

Seems like the Maven mojo you're looking for is the buildnumber-maven-plugin: http://www.mojohaus.org/buildnumber-maven-plugin/

@ssuchter (Member) left a comment:

I have an overall question. Instead of requiring a pre-installed Minikube, could we leave the old functionality in place behind a flag guard? A Maven profile would be a good way to activate it.


private def deleteImage(name: String): Unit = {
  try {
    dockerClient.removeImage(s"$name:$dockerTag")
Member:

If I understand this right, you're modifying the system-global Docker images here. Given that dockerTag is always 'latest', is there a concern that multiple simultaneous invocations of this test could be fighting over the same 'latest' image? Should we change the tag to have a unique value per build?

Member:

I see you just answered this in the other comment thread.

mccheah (Author):

The tag is actually generated here in the Minikube backend.

mccheah (Author):

The tag should be set to a UUID. I also considered the git hash, as mentioned before.
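For illustration, retagging an already-built image under such a UUID could look like the following, assuming the Spotify docker-client (whose tag(image, name) method matches the dockerClient seen in the snippet above); this is a sketch, not the PR's code:

// Retag the freshly built base image under the per-execution UUID tag,
// leaving any 'latest' tag on the shared Docker daemon untouched.
dockerClient.tag("spark-base:latest", s"spark-base:$dockerTag")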

@mccheah (Author) commented Oct 10, 2017

We could, but I'd prefer to avoid having to maintain two code paths if possible. If we're sending this upstream, it would be best to keep the system simple.

@foxish (Member) commented Dec 5, 2017

Is this waiting on any more changes? Do we need the Pepperdata build system to install Minikube as well?

@mccheah (Author) commented Dec 5, 2017

Still ironing out some errors in the RISELab system, but yes, we will need the Pepperdata Jenkins to install Minikube as well. Once we merge this into branch-2.2-kubernetes, ALL open PRs will need to rebase on top of this change to pass CI... and to avoid accidentally deleting Minikube.

@@ -37,10 +37,22 @@ private[spark] object Minikube extends Logging {

def getMinikubeStatus: MinikubeStatus.Value = synchronized {
val statusString = executeMinikube("status")
<<<<<<< HEAD


bad merge?

@echarles (Member) commented:

Great to have the option to run concurrent integration tests against an already-existing Minikube instance with different tags. I haven't read the proposed changes yet, but how do you create the image tags: by rebuilding, or simply by tagging an initial build?

However, I'd like to second @ssuchter's interest in keeping the previous behavior: a CI server should not need Minikube pre-installed. I think it will be harder to ask Apache infra to provide this on their servers than to let Maven download it.

My vote would go to concurrent runs (with tags) on a downloaded Minikube, plus an option to connect to an existing one: a mix of the existing and proposed behaviors.

@foxish mentioned this pull request Dec 13, 2017
@echarles (Member) commented:

I have run the integration tests on this branch and got a failure which, I suppose, is a random one, as described in #571.

- Run PySpark Job on file from SUBMITTER with --py-files
- Run PySpark Job on file from CONTAINER with spark.jar defined
- Run SparkR Job on file locally
- Run SparkR Job on file from SUBMITTER
- Simple submission test with the resource staging server.
- Enable SSL on the resource staging server
- Use container-local resources without the resource staging server
- Dynamic executor scaling basic test
- Use remote resources without the resource staging server.
- Mix remote resources with submitted ones.
- Use key and certificate PEM files for TLS.
- Use client key and client cert file when requesting executors
- Added files should be placed in the driver's working directory. *** FAILED ***
  java.net.ConnectException: Failed to connect to /192.168.99.100:30941
  at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:225)
  at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
  at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
  at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
  at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
  at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
  at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
  ...
  Cause: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
  at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:223)
  at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:149)
  at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195)
  ...
- Setting JVM options on the driver and executors with spaces.
- Submit small local files without the resource staging server.
- Use a very long application name.
...
