Add s3 reliability test #44
base: master
Conversation
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
```diff
@@ -1,6 +1,6 @@
 ---
 - name: Install Vector configuration
-  template:
+  copy:
```
This is to get around an issue where `template` tries to fill in the vector config's templates. We can probably get around it with escaping, but this seemed safer.

It also raises the question: if we actually wanted to template some things in, how would we do it?
Good point! We can use different `variable_end_string` and `variable_start_string` when resolving vector templates.
Agree, @lukesteensen we use templates to insert addresses, ports, etc., so we'll need to revert this. I'll try to submit a PR today that changes this to use different `variable_end_string` and `variable_start_string` values.
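For reference, a rough sketch of what that task could look like. The file names and the `[[ ... ]]` delimiters here are assumptions for illustration, not the actual role contents; `variable_start_string` and `variable_end_string` are real parameters of Ansible's `template` module.

```yaml
# Hypothetical sketch: keep using the template module, but switch Jinja2 to
# [[ ... ]] delimiters so Vector's own {{ ... }} config templates pass
# through untouched.
- name: Install Vector configuration
  template:
    src: vector.toml.j2            # assumed template file name
    dest: /etc/vector/vector.toml  # assumed destination
    variable_start_string: "[["
    variable_end_string: "]]"
```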
```
Description=verify-logs

[Service]
ExecStart=/usr/local/bin/verifiable-logger verify file-to-s3-reliability-test-data us-east-1 --prefix "host=%H" --tail
```
This is pretty hardcoded right now, but should obviously be templated in once we figure out the right way to get the variables in here.
I would highly recommend not adding support for environment variables in systemd. I also ran into weird issues when I passed more than one flag. It was a mess. But if you look at the `http_test_server` role, I used a template for that service file.
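Along those lines, a templated unit file might look something like this. The variable names are assumptions; the `http_test_server` role has the actual pattern.

```
[Unit]
Description=verify-logs

[Service]
# Hypothetical Jinja2 variables; the real role would define its own.
ExecStart=/usr/local/bin/verifiable-logger verify {{ bucket_name }} {{ aws_region }} --prefix "host=%H" --tail
```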
You can run this test via:

```
test -t file_to_s3_reliability
```
One thing missing here: how do we shut down the test and destroy all the resources?
Currently, we kill the VMs via a CloudWatch alert. We need a better way; maybe we could use Lambda for it.
Yes, we should add a `bin/teardown -t <test> -c <config>`. The `terraform destroy` command should make this easy. It should only be called for the local test state, not global state, obviously.
> Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

What's wrong with CloudWatch? I think it works quite well for this.
CloudWatch works!

But if we create resources per test, then once we have `bin/teardown`, we have to invoke it with enough context that it only deletes the resources associated with a particular run.

So this is where we use Lambda: we can schedule an invocation of `bin/teardown` with all the required context, delayed by, let's say, three hours, as part of the `bin/test` run. That would let us clean up not only the VMs but also all the associated terraform state: policies, S3 buckets, the VPC, and everything else we create per test.

I think this solution can replace our CloudWatch VM removal, because we can clean up everything, including the VMs!
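As a minimal sketch of the scheduling piece: the function below builds an EventBridge-style one-time `at(...)` schedule expression a few hours in the future, at which point a teardown Lambda could fire. The function name and the "three hours" default are assumptions, not existing harness code.

```python
from datetime import datetime, timedelta

def teardown_schedule_expression(now: datetime, delay_hours: int = 3) -> str:
    """Build a one-time EventBridge-style schedule expression that fires
    `delay_hours` after `now`, when a teardown Lambda could run.
    (Hypothetical helper; the real harness would wire this into bin/test.)"""
    fire_at = now + timedelta(hours=delay_hours)
    return f"at({fire_at.strftime('%Y-%m-%dT%H:%M:%S')})"

# For example, a run starting at noon would schedule teardown for 3 PM:
# teardown_schedule_expression(datetime(2020, 1, 1, 12, 0, 0))
# → "at(2020-01-01T15:00:00)"
```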
This should solve it for now: #52
```
region = "us-east-1"
bucket = "file-to-s3-reliability-test-data"
```
Again, these should be templated in somehow.
I'm OK with this stuff being hardcoded since it's all contained within this test. We could use variables (this is what the `configurations` folder is for in the test), but I don't think it matters too much.
The problem is S3 bucket names have global scope, so we either have to use a different one each test run, or make this part of the state global. I think making it global is worse than templating the names.
```diff
@@ -0,0 +1,2 @@
+---
+foo: "bar"
```
It didn't let me not have this file.

Presumably, we could use this for things like the bucket name, but a few things weren't clear:
- How to template values into the vector toml while leaving its own templated fields alone
- How to get this same variable into terraform
> How to get this same variable into terraform

We do the templating as part of `bin/test` and our `lib` facility, and pass it to both ansible and terraform!
```
resource "aws_s3_bucket" "logs-bucket" {
  # data is namespaced by host within the bucket
  bucket = "file-to-s3-reliability-test-data"
```
Again, should be templated.
Yeah, I would template this so it's namespaced like our instance names. Ex:

```
vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data
```
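A Terraform sketch of that naming scheme; the variable declarations are assumptions about what the harness would pass in, not existing code.

```
# Hypothetical sketch: namespace the bucket per test run, since S3 bucket
# names are globally scoped and a shared fixed name collides across runs.
variable "user_id" {}
variable "test_name" {}
variable "test_configuration" {}

resource "aws_s3_bucket" "logs-bucket" {
  bucket = "vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data"
}
```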
```diff
@@ -11,7 +11,7 @@ locals {
 }

 module "vpc" {
-  source = "../../../terraform/aws_vpc"
+  source = "../aws_vpc"
```
I'm not sure if I was doing something wrong, but nothing worked at all without changing these paths.
This is odd. The test harness seems to work on `master` currently...
Checked, doesn't work on my end either. Created #45.
```
}

module "topology" {
  source = "../../../terraform/aws_uni_topology"
```
Had to change this from the case I copied to get things working.
I like the idea!

While reading, I thought: what if we use a global shared S3 bucket? But it's better this way. We definitely need to template the bucket name.

We probably should introduce a `run_id`, derived from the `user_id` plus a random string. This would also be useful for proper terraform state isolation, something I wanted to work on as part of the task to run multiple test cases in parallel.
This looks good! Really happy to see this implemented. It definitely raises our confidence that Vector will work reliably over long periods. I think we should clean up these final items.
Also, I didn't see anything in here about Slack notifications, etc. I assume that's hardcoded into the verifiable logger? I'm wondering if we can generalize this communication strategy somehow. I don't want to overthink this, but I know we'll need "alerts" of some kind for other tests in the future. We could even throw them on a queue and handle them "generally" out of band. Just thinking out loud a little bit.
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
While far from perfect, this works to spin up the equivalent of the reliability test from the old test env repo.
I'll comment on things of note, but I'm curious how this looks overall.