Update docs to include new probe information along with other cleanup (
AlexVulaj authored Aug 26, 2024
1 parent ec18609 commit 060ba01
Showing 3 changed files with 74 additions and 115 deletions.
27 changes: 23 additions & 4 deletions README.md
@@ -27,7 +27,10 @@ The recommended workflow of diagnostic use of ONV is shown in the following flow

## Terraform Scripts (AWS)

The Terraform scripts in this repository allow you to set up a secure and scalable network infrastructure in AWS for testing. It will create a VPC with public, private, and firewall(optinal) subnets, an Internet Gateway, a NAT Gateway, and a network firewall(optinal).
The Terraform scripts in this repository allow you
to set up a secure and scalable network infrastructure in AWS for testing.
They create a VPC with public, private, and firewall (optional) subnets,
an Internet Gateway, a NAT Gateway, and a network firewall (optional).

### Getting Started

@@ -52,13 +55,29 @@ It is also possible to pass in a custom list of egress endpoints by using the `-

Newly-added lists should be registered as "platform types" in [`helpers.go`](pkg/helpers/helpers.go#L94) using the list file's extensionless name as the value (e.g., abc.yaml should be registered as `PlatformABC string = "abc"`). Finally, the `--platform` help message and value handling logic in [`cmd.go`](cmd/egress/cmd.go) should also be updated.

### Image Selection
### Probes
Probes within the verifier are responsible for a number of important tasks.
These include the following:
- determining which machine images are to be used
- parsing cloud instance console output
- configuring instructions to the cloud instance

The list of images (RHEL base images) that osd-network-verifier selects from to run in is maintained in [pkg/probes/curl/machine_images.go](https://github.com/openshift/osd-network-verifier/tree/main/pkg/probes/curl/machine_images.go). Which image is selected is based on the platform, region and cpu architecture type. By default, "X86" is used unless manually overriden by the `--cpu-arch` flag.
Probes are cloud-platform-agnostic by design,
meaning that their implementations are not specific to any one cloud provider.
All probes must honor the contract defined by the [base probe interface](./pkg/probes/package_probes.go).
By default, the verifier uses the [curl probe](./pkg/probes/curl/curl_json.go).
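
The three responsibilities listed above can be pictured as a small Go interface. The following is only a sketch under stated assumptions: the `Probe` interface, its method names, and `toyProbe` are hypothetical and are not copied from `package_probes.go`; the real contract may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Probe is a hypothetical, simplified stand-in for the base probe interface.
// Method names and signatures are illustrative assumptions only.
type Probe interface {
	// MachineImageID picks the image for a platform, CPU architecture, and region.
	MachineImageID(platform, cpuArch, region string) (string, error)
	// StartupInstructions returns the instructions handed to the cloud instance.
	StartupInstructions(endpoints []string) string
	// ParseOutput extracts failed endpoints from the instance's console output.
	ParseOutput(consoleOutput string) []string
}

// toyProbe is a made-up implementation used only to exercise the interface.
type toyProbe struct{}

func (toyProbe) MachineImageID(platform, cpuArch, region string) (string, error) {
	if region == "" {
		return "", fmt.Errorf("no region given")
	}
	return "image-" + platform + "-" + cpuArch + "-" + region, nil
}

func (toyProbe) StartupInstructions(endpoints []string) string {
	return "check " + strings.Join(endpoints, " ")
}

func (toyProbe) ParseOutput(consoleOutput string) []string {
	var failed []string
	for _, line := range strings.Split(consoleOutput, "\n") {
		// The "FAIL " prefix is an invented output format for this sketch.
		if strings.HasPrefix(line, "FAIL ") {
			failed = append(failed, strings.TrimPrefix(line, "FAIL "))
		}
	}
	return failed
}

func main() {
	var p Probe = toyProbe{}
	id, _ := p.MachineImageID("aws-classic", "x86", "us-east-1")
	fmt.Println(id)
	fmt.Println(p.StartupInstructions([]string{"example.com:443"}))
	fmt.Println(p.ParseOutput("OK a.example.com:443\nFAIL b.example.com:443"))
}
```

Because nothing in the interface mentions a cloud provider, the same probe value could in principle be handed to any cloud-specific verifier, which is the point of keeping probes platform-agnostic.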

#### Image Selection

Each probe is responsible for determining its list of approved machine images.
The list of RHEL base images that osd-network-verifier can run on
is maintained in `pkg/probes/<probe_name>/machine_images.go`.
The image is selected based on the platform, region, and CPU architecture.
By default, "X86" is used unless manually overridden by the `--cpu-arch` flag.
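
As a rough illustration of that selection logic, the sketch below looks up an image by platform, CPU architecture, and region, falling back to x86 when no architecture is given. The `imageTable` layout, the `selectImage` function, and the image IDs are all hypothetical placeholders; they are not the contents of any real `machine_images.go`.

```go
package main

import "fmt"

// imageTable is a hypothetical lookup: platform -> CPU architecture -> region -> image ID.
// The IDs are placeholders, not real machine images.
var imageTable = map[string]map[string]map[string]string{
	"aws-classic": {
		"x86": {"us-east-1": "ami-EXAMPLE-x86"},
		"arm": {"us-east-1": "ami-EXAMPLE-arm"},
	},
}

// defaultCPUArch mirrors the documented default of X86.
const defaultCPUArch = "x86"

// selectImage returns the image for the given combination, or an error if
// the table has no entry. An empty cpuArch falls back to the default.
func selectImage(platform, cpuArch, region string) (string, error) {
	if cpuArch == "" {
		cpuArch = defaultCPUArch
	}
	// Indexing a missing (nil) inner map is safe in Go and yields ok == false.
	id, ok := imageTable[platform][cpuArch][region]
	if !ok {
		return "", fmt.Errorf("no image for platform=%q arch=%q region=%q", platform, cpuArch, region)
	}
	return id, nil
}

func main() {
	id, err := selectImage("aws-classic", "", "us-east-1")
	fmt.Println(id, err)
	_, err = selectImage("aws-classic", "arm", "eu-west-1")
	fmt.Println(err)
}
```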

### IAM Permission Requirement List

Version ID [required for IAM support role](docs/aws/aws.md#iam-support-role) may need update to match specification in [AWS docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_version.html).
The Version ID [required for IAM permissions](https://github.com/openshift/osd-network-verifier/blob/main/docs/aws/aws.md#iam-permissions) may need to be updated to match the specification in the [AWS docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_version.html).

## Release Process

12 changes: 7 additions & 5 deletions RELEASE.md
@@ -7,23 +7,25 @@ osd-network-verifier will follow semantic versioning
These changes don't automatically mean a change is a breaking or significant change, but should be taken into consideration:

* The various input verifier structs in [./pkg/verifier/package_verifier.go](./pkg/verifier/package_verifier.go) are exported and consumed downstream. Breaking changes to that input struct should be considered breaking changes for osd-network-verifier.
* New AMIs in [./pkg/verifier/aws/aws_verifier.go](./pkg/verifier/aws/aws_verifier.go), especially as the result of security fixes.
* New AMIs in [./pkg/probes/curl/machine_images.go](./pkg/probes/curl/machine_images.go), especially as the result of security fixes.
* New cloud IAM requirements/new cloud infrastructure to provision

## Testing changes

For now, this is mostly manual. It's important to validate that these scenarios are working before making a new release:

* The `defaultAmi` mapping values in [/pkg/verifier/aws/aws_verifier.go](./pkg/verifier/aws/aws_verifier.go) should be updated to match what is in the console output for the [most recent golden-ami build](https://ci.int.devshift.net/job/gl-build-master-osd-network-verifier-golden-ami-packer/).
* The `cloudMachineImageMap` values in [./pkg/probes/curl/machine_images.go](./pkg/probes/curl/machine_images.go) should be updated to match what is in the console output for the [most recent golden-ami build](https://ci.int.devshift.net/job/gl-build-master-osd-network-verifier-golden-ami-packer/).
* Note - if the above build is broken due to `ResourceLimitExceeded` issues, you will have to clean up the AMI image repository by running the [cleangoldenami module](./cleangoldenami/README.md), and then re-running the Jenkins build.
* The `networkValidatorImage` in [./pkg/verifier/aws/aws_verifier.go](./pkg/verifier/aws/aws_verifier.go) is the same image that is pre-baked on the `defaultAMI`'s. This can be found by looking at the latest tagged image in the [osd-network-verifier quay repository](https://quay.io/repository/app-sre/osd-network-verifier?tab=tags&tag=latest).
* Build the `integration` binary by running `go build` from the `/integration` folder. Then use this binary to test both the `aws` and `hostedcluster` configurations as shown below. For more information on setting up integration tests, see the [integration README](./integration/README.md).
* `./integration --platform aws-classic`
* `./integration --platform aws-hcp`
* egress test in AWS with a cluster-wide proxy
* ~~egress test on GCP~~ This should be added back when GCP support is functional again

After a new release has been created, please create an MR for the downstream projects to use the latest verifier version:
After a new release has been created, please create an MR for the downstream projects to use the latest verifier
version.
The latest version can be fetched with `go get github.com/openshift/osd-network-verifier@<the new tag>`

* Cluster Service (https://gitlab.cee.redhat.com/service/uhc-clusters-service): After cloning the repo, do `go get github.com/openshift/osd-network-verifier@<the new tag>`
* Cluster Service (https://gitlab.cee.redhat.com/service/uhc-clusters-service)
* osdctl (https://github.com/openshift/osdctl)
* Configuration Anomaly Detection (https://github.com/openshift/configuration-anomaly-detection)
150 changes: 44 additions & 106 deletions docs/aws/aws.md
@@ -1,21 +1,25 @@
### Table of Contents ###

- [Setup](#setup)
- [AWS Environment](#aws-environment)
- [VPC](#vpc)
- [IAM Permissions](#iam-permissions)
- [Available tools](#available-tools)
- [1. Egress Verification](#1-egress-verification)
- [1.1 Usage](#11-usage)
- [1.1.1 CLI Executable](#111-cli-executable)
- [1.1.2 Go Implementation Examples](#112-go-implementation-examples)
- [1.2 Interpreting Output](#12-interpreting-output)
- [1.3 Workflow](#13-workflow)
- [2. VPC DNS Verification](#2-vpc-dns-verification)
- [2.1 Usage](#21-usage)
- [2.1.1 CLI Executable](#211-cli-executable)
- [2.1.2 Golang API](#212-golang-api)
- [3. BYOVPC Configurations Verification](#3-byovpc-configurations-verification)
<!-- TOC -->
* [Table of Contents](#table-of-contents-)
* [Setup](#setup-)
* [AWS Environment](#aws-environment-)
* [VPC](#vpc-)
* [IAM permissions](#iam-permissions-)
* [Available Tools](#available-tools-)
* [1. Egress Verification](#1-egress-verification-)
* [1.1 Usage](#11-usage-)
* [1.1.1 CLI Executable](#111-cli-executable-)
* [Egress Validations Under Proxy](#egress-validations-under-proxy-)
* [Force Temporary Security Group Creation](#force-temporary-security-group-creation-)
* [1.1.2 Go implementation Examples](#112-go-implementation-examples-)
* [1.2 Interpreting Output](#12-interpreting-output-)
* [1.3 Workflow](#13-workflow-)
* [2. VPC DNS Verification](#2-vpc-dns-verification-)
* [2.1 Usage](#21-usage-)
* [2.1.1 CLI Executable](#211-cli-executable-)
* [2.1.2 Golang API](#212-golang-api-)
<!-- TOC -->

## Setup ##
### AWS Environment ###
@@ -73,40 +77,16 @@ Ensure that the AWS credentials being used have the following permissions. (This
]
}
```

The SRE only needs below permissions because we should supply Security Group ID by running `./osd-network-verifier egress --security-group-id <SG_ID>`:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags",
"ec2:RunInstances",
"ec2:DescribeInstances",
"ec2:DescribeInstanceTypes",
"ec2:GetConsoleOutput",
"ec2:TerminateInstances",
"ec2:DescribeVpcAttribute",
],
"Resource": "*"
}
]
}

```

## Available Tools ##

### 1. Egress Verification ###
#### 1.1 Usage ####
The processes below describe different ways of using egress verifier on a single subnet.
In order to verify entire VPC,
repeat the verification process for each subnet ID.
To verify the entire VPC, repeat the verification process for each subnet ID.
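
One way to script that repetition is sketched below. It only builds and prints one `egress` invocation per subnet (a dry run) using the `--platform` and `--subnet-id` flags shown in this document; the `egressArgs` helper and the subnet IDs are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// egressArgs builds the argument list for one egress verification run.
// Only flags that appear in this document are used.
func egressArgs(platform, subnetID string) []string {
	return []string{"egress", "--platform", platform, "--subnet-id", subnetID}
}

func main() {
	// Placeholder subnet IDs; substitute every subnet of the VPC under test.
	subnetIDs := []string{"subnet-EXAMPLE1", "subnet-EXAMPLE2"}
	for _, id := range subnetIDs {
		args := egressArgs("aws-classic", id)
		// Dry run: print the command instead of executing it.
		fmt.Println("./osd-network-verifier " + strings.Join(args, " "))
	}
}
```

To actually run each command, the same argument slice could be passed to `exec.Command` from the standard `os/exec` package.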

##### 1.1.1 CLI Executable #####
1. Ensure correct [environment setup](#setup).
1. Ensure correct [environment setup](#setup-).

2. Clone the source:
```shell
@@ -119,16 +99,8 @@ repeat the verification process for each subnet ID.
This generates the `osd-network-verifier` executable in the project root directory.

4. Obtain params:
1. subnet_id: Obtain the subnet id to be verified.
2. image_id: Select an optional image id parameter (ami-xxxxxxxxxxxx) to run on ec2 instance.

You may use the following public image ID as :
```bash
--image-id=resolve:ssm:/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
```
If the image id is not provided, it is defaulted to an image id from [AWS account olm-artifacts-template.yaml](https://github.com/openshift/aws-account-operator/blob/17be7a41036e252d59ab19cc2ad1dcaf265758a2/hack/olm-registry/olm-artifacts-template.yaml#L75),
for the same region where your subnet is.
3. platform: This parameter dictates for which set of endpoints the verifier should test. If testing a subnet that hosts (or will host) a traditional OSD/ROSA cluster, set this to `aws` (or leave blank). If you're instead testing a subnet hosting a HyperShift Hosted Cluster (*not* a hosted control plane/management cluster) on AWS, set this to `hostedcluster`.
1. subnet-id: the subnet id to be verified.
2. platform: This parameter dictates for which set of endpoints the verifier should test. If testing a subnet that hosts (or will host) a traditional OSD/ROSA cluster, set this to `aws` (or leave blank). If you're instead testing a subnet hosting a HyperShift Hosted Cluster (*not* a hosted control plane/management cluster) on AWS, set this to `hostedcluster`.
5. Execute:
@@ -141,43 +113,15 @@ repeat the verification process for each subnet ID.
./osd-network-verifier egress --platform aws-hcp --subnet-id $SUBNET_ID
```
Additional optional flags for overriding defaults:
```shell
--cacert string (optional) path to cacert file to be used upon https requests being made by verifier
--cloud-tags stringToString (optional) comma-seperated list of tags to assign to cloud resources e.g. --cloud-tags key1=value1,key2=value2 (default [])
--cpu-arch string (optional) compute instance CPU architecture. Ignored if valid instance-type specified
--debug (optional) if true, enable additional debug-level logging
--egress-list-location string (optional) the location of the egress URL list to use. Can either be a local file path or an external URL starting with http(s). This value is ignored for the legacy probe.
--force-temp-security-group (optional) Enforces creation of Temporary SG creation even if --security-group-ids flag is used
--http-proxy string (optional) http-proxy to be used upon http requests being made by verifier, format: http://user:pass@x.x.x.x:8978
--https-proxy string (optional) https-proxy to be used upon https requests being made by verifier, format: https://user:pass@x.x.x.x:8978
--image-id string (optional) cloud image for the compute instance
--import-keypair string (optional) Takes the path to your public key used to connect to Debug Instance. Automatically skips Termination
--instance-type string (optional) compute instance type
--kms-key-id string (optional) ID of KMS key used to encrypt root volumes of compute instances. Defaults to cloud account default key
--no-tls (optional) if true, skip client-side SSL certificate validation
--platform string (optional) infra platform type, which determines which endpoints to test. Either 'aws-classic', 'gcp-classic', or 'hosted-cp' (hypershift) (default "aws-hcp")
--probe string (optional) select the probe to be used for egress testing. Either 'Curl' (default) or 'Legacy' (default "Curl")
--profile string (optional) AWS profile. If present, any credentials passed with CLI will be ignored
--region string (optional) compute instance region. If absent, environment var AWS_REGION = us-east-2 and GCP_REGION = us-east1 will be used
--security-group-ids strings (optional) comma-separated list of sec. group IDs to attach to the created EC2 instance. If absent, one will be created
--skip-termination (optional) Skip instance termination to allow further debugging
--subnet-id string source subnet ID
--terminate-debug string (optional) Takes the debug instance ID and terminates it
--timeout duration (optional) timeout for individual egress verification requests (default 2s)
--vpc-name string (optional unless --platform='gcp-classic') VPC name where GCP cluster is installed
```
Get cli help:
Additional optional flags for overriding defaults can be found with:
```shell
./osd-network-verifier egress --help
```
##### Egress Validations Under Proxy #####
* Follow the similar flow above, till execute
* Pass proxy config to be used to egress subcommand
* Follow the same flow shown above, until execution
* Pass the proxy config to the egress subcommand
```shell
./osd-network-verifier egress \
@@ -187,52 +131,46 @@ repeat the verification process for each subnet ID.
--cacert path-to-ca.pem \
--no-tls # optional, used to bypass ca.pem validation (https)
```
##### Force Temporary Security Group Creation #####
* Follow the same flow shown above, until execution
* Use the --force-temp-security-group
* Use the `--force-temp-security-group` flag
```shell
./osd-network-verifier egress \
--subnet-id <subnet_id> \
--force-temp-security-group true \
--force-temp-security-group \
--security-group-ids=<securityGroupID-1, ..., securityGroupID-N> # To add extra security groups in addition to the temporary one.
```
##### 1.1.2 Go implementation Examples #####
- [AWS Go SDK v1](../../examples/aws/verify_egressv1.go)
- [AWS Go SDK v2](../../examples/aws/verify_egressv2.go)
- [Verify Egress Example](../../examples/aws/verify_egress.go)
#### 1.2 Interpreting Output ####
(TODO: add errors)
#### 1.3 Workflow ####
Pictorial representation of workflow of the egress test tool:
Pictorial representation of the egress test tool workflow:
![egress](https://user-images.githubusercontent.com/87340776/168323176-af0c8a37-2bdc-4747-82f0-f464970d5373.jpg)
Description:
The AWS client creates a test EC2 instance in the target VPC/subnet and waits until the instance is ready.
The actual network verification is automated
by using the `USERDATA` param [available for ec2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html)
which is run by ec2 on the instance on creation.
1. AWS client creates a test ec2 instance in the target vpc/subnet and wait till the instance gets ready
2. The actual network verification is automated by using the `USERDATA` param [available for ec2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) which is run by ec2 on the instance on creation.
3. The [`USERDATA`](../../pkg/helpers/config/userdata.yaml) script is in the form of base64-encoded text, and does the following -
The instance's console output is redirected to the AWS cloud client SDK.
The active probe then parses this output before the verifier prints it to the user's terminal.
If debug logging is enabled, the verifier prints this output in full; otherwise it only prints errors.
1. installs docker
2. runs [validator's docker image](https://gitlab.cee.redhat.com/service/osd-network-verifier-golden-ami/-/blob/master/build/bin/network-validator.go). Firstly, the image of the validator is tried to be pulled. If it fails, then the docker image baked into the AMI is used.
(The image is also published at: https://quay.io/repository/app-sre/osd-network-verifier)
3. The entry point of the osd-network-verifier docker image then executes the main egress verification script
```shell
network-validator --timeout=2s --config=config/config.yaml
```
- **This entrypoint is where the actual egress endpoint verification is performed.** `build/bin/network-validator.go` makes `curl` requests to each other endpoint in the [egress list](../../README.md#egress-list) (i.e. list of all essential domains for OSD clusters).
- During development, the verifier docker image can be tested locally as:
```shell
docker run --env "AWS_REGION=us-east-1" quay.io/app-sre/osd-network-verifier:latest --timeout=2s
```

4. `USERDATA` script then redirects the instance's console output to the AWS cloud client SDK. The end of this output message is signified with a special End Verification string.
5. If debug logging is enabled, this output is printed in full, otherwise only errors are printed, if any.
---
**NOTE**
For more information on probes, see [the README](../../README.md#probes).
---
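
The debug-versus-errors behavior described in the workflow can be sketched as a simple filter. This is an assumption-laden illustration: the `visibleLines` function and the `ERROR` line prefix are invented for the example and do not reflect the verifier's actual output format or code.

```go
package main

import (
	"fmt"
	"strings"
)

// visibleLines returns what would be shown to the user: every line when debug
// is on, otherwise only lines carrying the (assumed) "ERROR" prefix.
func visibleLines(consoleOutput string, debug bool) []string {
	var out []string
	for _, line := range strings.Split(consoleOutput, "\n") {
		if debug || strings.HasPrefix(line, "ERROR") {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	output := "checking a.example.com:443\nERROR b.example.com:443 unreachable"
	fmt.Println(visibleLines(output, false)) // errors only
	fmt.Println(visibleLines(output, true))  // full output
}
```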
### 2. VPC DNS Verification ###
#### 2.1 Usage ####