Commit

Merge branch 'develop' into release-9.0.1
rizbihassan committed Nov 15, 2023
2 parents 15f2532 + aa6d836 commit 15cee8c
Showing 4 changed files with 115 additions and 42 deletions.
19 changes: 8 additions & 11 deletions CHANGELOG.md
@@ -22,6 +22,8 @@ and includes an additional section for migration notes.

### Changed

- *ORCA-780* Updated ORCA "Deployment with Cumulus" documentation with instructions and examples to run ORCA recovery and archive workflows.

- *ORCA-778* Upgraded Docusaurus to version 2.4.3 to fix snyk vulnerabilities and security issues.

- *ORCA-765* Updated ORCA "Creating the Glacier Bucket" documentation with instructions to deploy ORCA DR buckets using cloudformation.
@@ -70,7 +72,7 @@ and includes an additional section for migration notes.
- *ORCA-709* Updated terraform AWS provider to version 5. This is to support Cumulus and CIRRUS changes.
- *ORCA-714* Fixed new deployment errors with API Gateway by adding an IAM policy and tying it to the GW.
- *ORCA-716* Fixed Deployment issues with GraphQL tasks by adding permission and health check.
- *ORCA-726* Updated Docusarus and Node version to latest LTS releases to fix security issues.
- *ORCA-726* Updated Docusaurus and Node version to latest LTS releases to fix security issues.

### Migration Notes

@@ -81,7 +83,8 @@ and includes an additional section for migration notes.
- `collectionId` properties have been added to [Recovery Jobs](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#recovery-jobs-api) and [Recovery Granules](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#recovery-granules-api) API.
- For Recovery Jobs, it is only added to [output](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#recovery-jobs-api-output).
- For Recovery Granules, it is now required on [input](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#recovery-granules-api-input) and will be returned on [output](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#recovery-granules-api-output).
- Update the `orca.tf` file to include `aws_region`. See example below.
- Update the `orca.tf` file to include `aws_region`.
See example below.
```terraform
## ORCA Module
## =============================================================================
@@ -174,16 +177,10 @@ and includes an additional section for migration notes.

### Migration Notes

- Remove the `workflow_config` variable from `orca.tf` otherwise terraform deployment will throw an error.
- The output format of `copy_to_archive` lambda and step-function has been simplified. If accessing these resources outside of a Cumulus perspective, instead of accessing `output["payload"]["granules"]` you now use `output["granules"]`.
- Cumulus is not currently compatible with the changes to copy_to_archive.
- This section will be updated when a compatible version is created.
- deployment-with-cumulus.md will also be updated.
- copy_to_archive_adapter/README.md will also be updated.
- restore-to-orca.mdx will also be updated.
- Cumulus is not currently compatible with the changes to the Recovery Workflow step-function.
- This section will be updated when a compatible version is created.
- deployment-with-cumulus.md will also be updated.
- orca_recovery_adapter/README.md will also be updated.
- Due to Cumulus-ORCA decoupling efforts, users will now need to update the existing `CopyToArchive` workflow configuration to point to the Cumulus [copy_to_archive_adapter lambda](https://github.com/nasa/cumulus/tree/master/tasks/orca-copy-to-archive-adapter), which then runs our `copy_to_archive` lambda. See [deployment documentation](https://nasa.github.io/cumulus-orca/docs/developer/deployment-guide/deployment-with-cumulus#add-the-copytoarchive-step-to-an-ingest-workflow) for details.
- Due to Cumulus-ORCA decoupling efforts, users will now need to deploy a `recovery_workflow_adapter` workflow that triggers the Cumulus `recovery_adapter` lambda, which then runs our existing ORCA recovery workflow. See [deployment documentation](https://nasa.github.io/cumulus-orca/docs/developer/deployment-guide/deployment-with-cumulus#modify-the-recovery-workflow) for details.
- Update the bucket policy for your `system-bucket` to allow the load balancer to post server access logs to the bucket. See the instructions [here](https://nasa.github.io/cumulus-orca/docs/developer/deployment-guide/deployment-s3-bucket#bucket-policy-for-load-balancer-server-access-loging).
- InternalReconcileReport [Phantom](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#internal-reconcile-report-phantom-api) and [Mismatch](https://nasa.github.io/cumulus-orca/docs/developer/api/orca-api#internal-reconcile-report-mismatch-api) reports are now available via GraphQL.
- API Gateway access is now deprecated, and will be removed in a future update.
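
The `copy_to_archive` output change noted in this list affects callers that read the lambda or step-function output outside of Cumulus. A minimal Python sketch of tolerating both shapes during a migration, assuming the output has already been parsed into a dict; `extract_granules` is a hypothetical helper name and is not part of ORCA:

```python
def extract_granules(output: dict) -> list:
    """Return the granule list from either output shape.

    Hypothetical helper for illustration only; not part of ORCA.
    """
    # Simplified shape: granules sit at the top level of the output.
    if "granules" in output:
        return output["granules"]
    # Older shape: granules are nested under "payload".
    return output["payload"]["granules"]
```

Once every caller only sees the simplified output, the fallback branch can be dropped in favor of a plain `output["granules"]` lookup.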
55 changes: 51 additions & 4 deletions tasks/copy_to_archive_adapter/README.md
@@ -4,11 +4,58 @@ Visit the [Developer Guide](https://nasa.github.io/cumulus-orca/docs/developer/d

## Description

The `copy_to_archive` module is meant to be deployed as a lambda function that takes a Cumulus message, extracts a list of files, and copies those files from their current storage location into an ORCA archive bucket.
It also sends additional metadata attributes to metadata SQS queue needed for Cumulus reconciliation.
:::important
This adapter was created as a proof of concept and should not be used by end users. Use [Cumulus copy_to_archive_adapter](https://github.com/nasa/cumulus/tree/master/tasks/orca-copy-to-archive-adapter) instead.
:::

This lambda calls copy_to_archive synchronously, returning results and raising errors as appropriate.
This provides an injection seam to contact the ORCA managed copy_to_archive lambda with ORCA's formatting.
Since ORCA is decoupling from Cumulus starting in ORCA v8.0, users will now run the same [ORCA `copy_to_archive` workflow](https://github.com/nasa/cumulus-orca/tree/master/modules/workflows/OrcaCopyToArchiveWorkflow) but must update the existing workflow configuration to point to the Cumulus [copy_to_archive_adapter lambda](https://github.com/nasa/cumulus/tree/master/tasks/orca-copy-to-archive-adapter), which then runs our existing `copy_to_archive` lambda.

:::note
Make sure to replace `<CUMULUS_COPY_TO_ARCHIVE_ADAPTER_ARN>` under the `Resource` property below. See the [cumulus terraform modules](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/outputs.tf#L86) for additional details on how to add this.
:::

```json
"CopyToArchive": {
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "excludedFileExtensions": "{$.meta.collection.meta.orca.excludedFileExtensions}",
        "s3MultipartChunksizeMb": "{$.meta.collection.meta.s3MultipartChunksizeMb}",
        "providerId": "{$.meta.provider.id}",
        "providerName": "{$.meta.provider.name}",
        "executionId": "{$.cumulus_meta.execution_name}",
        "collectionShortname": "{$.meta.collection.name}",
        "collectionVersion": "{$.meta.collection.version}",
        "defaultBucketOverride": "{$.meta.collection.meta.orca.defaultBucketOverride}"
      }
    }
  },
  "Type": "Task",
  "Resource": "<CUMULUS_COPY_TO_ARCHIVE_ADAPTER_ARN>",
  "Catch": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "ResultPath": "$.exception",
      "Next": "WorkflowFailed"
    }
  ],
  "Retry": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2
    }
  ],
  "Next": "WorkflowSucceeded"
},
```
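
Because the step definition above ships with a `<CUMULUS_COPY_TO_ARCHIVE_ADAPTER_ARN>` placeholder, a quick pre-deployment check can catch a definition where the substitution was missed. The following is a minimal Python sketch under stated assumptions: placeholders are assumed to be upper-case names wrapped in angle brackets, and `find_unreplaced_placeholders` is a hypothetical helper, not something provided by ORCA or Cumulus.

```python
import re

# Assumption: placeholders look like <UPPER_CASE_NAME>, as in the step above.
PLACEHOLDER_PATTERN = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_unreplaced_placeholders(definition_text: str) -> list[str]:
    """Return the distinct placeholder tokens still present in the text."""
    return sorted(set(PLACEHOLDER_PATTERN.findall(definition_text)))
```

Running this over the workflow definition text before deploying and failing when the returned list is non-empty is one way to use it.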

## Exclude files by extension.

7 changes: 3 additions & 4 deletions tasks/orca_recovery_adapter/README.md
@@ -4,11 +4,10 @@ Visit the [Developer Guide](https://nasa.github.io/cumulus-orca/docs/developer/d

## Description

The `orca_recovery_adapter` module is meant to be deployed as a lambda function that takes a Cumulus message,
extracts a list of files, and initiates recovery of those files from their ORCA archive location.
:::important
This adapter was created as a proof of concept and should not be used by end users. Use [Cumulus recovery_adapter](https://github.com/nasa/cumulus/tree/master/tasks/orca-recovery-adapter) instead.
:::

This lambda calls the ORCA recovery step-function, returning results and raising errors as appropriate.
This provides an injection seam to contact the ORCA recovery step-function with ORCA's formatting.

## Build

76 changes: 53 additions & 23 deletions website/docs/developer/deployment-guide/deployment-with-cumulus.md
@@ -61,7 +61,7 @@ If you wish to deploy code cloned locally from [Github](https://github.com/nasa/
## ORCA Module
## =============================================================================
module "orca" {
source = "https://github.com/nasa/cumulus-orca/releases/download/v3.0.2/cumulus-orca-terraform.zip"
source = "https://github.com/nasa/cumulus-orca/releases/download/v9.0.0/cumulus-orca-terraform.zip"
## --------------------------
## Cumulus Variables
## --------------------------
@@ -73,7 +73,6 @@ module "orca" {
prefix = var.prefix
system_bucket = var.system_bucket
vpc_id = var.vpc_id
workflow_config = module.cumulus.workflow_config
## OPTIONAL
tags = var.tags
@@ -89,6 +88,8 @@ module "orca" {
orca_default_bucket = var.orca_default_bucket
orca_reports_bucket_name = var.orca_reports_bucket_name
rds_security_group_id = var.rds_security_group_id
s3_access_key            = var.s3_access_key
s3_secret_key            = var.s3_secret_key
## OPTIONAL
# archive_recovery_queue_message_retention_time_seconds = 777600
@@ -136,6 +137,8 @@ optional variables can be found in the [variables section](#orca-variables).
- db_host_endpoint
- rds_security_group_id
- dlq_subscription_email
- s3_access_key
- s3_secret_key

#### Required Values Retrieved from Cumulus Variables

@@ -158,16 +161,6 @@ deployment.

:::

#### Required Values Retrieved from Other Modules

The following variables are set by retrieving output from other modules. This is
done so that the user does not have to lookup and set these variables after a
deployment. More information about these variables can be found in the
[Cumulus variable definitions](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/variables.tf).

- workflow_config - Retrieved from the cumulus module in `main.tf`.


### Creating `cumulus-tf/orca_variables.tf`

In the `cumulus-tf` directory create the `orca_variables.tf` file. Copy the
@@ -211,8 +204,17 @@ variable "rds_security_group_id" {
type = string
description = "Cumulus' RDS Security Group's ID."
}
```
variable "s3_access_key" {
type = string
description = "AWS Access key for communicating with Orca S3 buckets."
}
variable "s3_secret_key" {
type = string
description = "AWS Secret key for communicating with Orca S3 buckets."
}
```

### Modifying `cumulus-tf/terraform.tfvars`

@@ -256,6 +258,13 @@ rds_security_group_id = "sg-01234567890123456"
## Dead letter queue SNS topic subscription email.
dlq_subscription_email = "test@email.com"
## AWS access key to connect to AWS account.
s3_access_key = "xxxxxxxxxxxxxxxxxxx"
## AWS secret access key to connect to AWS account
s3_secret_key = "xxxxxxxxxxxx/xxxxx/xxxxxxxxxxxxxxxxxxxxx"
```

The following list describes the type of value expected for each variable.
@@ -269,6 +278,8 @@ Below describes the type of value expected for each variable.
* `orca_default_bucket` (string) - Default S3 archive bucket to use for ORCA data.
* `orca_reports_bucket_name` (string) - The name of the bucket to store s3 inventory reports.
* `rds_security_group_id`(string) - Cumulus' RDS Security Group's ID. Output as `security_group_id` from the rds-cluster deployment.
* `s3_access_key` (string) - AWS access key for communicating with ORCA S3 buckets.
* `s3_secret_key` (string) - AWS secret key for communicating with ORCA S3 buckets.

Additional variable definitions can be found in the [ORCA variables](#orca-variables)
section of the document.
@@ -393,6 +404,11 @@ the ingest workflow.

:::

Since ORCA is decoupling from Cumulus starting in ORCA v8.0, users will now run the same [ORCA `copy_to_archive` workflow](https://github.com/nasa/cumulus-orca/tree/master/modules/workflows/OrcaCopyToArchiveWorkflow) but must update the existing workflow configuration to point to the [copy_to_archive_adapter lambda](https://github.com/nasa/cumulus/tree/master/tasks/orca-copy-to-archive-adapter) (owned by Cumulus), which then runs our existing `copy_to_archive` lambda.

:::note
Make sure to replace `<CUMULUS_COPY_TO_ARCHIVE_ADAPTER_ARN>` under the `Resource` property below. See the [cumulus terraform modules](https://github.com/nasa/cumulus/blob/master/tf-modules/cumulus/outputs.tf#L86) for additional details on how to add this.
:::

```json
"CopyToArchive":{
@@ -413,7 +429,7 @@ the ingest workflow.
}
},
"Type":"Task",
"Resource":"module.orca.orca_lambda_copy_to_archive_arn",
"Resource":"<CUMULUS_COPY_TO_ARCHIVE_ADAPTER_ARN>",
"Catch":[
{
"ErrorEquals":[
@@ -436,15 +452,31 @@ the ingest workflow.
"Next":"WorkflowSucceeded"
},
```
See the copy_to_archive json schema [configuration file](https://github.com/nasa/cumulus-orca/blob/master/tasks/copy_to_archive/schemas/config.json), [input file](https://github.com/nasa/cumulus-orca/blob/master/tasks/copy_to_archive/schemas/input.json) and [output file](https://github.com/nasa/cumulus-orca/blob/master/tasks/copy_to_archive/schemas/output.json) for more information.

### Modify the Recovery Workflow (*OPTIONAL*)
As part of the [Cumulus Message Adapter configuration](https://nasa.github.io/cumulus/docs/workflows/input_output#cma-configuration)
for `copy_to_archive`, the `excludedFileExtensions`, `s3MultipartChunksizeMb`, `providerId`, `executionId`, `collectionShortname`, `collectionVersion`, `defaultBucketOverride`, and `defaultStorageClassOverride` keys must be present under the
`task_config` object as seen above.
Per the [config schema](https://github.com/nasa/cumulus/blob/master/tasks/orca-copy-to-archive-adapter/schemas/config.json),
the values of the keys are used in the following ways.
The `provider` key should contain an `id` key that returns the provider id from Cumulus.
The `cumulus_meta` key should contain an `execution_name` key that returns the step function execution ID from AWS.
The `collection` key value should contain a `name` key and a `version` key that return the required collection shortname and collection version from Cumulus, respectively.
The `collection` key value should also contain a `meta` key that includes an `orca` key with an optional `excludedFileExtensions` key, which is used to determine file patterns that should not be
sent to ORCA. In addition, the `orca` key contains an optional `defaultBucketOverride` key that overrides the `ORCA_DEFAULT_BUCKET` set on deployment and an optional `defaultStorageClassOverride` key that overrides the storage class to use when storing files in ORCA.
The optional `s3MultipartChunksizeMb` key overrides the default maximum multipart chunk size that the lambda uses for S3 copies when copying large files to ORCA.
These settings can often be derived from the collection configuration in Cumulus.
See the copy_to_archive_adapter json schema [configuration file](https://github.com/nasa/cumulus/blob/master/tasks/orca-copy-to-archive-adapter/schemas/config.json), [input file](https://github.com/nasa/cumulus/blob/master/tasks/orca-copy-to-archive-adapter/schemas/input.json) and [output file](https://github.com/nasa/cumulus/blob/master/tasks/orca-copy-to-archive-adapter/schemas/output.json) for more information.
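
The key requirements described above can be checked before a workflow definition is deployed. A minimal Python sketch, assuming the `task_config` object has been loaded into a dict; the key list is copied from the paragraph above, `missing_task_config_keys` is a hypothetical helper, and the linked config schema remains the authoritative source:

```python
# Keys that the text above says must be present under `task_config`.
# Assumption: this mirrors the prose only; the config schema is authoritative.
REQUIRED_TASK_CONFIG_KEYS = (
    "excludedFileExtensions",
    "s3MultipartChunksizeMb",
    "providerId",
    "executionId",
    "collectionShortname",
    "collectionVersion",
    "defaultBucketOverride",
    "defaultStorageClassOverride",
)


def missing_task_config_keys(task_config: dict) -> list[str]:
    """Return the listed keys that are absent from task_config, in list order."""
    return [key for key in REQUIRED_TASK_CONFIG_KEYS if key not in task_config]
```

An empty result means every listed key is present; extra keys such as `providerName` are ignored by this check.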

### Modify the Recovery Workflow

It is not recommended to modify the ORCA Recovery Workflow. The workflow JSON
file is located in the `modules/workflows/OrcaRecoveryWorkflow` of the repository.
The workflow file name is `orca_recover_workflow.asl.json`. To change the
behavior of the workflow, it is recommended to modify or replace the
`copy_from_archive` lambda.
Since ORCA is decoupling from Cumulus starting in ORCA v8.0, users will now need to deploy a `recovery_workflow_adapter` workflow that triggers [the recovery_adapter lambda](https://github.com/nasa/cumulus/tree/master/tasks/orca-recovery-adapter) (owned by Cumulus), which then runs our existing ORCA recovery workflow.
As part of the [Cumulus Message Adapter configuration](https://nasa.github.io/cumulus/docs/workflows/input_output/#cma-configuration), several properties must be passed into the adapter lambda. See [input and config schemas](https://github.com/nasa/cumulus/tree/master/tasks/orca-recovery-adapter/schemas) for more information.

Here is an example of a [recovery adapter workflow step function definition](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/orca_recovery_adapter_workflow.asl.json) and the [terraform code](https://github.com/nasa/cumulus/blob/master/example/cumulus-tf/orca_recovery_adapter_workflow.tf) provided by Cumulus that can be used to deploy the step function workflow in AWS. Once deployed, you can run that workflow to test ORCA recovery.

:::note
Users should reach out to the Cumulus team if they want to automate this adapter workflow in their Cumulus deployment, since Cumulus owns the adapter lambdas.
:::

### Workflow Failures

@@ -476,8 +508,6 @@ file. The variables must be set with proper values for your environment in the
| `prefix` | Prefix that will be pre-pended to resource names created by terraform. | "daac-sndbx" |
| `system_bucket` | Cumulus system bucket used to store internal files and configurations for deployments. | "PREFIX-internal" |
| `vpc_id` | ID of VPC to place resources in - recommended that this be a private VPC (or at least one with restricted access). | "vpc-abc123456789" |
| `workflow_config` | Configuration object with ARNs for workflow integration (Role ARN for executing workflows and Lambda ARNs to trigger on workflow execution). | module.cumulus.workflow_config |


#### ORCA Required Variables
