Support natively processing s3 events with lambda deployments #478

Open
mallikvala opened this issue Jul 20, 2020 · 5 comments
Labels: enhancement, serverless (Issues relating to serverless deployments of Benthos)

Comments

@mallikvala

Hey @Jeffail, this can probably be an enhancement, but do you have any suggestions on how to trigger an S3 input in a lambda deployment where the lambda is triggered by a putObject operation on an S3 bucket?

Currently, when the lambda is invoked, a single message part is created from the event and the processor chain runs without any input. The event as such won't make much sense until the file is downloaded. Looking for suggestions.

@Jeffail
Collaborator

Jeffail commented Jul 21, 2020

Hey @mallikvala, if I understand your goals correctly you can trigger a download from the event by using a cache processor with an S3 cache resource. It would look something like this:

pipeline:
  processors:
  - cache:
      cache: bucket
      operator: get
      key: ${! json("path.to.s3.item") }
      
resources:
  caches:
    bucket:
      s3:
        bucket: foo

My mind's a little fuzzy on what the event looks like that gets sent to the lambda function, but if you can log it and post it here I can update this example.
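
For reference, the standard S3 event notification that AWS delivers to a lambda looks roughly like this (abbreviated from AWS's documented shape; the actual payload was never posted in this thread, so treat the field names as an assumption):

Records:
  # standard AWS S3 event notification shape (assumed, not confirmed here)
  - eventSource: aws:s3
    eventName: "ObjectCreated:Put"
    s3:
      bucket:
        name: foo
      object:
        key: path/to/item.txt
        size: 1024

Under that assumption, the key interpolation in the config above would presumably be ${! json("Records.0.s3.object.key") }.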

@Jeffail
Collaborator

Jeffail commented Jul 21, 2020

Labelling this as documentation as it would be cool to have some examples like this in the lambda section of the docs.

@mallikvala
Author

@Jeffail won't using cache.get make the whole file a single message part, rather than turning each line into a message like the file input does?

@Jeffail
Collaborator

Jeffail commented Jul 22, 2020

Hey @mallikvala, if you want to process the S3 files as line-delimited messages then you can use the unarchive processor to cut the file into a batch, and then optionally follow it with a split processor if you want to dispatch the batch as individual messages:

pipeline:
  processors:
  - cache:
      cache: bucket
      operator: get
      key: ${! json("path.to.s3.item") }
  - unarchive:
      format: lines
  - split: {}
      
resources:
  caches:
    bucket:
      s3:
        bucket: foo

Another thing to keep in mind is that the cache processor might fail (if the key is not found, etc.) and the message will continue through the pipeline. You can choose from a range of options for handling the errors: https://www.benthos.dev/docs/configuration/error_handling
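
One of those options, sketched here under the assumption that the v3-era catch, log and bloblang processors behave as the linked docs describe (double-check the syntax against your Benthos version), is to log the failure and drop the message instead of passing it along:

pipeline:
  processors:
  - cache:
      cache: bucket
      operator: get
      key: ${! json("path.to.s3.item") }
  - catch:
    # these processors run only for messages that errored upstream
    - log:
        level: ERROR
        message: 'S3 fetch failed: ${! error() }'
    # delete the failed message so it never reaches the output
    - bloblang: root = deleted()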

@Jeffail added the enhancement and serverless labels and removed the documentation and question labels on Nov 9, 2020
@Jeffail changed the title from "Question - Download and process file in S3 event in lambda" to "Support natively processing s3 events with lambda deployments" on Nov 9, 2020
@Jeffail
Collaborator

Jeffail commented Nov 9, 2020

I've repurposed this issue as an enhancement because using caches is a pretty weak user experience here, and since this is a somewhat common pattern with AWS there ought to be a better way to do this, ideally with minimal effort on the configuration side. However, this still needs to be explicit in order to preserve backwards compatibility.
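
Purely as a hypothetical sketch of what that could look like (nothing below exists in Benthos; the input type and fields are invented to illustrate the enhancement being requested):

input:
  # hypothetical input type: reads the triggering S3 event, downloads
  # the referenced object itself, and emits its contents as messages
  lambda_s3_event:
    codec: lines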
