Support natively processing s3 events with lambda deployments #478

Open
mallikvala opened this issue Jul 20, 2020 · 5 comments
Labels: enhancement, serverless (Issues relating to serverless deployments of Benthos)

Comments

@mallikvala

Hey @Jeffail, this can probably be an enhancement, but do you have any suggestions on how to trigger an S3 input in a lambda deployment where the lambda is triggered by a putObject operation on an S3 bucket?

Currently, when the lambda is invoked, a single message part is created from the event and the processor chain runs without any input. The event as such won't make much sense until the file is downloaded. Looking for suggestions.

@Jeffail
Collaborator

Jeffail commented Jul 21, 2020

Hey @mallikvala, if I understand your goals correctly you can trigger a download from the event by using a cache processor with an S3 cache resource. It would look something like this:

pipeline:
  processors:
  - cache:
      cache: bucket
      operator: get
      key: ${! json("path.to.s3.item") }
      
resources:
  caches:
    bucket:
      s3:
        bucket: foo

My mind's a little fuzzy on what the event looks like that gets sent to the lambda function, but if you can log it and post it here I can update this example.
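
For reference, the standard S3 event notification that AWS delivers to a lambda looks roughly like this (abbreviated from AWS's documented shape; the actual payload was never posted in this thread, so treat the field names as an assumption):

Records:
  # standard AWS S3 event notification shape (assumed, not confirmed here)
  - eventSource: aws:s3
    eventName: "ObjectCreated:Put"
    s3:
      bucket:
        name: foo
      object:
        key: path/to/item.txt
        size: 1024

Under that assumption, the key interpolation in the config above would presumably be ${! json("Records.0.s3.object.key") }.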

@Jeffail
Collaborator

Jeffail commented Jul 21, 2020

Labelling this as documentation as it would be cool to have some examples like this in the lambda section of the docs.

@mallikvala
Author

@Jeffail won't using cache.get make the whole file a single message part, rather than turning each line into a message like the file input does?

@Jeffail
Collaborator

Jeffail commented Jul 22, 2020

Hey @mallikvala, if you want to process the S3 files as line-delimited messages then you can use the unarchive processor to cut the file into a batch, and then optionally follow it with a split processor if you want to dispatch the batch as individual messages:

pipeline:
  processors:
  - cache:
      cache: bucket
      operator: get
      key: ${! json("path.to.s3.item") }
  - unarchive:
      format: lines
  - split: {}
      
resources:
  caches:
    bucket:
      s3:
        bucket: foo

Another thing to keep in mind is that the cache processor might fail (if the key is not found, etc.) and the message will continue through the pipeline. You can choose from a range of options for handling the errors: https://www.benthos.dev/docs/configuration/error_handling
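
One of those options, sketched here under the assumption that the v3-era catch, log and bloblang processors behave as the linked docs describe (double-check the syntax against your Benthos version), is to log the failure and drop the message instead of passing it along:

pipeline:
  processors:
  - cache:
      cache: bucket
      operator: get
      key: ${! json("path.to.s3.item") }
  - catch:
    # these processors run only for messages that errored upstream
    - log:
        level: ERROR
        message: 'S3 fetch failed: ${! error() }'
    # delete the failed message so it never reaches the output
    - bloblang: root = deleted()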

@Jeffail added the enhancement and serverless labels and removed the documentation and question labels on Nov 9, 2020
@Jeffail changed the title from "Question - Download and process file in S3 event in lambda" to "Support natively processing s3 events with lambda deployments" on Nov 9, 2020
@Jeffail
Collaborator

Jeffail commented Nov 9, 2020

I've repurposed this issue as an enhancement because using caches is a pretty weak user experience here, and since this is a somewhat common pattern with AWS there ought to be a better way to do this, ideally with minimal effort on the configuration side. However, this still needs to be explicit in order to preserve backwards compatibility.
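
Purely as a hypothetical sketch of what that could look like (nothing below exists in Benthos; the input type and fields are invented to illustrate the enhancement being requested):

input:
  # hypothetical input type: reads the triggering S3 event, downloads
  # the referenced object itself, and emits its contents as messages
  lambda_s3_event:
    codec: lines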
