Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support output level batching with kafka/kinesis inputs #350

Open
Jeffail opened this issue Dec 30, 2019 · 2 comments
Open

Support output level batching with kafka/kinesis inputs #350

Jeffail opened this issue Dec 30, 2019 · 2 comments
Labels
annoying Benthos is mildly annoying but not quite a bug inputs Any tasks or issues relating specifically to inputs

Comments

@Jeffail
Copy link
Collaborator

Jeffail commented Dec 30, 2019

The kafka and kinesis inputs do not support batching at the output level. This is currently covered in documentation but there's nothing stopping a user from missing it and falling into this hole.

Before the next major release (where output batching is more prominent and obvious) I need a solution. Two currently off the top of my head:

  • Allow output level batching by tracking offsets
  • Inputs that don't support output batching can add context to each message, the output batcher then detects this and refuses to batch it (with intermittent warning logs).

Tracking offsets would mean recording all offsets consumed and acked and allowing them to be processed out of sync. It's not impossible but it's a headache to ensure messages aren't duplicated during restarts (if a single message fails to propagate during shutdown then we reprocess any number of backlogged offsets).

@Jeffail Jeffail added inputs Any tasks or issues relating specifically to inputs annoying Benthos is mildly annoying but not quite a bug labels Dec 30, 2019
@Jeffail
Copy link
Collaborator Author

Jeffail commented Jan 5, 2020

I have what could be a working implementation of offset tracking to allow out-of-order acks of a partition: https://github.com/Jeffail/benthos/tree/master/lib/util/checkpoint.

However, I'm not particularly happy with it as it could break functionality for users relying on in-order processing. I'd therefore need to at least expose it as a config field with a default being no out-of-order processing. That would position us squarely back where we started (by default we stall on output batching).

So the alternative is to expose the problem to users. I'm planning to update the Kafka, Kafka Balanced and Kinesis inputs to set a context field that the output batchers can spot, and when they do they will log (exactly one time per service execution) a message to explain the (potential) stalling. It is possible for these sequential inputs and output level batching to live in harmony so I don't want to be too aggressive with the messaging here.

@Jeffail
Copy link
Collaborator Author

Jeffail commented Nov 6, 2020

The latest kafka input now supports checkpointing, if this trial goes well then I'll add it to kinesis inputs and we'll be close. Once this is done the documentation around batching and outputs in general should be redone as it'll be much simpler now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
annoying Benthos is mildly annoying but not quite a bug inputs Any tasks or issues relating specifically to inputs
Projects
None yet
Development

No branches or pull requests

1 participant