The kafka and kinesis inputs do not support batching at the output level. This is currently covered in documentation but there's nothing stopping a user from missing it and falling into this hole.
Before the next major release (where output batching is more prominent and obvious) I need a solution. Two currently off the top of my head:
Allow output level batching by tracking offsets.
Have inputs that don't support output batching add context to each message. The output batcher then detects this and refuses to batch those messages (with intermittent warning logs).
Tracking offsets would mean recording all offsets consumed and acked and allowing them to be processed out of sync. It's not impossible but it's a headache to ensure messages aren't duplicated during restarts (if a single message fails to propagate during shutdown then we reprocess any number of backlogged offsets).
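A minimal sketch of what tracking offsets could look like, assuming a hypothetical `checkpointer` type that is not part of the existing codebase. Offsets are recorded in the order they are consumed, acks may arrive in any order, and the only offset safe to commit is the highest one below which everything has been acked. It also shows the restart problem: while offset 0 is unacked, nothing after it can be committed, so every later offset would be reprocessed.

```go
package main

import "fmt"

// checkpointer is a hypothetical helper (an assumption for illustration).
// It tracks consumed offsets and reports the highest offset that is safe
// to commit when acks arrive out of order.
type checkpointer struct {
	pending   []int64        // consumed offsets, in consumption order, not yet committable
	acked     map[int64]bool // offsets acked but still blocked by an earlier unacked offset
	committed int64          // highest offset safe to commit, -1 if none yet
}

func newCheckpointer() *checkpointer {
	return &checkpointer{acked: map[int64]bool{}, committed: -1}
}

// Track records an offset as consumed.
func (c *checkpointer) Track(offset int64) {
	c.pending = append(c.pending, offset)
}

// Ack marks an offset as acknowledged and returns the highest offset that
// can now be committed without skipping an unacked message.
func (c *checkpointer) Ack(offset int64) int64 {
	c.acked[offset] = true
	// Advance past every leading offset that has been acked.
	for len(c.pending) > 0 && c.acked[c.pending[0]] {
		c.committed = c.pending[0]
		delete(c.acked, c.pending[0])
		c.pending = c.pending[1:]
	}
	return c.committed
}

func main() {
	c := newCheckpointer()
	for o := int64(0); o < 4; o++ {
		c.Track(o)
	}
	fmt.Println(c.Ack(2)) // -1: offsets 0 and 1 are still outstanding
	fmt.Println(c.Ack(0)) // 0: only offset 0 is contiguous
	fmt.Println(c.Ack(1)) // 2: offsets 0 through 2 are now all acked
}
```

If the service shut down after the first `Ack(2)` call, nothing would have been committed and offset 2 would be delivered again on restart, which is the duplication risk described above.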
Jeffail added the inputs (Any tasks or issues relating specifically to inputs) and annoying (Benthos is mildly annoying but not quite a bug) labels on Dec 30, 2019
However, I'm not particularly happy with it as it could break functionality for users relying on in-order processing. I'd therefore need to at least expose it as a config field with a default being no out-of-order processing. That would position us squarely back where we started (by default we stall on output batching).
So the alternative is to expose the problem to users. I'm planning to update the Kafka, Kafka Balanced and Kinesis inputs to set a context field that the output batchers can spot, and when they do they will log (exactly one time per service execution) a message to explain the (potential) stalling. It is possible for these sequential inputs and output level batching to live in harmony so I don't want to be too aggressive with the messaging here.
The latest kafka input now supports checkpointing. If this trial goes well I'll add it to the kinesis inputs and we'll be close. Once that's done, the documentation around batching and outputs in general should be redone, as it will be much simpler.