[testbed] Add batcher performance tests #36206

Open · wants to merge 1 commit into main from test/testbed-batcher
Conversation

swiatekm (Contributor) commented Nov 5, 2024

Description

Add basic batching benchmarks. The primary intent of these is to help verify that we're not introducing performance regressions with open-telemetry/opentelemetry-collector#8122.

I've taken the 10k DPS benchmark for logs and run it in different configurations:

  • Batching either enabled or disabled
  • In-memory queue enabled or disabled
  • Using the batch processor at the start of the pipeline or the new exporter batcher
  • All of this either with no processors or with some basic filtering and transformation

I've reduced the input batch size to 10 to better capture the effect that having no batching at the start of the pipeline has on processor performance, which is one of the concerns with moving batching to the exporter.
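To make the variant matrix concrete, here is a minimal table-driven sketch of how these configurations could be laid out as a testbed test. The `runLogBenchmark` helper and the struct fields are illustrative assumptions for this sketch, not the actual code in the PR; the real tests wire these variants into the existing testbed harness with its senders, receivers, and results summary.

```go
package tests

import "testing"

// Sketch only: runLogBenchmark stands in for the testbed plumbing that would
// start a collector with the requested batching and queue options, drive
// 10k log records per second through it in batches of 10, and record CPU and
// RAM usage in the results summary.
func runLogBenchmark(t *testing.T, batchProcessor, exporterBatcher, queue bool) {
	t.Helper()
	// ... construct the testbed sender/receiver, render the collector config,
	// run the load generator, and validate sent vs. received item counts ...
}

func TestLog10kDPSNoProcessors(t *testing.T) {
	tests := []struct {
		name            string
		batchProcessor  bool // batch processor (send_batch_size: 1000) at the start of the pipeline
		exporterBatcher bool // new exporter batcher with a batch size of 1000
		queue           bool // in-memory sending queue on the exporter
	}{
		{name: "No batching, no queue"},
		{name: "No batching, queue", queue: true},
		{name: "Batch size 1000 with batch processor, no queue", batchProcessor: true},
		{name: "Batch size 1000 with batch processor, queue", batchProcessor: true, queue: true},
		{name: "Batch size 1000 with exporter batcher, no queue", exporterBatcher: true},
		{name: "Batch size 1000 with exporter batcher, queue", exporterBatcher: true, queue: true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			runLogBenchmark(t, tt.batchProcessor, tt.exporterBatcher, tt.queue)
		})
	}
}
```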

For now, I'd like to get some comments on whether this is sufficient, or if I should expand my scope to other signal types and/or different parameters for the benchmarks.

Current results

| Test | Result | Duration | CPU Avg% | CPU Max% | RAM Avg MiB | RAM Max MiB | Sent Items | Received Items |
|------|--------|----------|----------|----------|-------------|-------------|------------|----------------|
| Log10kDPSNoProcessors/No_batching,_no_queue | PASS | 15s | 20.7 | 22.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/No_batching,_queue | PASS | 15s | 19.1 | 21.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 13.1 | 14.3 | 77 | 109 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 12.7 | 14.0 | 75 | 108 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 15.5 | 17.3 | 72 | 101 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 14.3 | 15.7 | 72 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_no_queue | PASS | 15s | 21.5 | 23.0 | 72 | 103 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_queue | PASS | 15s | 22.2 | 23.0 | 72 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 22.5 | 26.0 | 75 | 107 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 18.0 | 19.7 | 73 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 16s | 18.3 | 20.7 | 75 | 106 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 17.1 | 17.7 | 73 | 102 | 150100 | 150100 |

It looks like the new batcher is a bit less performant if the pipeline doesn't contain any processors, but is in fact faster if processors are present, which is surprising to me. But this does assuage our fears that we'd tank processor performance by moving batching to the end of the pipeline.

Link to tracking issue

Fixes open-telemetry/opentelemetry-collector#10836 ([exporterhelper] Run performance tests to compare exporter batching with the batcher processor)

swiatekm (Contributor, Author) commented Nov 5, 2024

@dmitryax @sfc-gh-sili I'd love some feedback as to whether this is useful for your purposes, and if I should broaden the scope.

jmacd (Contributor) commented Nov 7, 2024

By the way, I'm not convinced that the testbed is appropriately sophisticated to measure and describe the differences between the two batchers. The amount of work being done is the same, so we expect the same amount of compute resource. There are still a few salient differences between the two forms of batching that would not be teased apart by this benchmark, for example the way that exporter-batching has to serialize multiple requests and can't benefit from the queue for concurrency when batches are reduced in size. The effect is on individual request latency, which the testbed doesn't measure.

I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be a positive impact from the batch processor, because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times with the same amount of data.

I also claim we'll never get rid of the batch processor and should fix it. As a shining example, the groupbyattr processor absolutely benefits from the batch processor for the reason stated above (in addition to other benefits). https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor

swiatekm (Contributor, Author) commented:

> I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be a positive impact from the batch processor, because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times with the same amount of data.

@jmacd any idea on how I could simulate a CPU-heavy processor here? The processors I did add to my benchmark were intended to simulate the average use case, but I'd be happy to have a more skewed benchmark there as well.

atoulme (Contributor) commented Nov 12, 2024

Please rebase, fix conflicts, and add a changelog entry.

swiatekm marked this pull request as ready for review on November 13, 2024 14:00
swiatekm requested a review from a team as a code owner on November 13, 2024 14:00
swiatekm force-pushed the test/testbed-batcher branch 2 times, most recently from c57c82c to e6a49ca on November 13, 2024 15:04
sfc-gh-sili commented:
@swiatekm Thank you, Mikołaj! This is great.
Here are a couple of things that I'd be curious to know:

  1. Sanity-check that batching in the exporter has about the same resource consumption as batching in the processor, which this PR already does.
  2. Like Joshua mentioned, I too am curious whether not batching earlier causes higher resource usage for more expensive processing further down the pipeline.
  3. I'd also be curious how the size of the goroutine pool used by the exporter affects performance; it doesn't seem to be something we've looked into so far.
