[testbed] Add batcher performance tests #36206

Open · wants to merge 1 commit into main from test/testbed-batcher
Conversation

swiatekm (Contributor) commented Nov 5, 2024

Description

Add basic batching benchmarks. The primary intent of these is to help verify that we're not introducing performance regressions with open-telemetry/opentelemetry-collector#8122.

I've taken the 10k DPS benchmark for logs and run it in different configurations:

  • Batching either enabled or disabled
  • In-memory queue enabled or disabled
  • Using the batch processor at the start of the pipeline or the new exporter batcher
  • All of this either with no processors or with some basic filtering and transformation

I've reduced the input batch size to 10 to better capture the effect that having no batching at the start of the pipeline has on processor performance, which is one of the concerns with moving batching to the exporter.
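To make the variant matrix concrete, here is a minimal table-driven sketch of how these configurations could be laid out as a testbed test. The `runLogBenchmark` helper and the struct fields are illustrative assumptions for this sketch, not the actual code in the PR; the real tests wire these variants into the existing testbed harness with its senders, receivers, and results summary.

```go
package tests

import "testing"

// Sketch only: runLogBenchmark stands in for the testbed plumbing that would
// start a collector with the requested batching and queue options, drive
// 10k log records per second through it in batches of 10, and record CPU and
// RAM usage in the results summary.
func runLogBenchmark(t *testing.T, batchProcessor, exporterBatcher, queue bool) {
	t.Helper()
	// ... construct the testbed sender/receiver, render the collector config,
	// run the load generator, and validate sent vs. received item counts ...
}

func TestLog10kDPSNoProcessors(t *testing.T) {
	tests := []struct {
		name            string
		batchProcessor  bool // batch processor (send_batch_size: 1000) at the start of the pipeline
		exporterBatcher bool // new exporter batcher with a batch size of 1000
		queue           bool // in-memory sending queue on the exporter
	}{
		{name: "No batching, no queue"},
		{name: "No batching, queue", queue: true},
		{name: "Batch size 1000 with batch processor, no queue", batchProcessor: true},
		{name: "Batch size 1000 with batch processor, queue", batchProcessor: true, queue: true},
		{name: "Batch size 1000 with exporter batcher, no queue", exporterBatcher: true},
		{name: "Batch size 1000 with exporter batcher, queue", exporterBatcher: true, queue: true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			runLogBenchmark(t, tt.batchProcessor, tt.exporterBatcher, tt.queue)
		})
	}
}
```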

For now, I'd like to get some comments on whether this is sufficient, or if I should expand my scope to other signal types and/or different parameters for the benchmarks.

Current results

| Test | Result | Duration | CPU Avg% | CPU Max% | RAM Avg MiB | RAM Max MiB | Sent Items | Received Items |
|------|--------|----------|----------|----------|-------------|-------------|------------|----------------|
| Log10kDPSNoProcessors/No_batching,_no_queue | PASS | 15s | 20.7 | 22.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/No_batching,_queue | PASS | 15s | 19.1 | 21.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 13.1 | 14.3 | 77 | 109 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 12.7 | 14.0 | 75 | 108 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 15.5 | 17.3 | 72 | 101 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 14.3 | 15.7 | 72 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_no_queue | PASS | 15s | 21.5 | 23.0 | 72 | 103 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_queue | PASS | 15s | 22.2 | 23.0 | 72 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 22.5 | 26.0 | 75 | 107 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 18.0 | 19.7 | 73 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 16s | 18.3 | 20.7 | 75 | 106 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 17.1 | 17.7 | 73 | 102 | 150100 | 150100 |

It looks like the new batcher is a bit less performant if the pipeline doesn't contain any processors, but is in fact faster if processors are present, which is surprising to me. But this does assuage our fears that we'd tank processor performance by moving batching to the end of the pipeline.

Link to tracking issue

Fixes open-telemetry/opentelemetry-collector#10836 ([exporterhelper] Run performance tests to compare exporter batching with the batcher processor)

swiatekm (Contributor, Author) commented Nov 5, 2024

@dmitryax @sfc-gh-sili I'd love some feedback as to whether this is useful for your purposes, and if I should broaden the scope.

jmacd (Contributor) commented Nov 7, 2024

By the way, I'm not convinced that the testbed is appropriately sophisticated to measure and describe the differences between the two batchers. The amount of work being done is the same, so we expect the same amount of compute resource. There are still a few salient differences between the two forms of batching that would not be teased apart by this benchmark, for example the way that exporter-batching has to serialize multiple requests and can't benefit from the queue for concurrency when batches are reduced in size. The effect is on individual request latency, which the testbed doesn't measure.

I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be a positive impact from the batch processor, because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times with the same amount of data.

I also claim we'll never get rid of the batch processor and should fix it. As a shining example, the groupbyattr processor absolutely benefits from the batch processor for the reason stated above (in addition to other benefits). https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor

swiatekm (Contributor, Author) commented:

> I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be a positive impact from the batch processor, because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times with the same amount of data.

@jmacd any idea on how I could simulate a CPU-heavy processor here? The processors I did add to my benchmark were intended to simulate the average use case, but I'd be happy to have a more skewed benchmark there as well.

atoulme (Contributor) commented Nov 12, 2024

Please rebase, fix conflicts, and add a changelog entry.

swiatekm marked this pull request as ready for review on November 13, 2024 14:00
swiatekm requested a review from a team as a code owner on November 13, 2024 14:00
swiatekm force-pushed the test/testbed-batcher branch 2 times, most recently from c57c82c to e6a49ca on November 13, 2024 15:04
sfc-gh-sili commented:
@swiatekm Thank you, Mikołaj! This is great.
Here are a couple of things that I'd be curious to know:

  1. Sanity-check that batching in the exporter has about the same resource consumption as batching in the processor, which this PR already does.
  2. Like Joshua mentioned, I too am curious whether not batching earlier causes higher resource usage for more expensive processing further down the pipeline.
  3. I'd also be curious how the size of the goroutine pool used by the exporter affects performance; it doesn't seem to be something we've looked into so far.
