Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parser.prometheusremotewrite, serializer.prometheusremotewrite): Native histogram support end-to-end #16121

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Reimirno
Copy link

@Reimirno Reimirno commented Oct 31, 2024

Summary

Add a configuration field to prometheusremotewrite parser to change its behavior when parsing Prometheus native histogram.

Previously prometheusremotewrite parser parses a native histogram into multiple Telegraf metric (the conversion is not lossless and is irreversible); now it parses it into one single Telegraf metric. Correspondingly, add a code path to prometheusremotewrite serializer to be able to handle this single Telegraf "native histogram" metric and convert back to Prometheus native histogram. NOTE: logic that handles classic histogram is orthogonal and not changed.

This treatment of native histograms is able to preserve its benefits in terms of correctness guarantee (atomicity, no correctness issue due to write batching) and its performance gain (low cardinality, sparse data structure). More importantly, this means Telegraf can supports native histogram end-to-end: ingest a native histogram and write out a native histogram.

Detailed rationale, including a diagram, please see #16120

Test

Previously, using a prometheusremotewrite HTTP listener v2 input to accept prometheus remote write, and a prometheusremotewrite HTTP output, Telegraf ingests prometheus native histogram but writes out several counters, as if this were a classic histogram.

In the query here, a native histogram get translated into several "bucket" metrics (but without _bucket suffix and with a <metric_name>_le tag.

image

Now, after this PR:

This native histogram now gets correctly written out as a native histogram.

image

I can add unit test / integration test when the community reaches some agreement on this implementation.

Checklist

  • No AI generated code was used in this PR

Related issues

resolves #16120

@telegraf-tiger
Copy link
Contributor

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@Reimirno
Copy link
Author

!signed-cla

@Reimirno Reimirno changed the title Native histogram support PoC feat(prometheusremotewrite): Native histogram support PoC Oct 31, 2024
@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Oct 31, 2024
@Reimirno Reimirno changed the title feat(prometheusremotewrite): Native histogram support PoC feat(prometheusremotewrite): Native histogram support end-to-end Nov 4, 2024
@Reimirno Reimirno marked this pull request as ready for review November 4, 2024 18:45
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Nov 4, 2024

@Reimirno Reimirno changed the title feat(prometheusremotewrite): Native histogram support end-to-end feat(parser.prometheusremotewrite, serializer.prometheusremotewrite): Native histogram support end-to-end Nov 4, 2024
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Reimirno thanks for your contribution! I do have some comments in the code. Additionally, please split the PR into one part for the parser and one part for the serializer so we do one thing per PR!

@@ -16,6 +16,9 @@ additional configuration options for Prometheus Remote Write Samples.

## Data format to consume.
data_format = "prometheusremotewrite"

## Whether to parse a native histogram into one Telegraf metric
keep_native_histograms_atomic = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use metric_version similar to what is done in the prometheus parser?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switching metric_version from prometheus parser seems to have more implications other than affecting how histogram is parsed.

Now, on the prometheusremotewrite side, it seems the current implementation is equivalent to v2. I could emulate a v1 here - where the atomic parsing of both classic and native would be implemented.

@@ -9,13 +9,16 @@ import (
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/prompb"

"github.com/gogo/protobuf/proto"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this instead of the upstream and well maintained github.com/golang/protobuf package?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what prometheus uses to serialize their protobuf in remote write.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found out the rationale for them to use this protobuf lib: prometheus/prometheus#14668.
It seem that they do have plan to migrate away from this and to a more maintained lib. So we should probably just do that.

Comment on lines +71 to +80
// If keeping it atomic, we serialize the histogram into one single Telegraf metric
// For now we keep the histogram as a serialized proto
// Another option is to convert it to multi-field Telegraf metric
serialized, err := proto.Marshal(&hp)
if err != nil {
return nil, fmt.Errorf("failed to marshal histogram: %w", err)
}
fields := map[string]any{
metricName: string(serialized),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really should go for the multi-field option as you cannot do anything in Telegraf with the serialized format but only can pass this through to the prometheusremotewrite serializer...

Copy link
Author

@Reimirno Reimirno Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, I wanted to do that too, but it's actually not straightforward.
Prometheus native histogram protobuf has quite a few fields being array, an even array of struct. There is no good way of representing that in Telegraf metric field (Value data type: Float | Integer | UInteger | String | Boolean)
(Same thing goes with otel exponential histogram)
Maybe I am missing something - could use some advice :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, the problematic fields are:

  • count (and zero_count): oneof in protobuf
    This exist to hande both inthistogram and floathistogram. We can break up into two fields countInt or countFloat and potentially add a flag isFloatHistogram to achieve information lossless conversion. Or we can convert all to floathistogram, and only store a count float. This is what the current implementation in Telegraf prometheusremotewrite parser is doing.

  • negative_counts negative_deltas etc: int or float array.
    We break down to index-suffixed fields like negative_counts_0=... negative_counts_1=...

  • negative_spans and positive_spans: array of BucketSpan which has two fields offset and length.
    We can break down to index-and-field-suffixed fields like negative_spans_0_offset=xxx negative_spans_0_length=xxx ...

@srebhan srebhan self-assigned this Nov 5, 2024
@srebhan srebhan added area/prometheus plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins labels Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/prometheus feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

prometheusremotewrite: End-to-end Prometheus Native Histogram Support
3 participants