feat(parser.prometheusremotewrite, serializer.prometheusremotewrite): Native histogram support end-to-end #16121

Reimirno · 2024-10-31T22:04:58Z

Summary

Add a configuration field to the prometheusremotewrite parser to change its behavior when parsing Prometheus native histograms.

Previously, the prometheusremotewrite parser parsed a native histogram into multiple Telegraf metrics (a lossy, irreversible conversion); now it parses it into a single Telegraf metric. Correspondingly, this PR adds a code path to the prometheusremotewrite serializer that handles this single Telegraf "native histogram" metric and converts it back into a Prometheus native histogram. NOTE: the logic that handles classic histograms is orthogonal and unchanged.

This treatment preserves the benefits of native histograms: their correctness guarantees (atomicity, so write batching cannot introduce correctness issues) and their performance gains (low cardinality, sparse data structure). More importantly, it means Telegraf can support native histograms end-to-end: ingest a native histogram and write out a native histogram.
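To make the write-batching point concrete, here is a toy Go illustration, not taken from the PR. The `Metric` type and the `splitIntoBatches` helper are hypothetical, and the fixed batch size only loosely mirrors an output's `metric_batch_size` setting. A histogram expanded into five metrics can straddle two batches, so a receiver may briefly see a partial histogram; the atomic single-metric form cannot be split.

```go
package main

import "fmt"

// Metric is a hypothetical, minimal stand-in for a Telegraf metric:
// just a name and the tags that identify the series it belongs to.
type Metric struct {
	Name string
	Tags map[string]string
}

// splitIntoBatches chunks metrics into batches of at most size elements,
// similar in spirit to an output flushing metric_batch_size metrics at a time.
func splitIntoBatches(metrics []Metric, size int) [][]Metric {
	var batches [][]Metric
	for start := 0; start < len(metrics); start += size {
		end := start + size
		if end > len(metrics) {
			end = len(metrics)
		}
		batches = append(batches, metrics[start:end])
	}
	return batches
}

func main() {
	// One histogram expanded into _sum, _count and three bucket metrics.
	expanded := []Metric{
		{Name: "latency_sum"},
		{Name: "latency_count"},
		{Name: "latency", Tags: map[string]string{"latency_le": "0.5"}},
		{Name: "latency", Tags: map[string]string{"latency_le": "1"}},
		{Name: "latency", Tags: map[string]string{"latency_le": "+Inf"}},
	}
	for i, b := range splitIntoBatches(expanded, 3) {
		fmt.Printf("batch %d: %d metrics\n", i, len(b))
	}

	// The atomic representation is a single metric, so it always lands in one batch.
	atomic := []Metric{{Name: "latency"}}
	fmt.Println("atomic batches:", len(splitIntoBatches(atomic, 3)))
}
```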

For the detailed rationale, including a diagram, please see #16120.

Test

Previously, using an HTTP listener v2 input with the prometheusremotewrite parser to accept Prometheus remote write, and an HTTP output with the prometheusremotewrite serializer, Telegraf ingested a Prometheus native histogram but wrote out several counters, as if it were a classic histogram.

In the query here, a native histogram gets translated into several "bucket" metrics (without the _bucket suffix, and with a <metric_name>_le tag).

Now, after this PR:

This native histogram now gets correctly written out as a native histogram.

I can add unit and integration tests once the community reaches agreement on this implementation.

Checklist

No AI-generated code was used in this PR

Related issues

resolves #16120

telegraf-tiger · 2024-10-31T22:05:05Z

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

Reimirno · 2024-10-31T22:07:27Z

!signed-cla

telegraf-tiger · 2024-11-04T18:52:46Z

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

☺️ This pull request doesn't significantly change the Telegraf binary size (less than 1%)


Artifact URLs

DEB	RPM	TAR GZ	ZIP
amd64.deb	aarch64.rpm	darwin_amd64.tar.gz	windows_amd64.zip
arm64.deb	armel.rpm	darwin_arm64.tar.gz	windows_arm64.zip
armel.deb	armv6hl.rpm	freebsd_amd64.tar.gz	windows_i386.zip
armhf.deb	i386.rpm	freebsd_armv7.tar.gz
i386.deb	ppc64le.rpm	freebsd_i386.tar.gz
mips.deb	riscv64.rpm	linux_amd64.tar.gz
mipsel.deb	s390x.rpm	linux_arm64.tar.gz
ppc64el.deb	x86_64.rpm	linux_armel.tar.gz
riscv64.deb		linux_armhf.tar.gz
s390x.deb		linux_i386.tar.gz
		linux_mips.tar.gz
		linux_mipsel.tar.gz
		linux_ppc64le.tar.gz
		linux_riscv64.tar.gz
		linux_s390x.tar.gz

srebhan

@Reimirno thanks for your contribution! I do have some comments in the code. Additionally, please split the PR into one part for the parser and one part for the serializer so we do one thing per PR!

srebhan · 2024-11-05T13:49:26Z

plugins/parsers/prometheusremotewrite/README.md

@@ -16,6 +16,9 @@ additional configuration options for Prometheus Remote Write Samples.

  ## Data format to consume.
  data_format = "prometheusremotewrite"
+
+  ## Whether to parse a native histogram into one Telegraf metric
+  keep_native_histograms_atomic = false


Can we please use metric_version similar to what is done in the prometheus parser?

Switching metric_version in the prometheus parser seems to have implications beyond how histograms are parsed.

On the prometheusremotewrite side, the current implementation seems equivalent to v2. I could emulate a v1 here, in which both classic and native histograms would be parsed atomically.
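A minimal sketch of how metric_version could select the histogram handling, assuming the option is added to the prometheusremotewrite parser. The `Parser` type, the `histogramMode` helper and the choice of 2 as the default (so existing configurations keep today's per-bucket behavior) are all hypothetical, not existing Telegraf code.

```go
package main

import "fmt"

// Parser is a hypothetical, trimmed-down parser configuration showing only
// the metric_version option discussed in the review.
type Parser struct {
	MetricVersion int `toml:"metric_version"`
}

// Init validates metric_version. Defaulting to 2 is an assumption chosen so
// that configurations without the option keep the current behavior.
func (p *Parser) Init() error {
	switch p.MetricVersion {
	case 0:
		p.MetricVersion = 2
	case 1, 2:
		// Valid, keep as configured.
	default:
		return fmt.Errorf("invalid metric_version %d, expected 1 or 2", p.MetricVersion)
	}
	return nil
}

// histogramMode reports how histograms (classic and native) would be parsed.
func (p *Parser) histogramMode() string {
	if p.MetricVersion == 1 {
		return "atomic: one Telegraf metric per histogram"
	}
	return "expanded: one Telegraf metric per bucket, plus _sum and _count"
}

func main() {
	for _, v := range []int{0, 1, 3} {
		p := &Parser{MetricVersion: v}
		if err := p.Init(); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("metric_version=%d -> %s\n", p.MetricVersion, p.histogramMode())
	}
}
```

With this design a single option covers both classic and native histograms, instead of a native-only keep_native_histograms_atomic flag.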

srebhan · 2024-11-05T13:50:29Z

plugins/parsers/prometheusremotewrite/parser.go

@@ -9,13 +9,16 @@ import (
 	"github.com/prometheus/common/model"
 	"github.com/prometheus/prometheus/prompb"

+	"github.com/gogo/protobuf/proto"


Why do you need this instead of the upstream and well-maintained github.com/golang/protobuf package?

This is what Prometheus uses to serialize its protobuf messages in remote write.

I just found the rationale for their use of this protobuf library: prometheus/prometheus#14668.
It seems they do plan to migrate away from it to a better-maintained library, so we should probably just use the upstream package here.

srebhan · 2024-11-05T13:52:04Z

plugins/parsers/prometheusremotewrite/parser.go

+				// If keeping it atomic, we serialize the histogram into one single Telegraf metric
+				// For now we keep the histogram as a serialized proto
+				// Another option is to convert it to multi-field Telegraf metric
+				serialized, err := proto.Marshal(&hp)
+				if err != nil {
+					return nil, fmt.Errorf("failed to marshal histogram: %w", err)
+				}
+				fields := map[string]any{
+					metricName: string(serialized),
+				}


You really should go for the multi-field option, as you cannot do anything in Telegraf with the serialized format except pass it through to the prometheusremotewrite serializer...

Totally agree. I wanted to do that too, but it's actually not straightforward.
The Prometheus native histogram protobuf has quite a few array fields, and even arrays of structs. There is no good way of representing those as Telegraf metric fields (value data types: Float | Integer | UInteger | String | Boolean).
(The same goes for the OTel exponential histogram.)
Maybe I am missing something; I could use some advice :)

In particular, the problematic fields are:

count (and zero_count): a oneof in protobuf.
This exists to handle both integer and float histograms. We could split it into two fields, countInt or countFloat, and optionally add an isFloatHistogram flag to make the conversion lossless. Alternatively, we can convert everything to a float histogram and store only a float count; this is what the current Telegraf prometheusremotewrite parser does.

negative_counts, negative_deltas, etc.: int or float arrays.
We could break these down into index-suffixed fields like negative_counts_0=... negative_counts_1=...

negative_spans and positive_spans: arrays of BucketSpan, which has two fields, offset and length.
We could break these down into index-and-field-suffixed fields like negative_spans_0_offset=xxx negative_spans_0_length=xxx ...

Reimirno changed the title ~~Native histogram support PoC~~ feat(prometheusremotewrite): Native histogram support PoC Oct 31, 2024

telegraf-tiger bot added the feat label Oct 31, 2024

Reimirno changed the title ~~feat(prometheusremotewrite): Native histogram support PoC~~ feat(prometheusremotewrite): Native histogram support end-to-end Nov 4, 2024

yulong-db added 7 commits November 4, 2024 10:15

PoC single Telegraf metrics for prom native histogram

3739be4

Add a config to toggle this

e2aced7

fix type assertion err

283389a

hotfix break up long line

e57fafd

use proto encoding (w/o snappy compression)

774eda7

rebase master

166075a

fix timestamp

f7beae8

Reimirno force-pushed the native-histogram branch from 498d7d6 to 0be4c14 Compare November 4, 2024 18:16

rebase upstream master

f9d5ac0

Reimirno force-pushed the native-histogram branch from 0be4c14 to f9d5ac0 Compare November 4, 2024 18:17

clean up imports and comments

e363ef3

Reimirno marked this pull request as ready for review November 4, 2024 18:45

Reimirno changed the title ~~feat(prometheusremotewrite): Native histogram support end-to-end~~ feat(parser.prometheusremotewrite, serializer.prometheusremotewrite): Native histogram support end-to-end Nov 4, 2024

srebhan reviewed Nov 5, 2024


srebhan self-assigned this Nov 5, 2024

srebhan added the area/prometheus and plugin/parser labels Nov 5, 2024
