Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for Native Histogram Text Format #32

Merged
merged 9 commits into from
Oct 22, 2024

Conversation

csmarchbanks
Copy link
Member

Create an initial prosal for supporting native histograms in the text format. It mainly links to the design doc for ease of comments while we decide on a path forward. After a proposal from the design doc has some support the rest of the markdown proposal will be filled out.

Leaving the PR as a draft until the markdown is ready to be published, but comments here or in the Google doc are welcome.

Create an initial prosal for supporting native histograms in the text
format. It mainly links to the design doc for ease of comments while we
decide on a path forward. After a proposal from the design doc has some
support the rest of the markdown proposal will be filled out.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@krajorama krajorama self-requested a review February 2, 2024 08:45
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch 3 times, most recently from 894def9 to e06d3d7 Compare March 22, 2024 02:55
@csmarchbanks csmarchbanks marked this pull request as ready for review March 22, 2024 02:56
@csmarchbanks
Copy link
Member Author

I have updated this proposal with the option from the design doc that gained consensus, and described the proposed format in specific detail. All reviews here are now welcome :)

@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch from e06d3d7 to 0ef8516 Compare March 22, 2024 02:59
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch from 0ef8516 to b1283f1 Compare March 22, 2024 03:01
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey 👋

Do I understand correctly that the breaking change here is because you are using the same metric type as the classic histogram, but a different format for the exposed measurements?

If we're moving forward with OpenMetrics 2.0, we might want to align different workforces that are happening towards that, e.g. different format for created timestamps

@csmarchbanks
Copy link
Member Author

I think this could be done with a minor version bump of OpenMetrics, but if it requires going to 2.0 then yes we would coordinate with the other changes. Until then it would be behind feature flags and we would need a version like version=1.1.0-alpha.1 for content negotiation with clients.

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to approve the PR to give you the green light to experiment with things, but as mentioned in the dev summit I don't have the confidence to approve changes to OM.

My initial concerns are:

  • Possible breaking change that requires OM 2.0 - You're using # TYPE <metric name> histogram followed by a line that has a different format from the current histogram format. Current scrapers will break when seeing this.

We could use # TYPE <metric name> native_histogram instead, and change it back to histogram in 2.0, if the scraper silently ignores unknown metric types, but I'm almost certain that's not the current behavior 😬

  • What feature flag are you planning to release this under? The existing native-histogram flag? A new one?

@csmarchbanks
Copy link
Member Author

Reasonable concerns! I added a section on backwards compatibility/versioning with the reason I believe this change to be backwards-compatible and releasable as part of OpenMetrics 1.1. Does that section help?

As far as feature flags go, I think reusing the same native-histogram flag would be the best option (plus content negotiation). I am ok with a new one as well though if others would like to see a new flag. That decision could probably happen at PR review time?

@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch 2 times, most recently from dce5799 to 8e19761 Compare May 24, 2024 01:24
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch from 8e19761 to f80fafa Compare May 24, 2024 01:25
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Especially the clarifying part about the meaning of forward and backward compatibility in OM makes me happy.

I have just a few optional suggestions for additions.

Let's get a few more thumbs up on this and then approve.

proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
@beorn7
Copy link
Member

beorn7 commented May 30, 2024

  • Possible breaking change that requires OM 2.0 - You're using # TYPE <metric name> histogram followed by a line that has a different format from the current histogram format. Current scrapers will break when seeing this.

We could use # TYPE <metric name> native_histogram instead, and change it back to histogram in 2.0, if the scraper silently ignores unknown metric types, but I'm almost certain that's not the current behavior 😬

I'm pretty sure that OM is fairly strict with this. While you are free to implement a tolerant parser, it is not required. A fully OM 1.0 compliant parser may (and maybe even should) complain about any unknown metric types. A behavior like "new metric types maybe added, and old parsers should just ignore them" would need to be clearly specified in the spec. Cf. how protobuf is doing it.

The explanation of how forward and backward compatibility works as it is part of this proposal IMHO makes a lot of sense (and I wish it would have been explained that way in the OM spec). To paraphrase: A minor version bump is sufficient if a v1.1 scraper can read a v1.0 exposition without any precaution, i.e. it doesn't even have to know that it is reading v1.0 and not v1.1. It is OK and expected that a v1.0 scraper will be unable to parse a v1.1 exposition.

The linked explanation of content negotiation directly from the OM spec supports that: The scraper can ask for v1.1, but it might just get v1.0 back, which is fine. If on the other hand the scraper asks for v1.0, it must not get v1.1 back, but the producer must be capable of generating valid v1.0 and serve it as such (or, I guess, return an error).

  • What feature flag are you planning to release this under? The existing native-histogram flag? A new one?

Since this is just about the protocol, and there are many possible implementers (scrapers as well as producers), I won't discuss feature flags (which are specific to selected implementers, like in this case the Prometheus server).

However, it might be good to sketch out a plan how to version the implementation as long as there is no official OM v1.1. (Something like v1.1-pre or v1.1-alpha? Not sure how those things are handled in network protocols generally.)

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it :)

fstab
fstab previously requested changes May 30, 2024
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch from 0627868 to a3463fa Compare May 30, 2024 19:42
@csmarchbanks
Copy link
Member Author

However, it might be good to sketch out a plan how to version the implementation as long as there is no official OM v1.1. (Something like v1.1-pre or v1.1-alpha? Not sure how those things are handled in network protocols generally.)

I added an option to the doc, happy to iterate on this now or later. It's always possible to amend these proposals as the work becomes more concrete.

The significant changes are:
1. Add additional examples.
2. Add a proposal for how content negotiation will be done until OM 1.1
   is released.
3. Plural form of arrays in the schema.
4. Added support for multiple exemplars as a non-goal.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch from a3463fa to 2920fb2 Compare May 30, 2024 19:53
Copy link
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, easy to convert to internal data. I added some minor comments.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM - modulo versioning. Added some suggestions though. Good work!

proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
proposals/2024-01-29_native_histograms_text_format.md Outdated Show resolved Hide resolved
Comment on lines +86 to +92
```
Accept: application/openmetrics-text;version=1.1.0-nativehistogram.*,application/openmetrics-text;version=1.0.0,text/plain;version=0.0.4
```
would mean the consumer can accept any nativehistogram enabled pre-release version of OpenMetrics 1.1.0, the base 1.0.0 version of OpenMetrics, or the 0.0.4 version of the classic Prometheus text format. Producers must include the proper content type for their version, such as the first nativehistogram pre-release:
```
Content-Type: application/openmetrics-text;version=1.1.0-nativehistogram.0
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is interesting. What's "release identifier? I assume you want to version SPEC and native histogram feature separately? Why? Can we make it simpler?

I think I would expect something like:

Accept: application/openmetrics-text;version=1.1.0;feature=nativehistogram,application/openmetrics-text;version=1.0.0,text/plain;version=0.0.4

.. then we could simply do 1.1.1 or 1.2.0 if you have significant histogram change. Let's not create sub versions of every feature, this will make it bit painful to use and maintain. Speaking from experience when we evaled different options for remote write v2: prometheus/docs#2462

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://semver.org/#spec-item-9, it is just a pre-release, IMO if version=1.1.0 is there without any sort of prerelease suffix native histograms must be fully supported and there shouldn't be extra feature tags. I would also be ok with something like alpha.0, alpha.1, etc... if we don't want to specifically include nativehistogram in the version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to put some thoughts into versioning. The current proposal does not cover all of native histograms, Exemplars, created timestamps are out-of-scope.

Now, how will content negotiation work if we add these in the future? How will a client library know whether to include a list of exemplars in native histogram text format or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would opt for some way of opting into the feature as an experimental one, without any changes to the standard yet. Based on the experience made, we can make the cut what should land in v1.1 later (including maybe other changes being discussed). In other words: I don't think this design doc has to exactly describe v1.1 already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to implement some other versioning scheme, the important part to me is that any exposers add some sort of pre-release identifier until OM 1.1 is actually released, and force ingesters to specify that they are ok with a pre-release version. A features section alongside a pre-release version seems reasonable to me, then becomes unnecessary once 1.1 is released. That allows us some flexibility to make breaking changes to the unreleased features. E.g. if we change around where the list of exemplars are stored, or choose to use a less compact format after feedback.

Content negotiation for features added in 1.2, 1.3, etc... will require client libraries to keep track of the features added in each minor version and leave them out if the requested version is lower. That could mean fall all the way back to 1.0, or maybe have support for a few versions (1.0, 1.1, 1.4) and choose the highest version less than or equal to what is requested by the ingester. E.g. if Prometheus requests 1.3, the client would return 1.1 in that example. 1.0 is the only version that all producers MUST support.

Also note that created timestamps have always been in scope (no change from current text format), and I just added a proposal for exemplar support.

Extend the OpenMetrics text format to allow structured values instead of only float values. This structured value will be used to encode a structure with the same fields as is exposed using the [protobuf exposition format](https://github.com/prometheus/client_model/blob/master/io/prometheus/client/metrics.proto). Starting with examples and then breaking up the format:
```
# TYPE nativehistogram histogram
nativehistogram {count:24,sum:100,schema:0,zero_threshold:0.001,zero_count:4,positive_spans:[0:2,1:2],negative_spans:[0:2,1:2],positive_deltas:[2,1,-2,3],negative_deltas:[2,1,-2,3]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we establish some simple forward compatibility rules for this JSON e.g. Parsers MUST ignore unknown fields? Would help in future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be ok with that, but I am also not sure it is necessary with content negotiation as new fields would require part of a new version of OpenMetrics. I.e. I am unsure if we want to impose the level of strictness a given parser needs to implement, and it might be ok for different parsers to make different choices.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: It's not JSON.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like a way of adding stuff that old parsers will just ignore, this concept doesn't exist in OM so far, so it might be weird to introduce it in a sub-set of the protocol. Let's maybe solve that more generally in OM v2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. Let's make it clear it's not a JSON somewhere please 🙈

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions as to how to make it more clear that it is not JSON? In the format definition I have "The value for each series of a native histogram is a custom struct format" and I specifically compare it to JSON with the phrase "Compared to JSON this avoids consistently repeating keys and curly braces" after the type definitions.

* Tightened up some language of how to tell classic and native
  histograms apart.
* Added that whitespace is not allowed inside of a value.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@csmarchbanks csmarchbanks force-pushed the csmarchbanks/native-histogram-text-format branch from 202deb7 to cec73ef Compare June 4, 2024 21:08
Illustrate that ordering matters for this new format to aid parsing.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Provide a paragraph describing support for multiple exemplars as part of
this proposal. Currently it is left ambiguous if a list of exemplars can
be included on more than native histograms, we can define that
explicitly when writing the new OpenMetrics spec.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Explicitly state that changing created timestamp semantics and allowing
custom buckets as part of the native histogram structure are not goals
for this proposal.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
@csmarchbanks
Copy link
Member Author

I have added this as an agenda item for a dev summit. It seems like there are a couple to be decided points still, mainly around minor or major version bump. Hopefully having the sync discussion will allow us to come to a conclusion.

@brancz
Copy link
Member

brancz commented Sep 26, 2024

We had a few consensus items at the dev summit that I think will allow this to move forward:

  • We intend to add native histograms as a part of OpenMetrics 1.1. A major version bump is not required.
  • We agree that the understanding of backwards compatibility is that a A.y parser can also parse an A.x format. But the A.x parser doesn’t need to parse the A.y format. Where y > x, A, x, y being integers
  • OpenMetrics doesn’t need a major version bump if the changes are backwards compatible.

Looking forward to seeing this becoming a reality!

@csmarchbanks csmarchbanks dismissed fstab’s stale review September 26, 2024 14:49

Talked offline and the recent updates have alleviated concerns.

@csmarchbanks
Copy link
Member Author

🎉

With the consensus items from the dev summit, are there any objections to merging this as is? We can continue to tweak it as development work proceeds. If I don't hear any objections I will plan to merge sometime next week.

@beorn7
Copy link
Member

beorn7 commented Oct 1, 2024

I think the only somewhat open point is how to mark pre-release features in content negotiation. At the same time, I think that's an aspect where we don't have to be terribly descriptive and precise in the first place, because it is just for pre-releases and precisely not for the stable standard. @csmarchbanks rightfully pointed to https://semver.org/#spec-item-9 , which essentially allow almost anything after the hyphen, and what's proposed here in the doc is semver compliant and also makes sense to me. As a design doc is not describing the final implementation, we have some wiggle room here, so I would propose to merge this as-is and see how it shakes out in the implementation.

@csmarchbanks csmarchbanks merged commit c47a744 into main Oct 22, 2024
2 checks passed
@csmarchbanks csmarchbanks deleted the csmarchbanks/native-histogram-text-format branch October 22, 2024 21:24
@bwplotka
Copy link
Member

bwplotka commented Nov 25, 2024

Wonder, what's relation of this to OpenMetrics 2.0? 🤔 Do we plan to change both Prometheus text AND OM 2.0 with this?

@beorn7
Copy link
Member

beorn7 commented Nov 26, 2024

I don't think we should change the Prometheus text format, if we can avoid it.

As we concluded above, this could be part of OM 1.1. If it is part of OM 1.1, it should also be part of OM 2.0. If OM 2.0 happens soon, we might never have an OM 1.1, then this would only go into OM 2.0.

@bwplotka
Copy link
Member

bwplotka commented Nov 28, 2024

FYI: prometheus/OpenMetrics#279 - I propose this should be in 2.0, no 1.1 is planned at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants