Define instrumentation configuration API #4128

jack-berg · 2024-07-03T18:59:45Z

Resolves #3535.

This introduces an API component to file configuration, which has been limited to SDK (i.e. end user facing) up until this point.

The configuration model recently added the first surface area related to instrumentation configuration properties in open-telemetry/opentelemetry-configuration#91.

The API proposed in this PR is collectively called the "Instrumentation config API", and provides a mechanism for instrumentation libraries to participate in file configuration and read relevant properties during initialization. The intent is for both OpenTelemetry-authored and native instrumentation to be configurable by users in a standard way. New API surface area is necessary so that instrumentation libraries do not need to take a dependency on SDK artifacts.

The following summarizes the additions:

- Introduce ConfigProvider, the instrumentation config API analog of TracerProvider, MeterProvider, LoggerProvider. This is the entry point to the API.
- Define "Get instrumentation config" operation for ConfigProvider. This returns something called ConfigProperties, which is a programmatic representation of a YAML mapping node. The ConfigProperties returned by "Get instrumentation config" represents the [`.instrumentation`](https://github.com/open-telemetry/opentelemetry-configuration/blob/670901762dd5cce1eecee423b8660e69f71ef4be/examples/kitchen-sink.yaml#L438-L439) node defined in a config file.
- Rebrand "file configuration" to "declarative configuration". This expresses the intent without coupling to the file representation, which, although it will be the most popular way to consume these features, is just one possible way to represent the configuration model and use these tools.
- Break out dedicated `api.md`, `data-model.md`, and `sdk.md` files for respective API, data model, and SDK portions of declarative configuration. This aligns with other portions of the spec. The separation should improve clarity regarding what should and should not be exposed in the API.
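
To make the shape of these additions concrete, here is a minimal Java sketch of how an instrumentation library might read its own settings through a ConfigProvider. It is only an illustration under stated assumptions: the method names (`getInstrumentationConfig`, `getStructured`, `getBoolean`), the map-backed implementation, and the `example_library` / `enabled` property are all hypothetical and are not taken from the spec text or from the opentelemetry-java prototype.

```java
import java.util.Map;

// Illustrative sketch only: the type and method names below are assumptions
// made for this example, not the specified or prototyped API surface.
final class InstrumentationConfigSketch {

    /** Programmatic view of a YAML mapping node (assumed minimal surface). */
    interface ConfigProperties {
        Boolean getBoolean(String name);              // null when absent or not a boolean
        ConfigProperties getStructured(String name);  // null when absent or not a mapping
    }

    /** Entry point, analogous to TracerProvider, MeterProvider, LoggerProvider. */
    interface ConfigProvider {
        ConfigProperties getInstrumentationConfig();  // the .instrumentation node, or null
    }

    /** Map-backed stand-in for what an SDK implementation would supply. */
    static final class MapConfigProperties implements ConfigProperties {
        private final Map<String, Object> node;

        MapConfigProperties(Map<String, Object> node) {
            this.node = node;
        }

        @Override
        public Boolean getBoolean(String name) {
            Object value = node.get(name);
            return value instanceof Boolean ? (Boolean) value : null;
        }

        @Override
        @SuppressWarnings("unchecked")
        public ConfigProperties getStructured(String name) {
            Object value = node.get(name);
            return value instanceof Map
                ? new MapConfigProperties((Map<String, Object>) value)
                : null;
        }
    }

    /**
     * What a library might do during initialization: look up its own
     * (hypothetical) "enabled" flag, defaulting to true when nothing is set.
     */
    static boolean isEnabled(ConfigProvider provider, String library) {
        ConfigProperties instrumentation = provider.getInstrumentationConfig();
        if (instrumentation == null) {
            return true;
        }
        ConfigProperties own = instrumentation.getStructured(library);
        if (own == null) {
            return true;
        }
        Boolean enabled = own.getBoolean("enabled");
        return enabled == null || enabled;
    }

    public static void main(String[] args) {
        // Stand-in for a parsed .instrumentation node; where it came from is irrelevant here.
        Map<String, Object> instrumentationNode =
            Map.of("example_library", Map.of("enabled", false));
        ConfigProvider provider = () -> new MapConfigProperties(instrumentationNode);

        System.out.println(isEnabled(provider, "example_library"));
        System.out.println(isEnabled(provider, "other_library"));
    }
}
```

The library code in `isEnabled` depends only on the two interfaces, which is the point of placing them in the API rather than the SDK: the map-backed class could be replaced by any implementation without the instrumentation noticing.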

I've prototyped this new API in opentelemetry-java here: open-telemetry/opentelemetry-java#6549

cc @open-telemetry/configuration-maintainers, @open-telemetry/specs-semconv-maintainers

specification/configuration/file-configuration.md

yurishkuro

I think the text needs to be sanitized to avoid references to "file", especially in the API sections. API has nothing to do with files, it is only concerned with the configuration data model. Whether that configuration comes from a file or from a REST endpoint is irrelevant to the API.

specification/configuration/file-configuration.md

lmolkova

Left some specific comments on configuration API.

I believe instrumentation API does not belong in file-configuration.md and should be moved to api-configuration.md and should be written without any assumption on where configuration properties come from.

specification/configuration/file-configuration.md

jack-berg · 2024-07-08T17:23:14Z

> I think the text needs to be sanitized to avoid references to "file", especially in the API sections. API has nothing to do with files, it is only concerned with the configuration data model. Whether that configuration comes from a file or from a REST endpoint is irrelevant to the API.

What do folks think about breaking up file-configuration.md into 3 files to mirror the structure of the other signals?

./configuration/data-model.md - talk about the configuration data model and the file representation of it
./configuration/api.md - talk about the new instrumentation config API introduced in this PR
./configuration/sdk.md - talk about the implementation of the API, and SDK tooling including parse, create operations, ComponentProvider

Do we think the spec would benefit from this decoupling? Or does the benefit of having all the information co-located in a single page outweigh the benefit of breaking out separate pages?

lmolkova · 2024-07-16T20:09:29Z

> What do folks think about breaking up file-configuration.md into 3 files to mirror the structure of the other signals?

I support breaking things down with a caveat:

> ./configuration/data-model.md - talk about the configuration data model and the file representation of it

talks about property structure/names without specifying where properties came from (file or not), in the same way that Spring configuration properties are not tied to any specific source.

Having all things (data model, api, sdk) in one file could also be an option if file-based configuration details are defined in a different file.

…del docs

…y-specification into config-provider

specification/configuration/README.md

jack-berg · 2024-07-23T17:48:47Z

I've pushed a few commits which restructure the config docs. See updated PR description for details.

specification/configuration/sdk.md

specification/configuration/README.md

specification/configuration/api.md

codefromthecrypt

This feature is near and dear to me for the LLM work, as there is some configuration that some vendors would like to be able to set, knowing there is a moratorium on OTEL_ values. For example, disabling prompt/completion (think request/response) logging.

I'm not an approver, but I'm going to play as one. I haven't fully reviewed the implementation, but I agree with the direction and consider my comments to be in the nit/as you wish category.

spec-compliance-matrix.md

specification/configuration/README.md

specification/configuration/api.md

specification/configuration/sdk.md

codefromthecrypt · 2024-07-26T05:14:23Z

After reviewing the impl something came to mind. We have our semantics, makes sense. Do we consider things that aren't generic enough to have their own semantics but ought to be common for all langs? e.g. openai specific config which may be an extension of genai/llm? or postgres config which ought to be the same regardless of java or whatever?

Just clarifying end state scope interest.

jack-berg · 2024-07-29T20:48:02Z

> I'm not an approver, but I'm going to play as one.

This is highly encouraged and very helpful 🙂

A handful of the comments were on sections of code which were relocated rather than introduced in this PR, but I addressed them anyhow because they were editorial.

> We have our semantics, makes sense. Do we consider things that aren't generic enough to have their own semantics but ought to be common for all langs? e.g. openai specific config which may be an extension of genai/llm? or postgres config which ought to be the same regardless of java or whatever?

I think so. Another example of this is configuration for opentelemetry-sqlcommenter - #3560 is tracking standardizing configuration for the various language implementations. Over in opentelemetry-configuration we define the scope of the schema here. We may need to rephrase to capture these types of scenarios. Or perhaps semantic-conventions can hold definitions for these more specific instrumentation configuration concerns.

github-actions · 2024-08-06T03:17:21Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

zeitlinger

great work 😄

specification/configuration/README.md

specification/configuration/api.md

specification/configuration/data-model.md

specification/configuration/sdk.md

zeitlinger · 2024-08-06T10:13:59Z

I wonder about the autoconfig bridge from the java sdk to the spring starter and how it relates to this spec.

Is such bridge function a may or should - or maybe something else entirely?

jack-berg · 2024-08-07T17:40:04Z

> I wonder about the autoconfig bridge from the java sdk to the spring starter and how it relates to this spec.
> Is such bridge function a may or should - or maybe something else entirely?

My gut feeling is that this is a java specific concern, but I think we'll need to do some prototyping and see if anything emerges that seems like it will recur across languages.

zeitlinger · 2024-08-08T07:16:21Z

> > I wonder about the autoconfig bridge from the java sdk to the spring starter and how it relates to this spec.
> > Is such bridge function a may or should - or maybe something else entirely?
>
> My gut feeling is that this is a java specific concern, but I think we'll need to do some prototyping and see if anything emerges that seems like it will recur across languages.

@brunobat is quarkus planning to use the config bridge?

…y-specification into config-provider

jack-berg · 2024-08-13T14:32:29Z

This PR has been open for over a month now and has enough approvals to merge. Will merge tomorrow unless someone communicates an intent to provide additional review / feedback.

Thanks.

brunobat · 2024-08-16T11:32:20Z

> > > I wonder about the autoconfig bridge from the java sdk to the spring starter and how it relates to this spec.
> > > Is such bridge function a may or should - or maybe something else entirely?
> >
> > My gut feeling is that this is a java specific concern, but I think we'll need to do some prototyping and see if anything emerges that seems like it will recur across languages.
>
> @brunobat is quarkus planning to use the config bridge?

No, Quarkus has no plans to use the yaml based config

yurishkuro · 2024-08-19T18:34:43Z

specification/configuration/data-model.md

+undefined_key: ${UNDEFINED_KEY}                       # Invalid reference, UNDEFINED_KEY is not defined and is replaced with ""
+${STRING_VALUE}: value                                # Invalid reference, substitution is not valid in mapping keys and reference is ignored
+recursive_key: ${REPLACE_ME}                          # Valid reference to REPLACE_ME
+# invalid_identifier_key: ${STRING_VALUE:?error}      # If uncommented, this is an invalid identifier, it would fail to parse


@jack-berg why isn't :?error syntax supported? It's standard shell syntax, just like :-default. Was there a discussion?

Bigger point - if we're using some convention, such as shell var syntax, it's very surprising when that convention is only partially supported. Context: https://github.com/open-telemetry/opentelemetry-collector/pull/10907/files#r1722093779

Let's open an issue to track. This syntax wasn't introduced in this PR, it was just moved around. At the time when env var substitution syntax was needed, we were solving a very targeted problem and used shell syntax prior art to avoid reinventing something. You make a good point that partially supporting shell syntax would be surprising to some users.
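
For readers following the substitution discussion, the behaviour visible in the diff hunk above can be sketched in Java roughly as follows. This is an illustration under assumptions, not the spec's algorithm or any SDK's implementation: the class name, the regular expressions, and the choice to apply a `:-` default when the variable is unset or empty (mirroring shell semantics) are all assumptions. It operates on a single scalar value, so the rule that substitution is not valid in mapping keys is outside its scope.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names and patterns are assumptions for this example.
final class EnvSubstitutionSketch {

    // ${NAME} or ${NAME:-default}, with NAME limited to [A-Za-z_][A-Za-z0-9_]*.
    private static final Pattern VALID =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}\\n]*))?\\}");

    // Any ${...} reference, used to detect forms that VALID rejects (for example ":?error").
    private static final Pattern ANY = Pattern.compile("\\$\\{[^}]*\\}");

    /** Substitutes references in one scalar value using the supplied environment. */
    static String substitute(String scalar, Map<String, String> env) {
        // Reject unsupported forms up front, mirroring "it would fail to parse".
        Matcher any = ANY.matcher(scalar);
        while (any.find()) {
            if (!VALID.matcher(any.group()).matches()) {
                throw new IllegalArgumentException(
                    "invalid substitution reference: " + any.group());
            }
        }

        Matcher matcher = VALID.matcher(scalar);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = env.get(matcher.group(1));
            if (value == null || value.isEmpty()) {
                // Undefined (or empty) variable: use the default if given, else "".
                value = matcher.group(2) != null ? matcher.group(2) : "";
            }
            // quoteReplacement keeps "$" and "\" in the value literal; the
            // substituted text is not scanned again, so there is no recursion.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
            "STRING_VALUE", "value",
            "REPLACE_ME", "${DO_NOT_REPLACE_ME}");

        System.out.println(substitute("${STRING_VALUE}", env));
        System.out.println(substitute("${UNDEFINED_KEY:-fallback}", env));
        System.out.println(substitute("${REPLACE_ME}", env));
    }
}
```

In this sketch `${STRING_VALUE:?error}` throws rather than being treated as shell-style "error if unset", which is the partial support of shell syntax that the comment above calls surprising.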

chalin · 2024-09-27T18:12:28Z

specification/configuration/sdk-configuration.md

@@ -1,56 +0,0 @@
-<!--- Hugo front matter used to generate the website version of this page:
-linkTitle: SDK
-aliases: [/docs/reference/specification/sdk-configuration]


This Hugo front matter should have been moved to the top of the new sdk.md file; otherwise we lose the alias, which can result in 404s.

/cc @open-telemetry/docs-maintainers

Define instrumentation configuration API

650bb7e

jack-berg requested review from a team July 3, 2024 18:59

github-actions bot assigned yurishkuro Jul 3, 2024

carlosalberto approved these changes Jul 4, 2024

yurishkuro reviewed Jul 4, 2024
