Clarify that exporter timeout settings must be positive #4283

jack-berg · 2024-11-05T20:45:31Z

Related to: open-telemetry/opentelemetry-java#6850

The env var configuration interface defines a variety of options related to exporter timeout settings:

OTEL_EXPORTER_OTLP_TIMEOUT
OTEL_EXPORTER_ZIPKIN_TIMEOUT
OTEL_BSP_EXPORT_TIMEOUT
OTEL_BLRP_EXPORT_TIMEOUT
OTEL_METRIC_EXPORT_TIMEOUT

The spec states:

> Any value that represents a duration, for example a timeout, MUST be an integer representing a number of milliseconds. The value is non-negative - if a negative value is provided, the implementation MUST generate a warning, gracefully ignore the setting and use the default value if it is defined.

This implies that zero is a valid duration, in addition to positive values. However, zero is never mentioned explicitly, nor is how to interpret it.

Does zero mean indefinite? If so, that's a really important piece of information for implementers.
Does zero actually mean zero, and represent a degenerate case which is valid but always ends in a timeout? This seems useless.
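The two readings of zero behave very differently in practice. As a minimal sketch (the `wait_for_export` helper and its `zero_means_forever` flag are hypothetical and not part of any OpenTelemetry SDK), a simulated export can be awaited under each interpretation:

```python
import threading


def wait_for_export(work_seconds, timeout_ms, zero_means_forever):
    """Hypothetical helper: return True if simulated export work finishes in time."""
    done = threading.Event()
    # Simulate an export that completes after work_seconds.
    worker = threading.Timer(work_seconds, done.set)
    worker.daemon = True
    worker.start()

    if timeout_ms == 0 and zero_means_forever:
        timeout_s = None  # reading 1: zero means "block until the work completes"
    else:
        timeout_s = timeout_ms / 1000.0  # reading 2: zero really means zero

    finished = done.wait(timeout_s)  # True if the work finished, False on timeout
    worker.cancel()
    return finished
```

Under the literal reading, `wait_for_export(0.05, 0, zero_means_forever=False)` gives up before the 50 ms of simulated work can finish, which is the degenerate case. Under the indefinite reading, the same call waits for completion.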

I think the more likely case is that the spec overlooked zero for exporter timeouts.

I propose we clarify that zero is an invalid duration for all exporter timeout settings, and characterize this as a bugfix. Any language implementation which doesn't validate exporter timeouts at all, or which accepts zero as valid, can fix this as a bugfix. If a language implementation insists that switching from no validation to validating that the timeout is positive is a breaking change (I disagree, but this is a hypothetical), then it can continue without adding the new validation. This would be similar to how we changed the default protocol from grpc to http/protobuf, carving out an exception for backwards compatibility.
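A rough sketch of what the proposed validation could look like, assuming the same "warn, ignore, use the default" handling the spec already requires for negative values is extended to zero. The function name `parse_exporter_timeout_ms` and the `DEFAULT_TIMEOUT_MS` value are illustrative assumptions, not taken from any SDK:

```python
import logging

logger = logging.getLogger("otel.config.sketch")

# Illustrative default only; each setting defines its own default in the spec.
DEFAULT_TIMEOUT_MS = 10_000


def parse_exporter_timeout_ms(env, name, default_ms=DEFAULT_TIMEOUT_MS):
    """Hypothetical parser: return a positive timeout in ms, or the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default_ms  # unset or empty: nothing to validate

    try:
        value = int(raw.strip())
    except ValueError:
        logger.warning("%s=%r is not an integer; using default %d ms",
                       name, raw, default_ms)
        return default_ms

    if value <= 0:
        # Zero is treated like a negative value: warn and fall back.
        logger.warning("%s=%d is not positive; using default %d ms",
                       name, value, default_ms)
        return default_ms

    return value
```

For example, `parse_exporter_timeout_ms({"OTEL_EXPORTER_OTLP_TIMEOUT": "0"}, "OTEL_EXPORTER_OTLP_TIMEOUT")` would log a warning and return the default instead of passing zero through to the exporter. Passing `os.environ` as `env` would read the real process environment.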


trask · 2024-11-06T03:23:36Z

I'm kind of used to timeout 0 meaning "never timeout", e.g.

https://linux.die.net/man/7/socket

> If the timeout is set to zero (the default) then the operation will never timeout.

That said, I'm not sure there's a practical use case for supporting "never timeout" in OpenTelemetry pipelines(?)

I could see "retry forever" when retrying from disk, but I think that would be a "retry setting" as opposed to a "timeout setting".

svrnm · 2024-11-25T09:52:30Z

@open-telemetry/technical-committee PTAL

MrAlias · 2024-11-25T18:56:22Z

> I'm kind of used to timeout 0 meaning "never timeout"

This is how Go interprets a value of zero for many (maybe all?) of these timeouts.

pellared · 2024-11-25T19:37:48Z

In .NET -1 is often used to define "never timeout" (sic!).
References:

tigrannajaryan · 2024-11-25T21:27:45Z

If we can't agree on the semantics of 0 vs -1 for "never timeout", here is an alternative: don't support it.

The point of "never timeout" logic is that typically you handle the timeouts yourself and have some other means to interrupt whatever operation is waiting on that "forever timeout". "Never timeout" does not mean wait until the heat death of the universe.

Given the above, is there a practical application of "never timeout" values for any of the Otel config settings? AFAIK, we don't provide any other means to interrupt it, so what's the point of waiting forever (until end of the process, I assume)?

For the hypothetical unknown use case where an actual forever timeout is needed I am guessing the largest acceptable number in milliseconds is enough (assuming 32bits signed, you get 50 years. I want to see a process that runs longer than that).
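The headroom estimate can be checked with quick arithmetic. This is only a sketch, assuming the timeout is stored as a signed integer count of milliseconds; the helper names are made up for illustration. By this calculation a signed 32-bit value tops out at about 24.86 days rather than decades, so it is a 64-bit value that gives effectively unbounded headroom:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY  # Julian year, close enough for an estimate

INT32_MAX = 2**31 - 1  # largest signed 32-bit value
INT64_MAX = 2**63 - 1  # largest signed 64-bit value


def max_duration_days(max_ms):
    """Longest duration, in days, representable as max_ms milliseconds."""
    return max_ms / MS_PER_DAY


def max_duration_years(max_ms):
    """Longest duration, in years, representable as max_ms milliseconds."""
    return max_ms / MS_PER_YEAR


print(f"32-bit signed ms: {max_duration_days(INT32_MAX):.2f} days")
print(f"64-bit signed ms: {max_duration_years(INT64_MAX):.3g} years")
```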

pellared · 2024-11-25T21:58:56Z

I am not saying that using -1 for "never timeout" is good.
Maybe OTel .NET would agree to handle 0 as "never timeout". However, it may be considered a breaking change.
CC @open-telemetry/dotnet-maintainers

However, I do agree with @tigrannajaryan that we should have a real use case where such a setting is needed. Supporting unbounded values can be considered unsafe.

yurishkuro · 2024-11-25T22:38:37Z

+1 to change the spec to require positive values only

jack-berg · 2024-12-04T16:04:35Z

> However, it may be considered a breaking change.

I mentioned in the issue how I think we should handle this from a compatibility standpoint:

> I propose we clarify that zero is an invalid duration for all exporter timeout settings, and characterize this as a bugfix. Any language implementation which doesn't validate exporter timeouts at all, or which accepts zero as valid, can fix this as a bugfix. If a language implementation insists that switching from no validation to validating that the timeout is positive is a breaking change (I disagree, but this is a hypothetical), then it can continue without adding the new validation. This would be similar to how we changed the default protocol from grpc to http/protobuf, carving out an exception for backwards compatibility.

Related to #4283. A [comment](#4331 (comment)) adding a "type" column to each env var, but didn't feel appropriate to extend scope of #4331. --------- Co-authored-by: Carlos Alberto Cortez <calberto.cortez@gmail.com> Co-authored-by: Reiley Yang <reyang@microsoft.com>

jack-berg added the spec:protocol Related to the specification/protocol directory label Nov 5, 2024

jack-berg mentioned this issue Nov 5, 2024

Require exporter timeouts to be positive open-telemetry/opentelemetry-java#6850

Open

trask added the triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted label Nov 5, 2024

github-actions bot added the triage:followup Needs follow up during triage label Nov 20, 2024

jack-berg added the triage:accepted:ready-with-sponsor Ready to be implemented and has a specification sponsor assigned label Dec 4, 2024

jack-berg self-assigned this Dec 4, 2024

github-project-automation bot added this to 🔭 Main Backlog Dec 4, 2024

jack-berg removed the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Dec 4, 2024

This was referenced Dec 9, 2024

Change various usages of minimum to exclusiveMinimum open-telemetry/opentelemetry-configuration#151

Open

Clarify boundaries of numeric env vars #4331

Open

pellared mentioned this issue Dec 18, 2024

Editorial: Add type column to all environment variables #4336

Merged
