Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support usage trackers for received and discarded bytes. #11840

Merged
merged 37 commits into from
Feb 24, 2024

Conversation

jeschkies
Copy link
Contributor

@jeschkies jeschkies commented Jan 31, 2024

What this PR does / why we need it:
Currently, Loki exposes the distributor_bytes_received_total and discarded_bytes_total metrics. These only expose the tenant and the discarded reason in the labels. In some cases it is useful to be specific to to certain streams. E.g. one might want to know how many bytes from a certain region were received and dropped. This is where usage trackers come in.

They receive the added or discarded bytes of a stream and its labels. This can be used to build a custom tracker such as Mimir's grafana/mimir#1188 which uses regex matchers and exports the active series as a Prometheus metric.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 2, 2024
@jeschkies jeschkies marked this pull request as ready for review February 2, 2024 15:59
@jeschkies jeschkies requested a review from a team as a code owner February 2, 2024 15:59
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have some mechanism for cleaning these metrics up if they haven't been updated recently, in which case they're just excess cardinality.

Also, it looks like disitrbutors processing Push requests is roughly 10% of their CPU time, it would be good to know what kind of additional CPU cost we're incurring when we have some of these custom trackers for a tenant or two.

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
@jeschkies jeschkies requested a review from cstyan February 22, 2024 15:17
@jeschkies jeschkies removed the request for review from slim-bean February 22, 2024 15:22
pkg/ingester/limiter.go Outdated Show resolved Hide resolved
pkg/ingester/instance.go Outdated Show resolved Hide resolved
Comment on lines 708 to 712
type labelData struct {
ls labels.Labels
labels string
hash uint64
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we modify this struct to just have the ls? or do we call access the labels enough that ls.String() would be too expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really weird. We use it to override stream.Labels. I think this is because the parsing also sorts the labels.

return fmt.Errorf(validation.LineTooLongErrorMsg, maxSize, labels, len(entry.Line))
}

if len(entry.StructuredMetadata) > 0 {
if !ctx.allowStructuredMetadata {
validation.DiscardedSamples.WithLabelValues(validation.DisallowedStructuredMetadata, ctx.userID).Inc()
validation.DiscardedBytes.WithLabelValues(validation.DisallowedStructuredMetadata, ctx.userID).Add(float64(len(entry.Line)))
if v.customStreamsTracker != nil {
for _, matchedLbs := range ctx.customTrackerConfig.MatchTrackers(labels) {
v.customStreamsTracker.DiscardedBytesAdd(ctx.userID, matchedLbs, float64(len(entry.Line)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to pass the reason why the bytes have been discarded

@@ -145,6 +145,18 @@ func otlpToLokiPushRequest(ld plog.Logs, userID string, tenantsRetention Tenants
labelsStr := streamLabels.String()

lbs := modelLabelsSetToLabelsList(streamLabels)

// Init custom streams tracking
trackedLabels := customTrackersConfig.MatchTrackers(lbs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we take into account the labels from structured metadata to track them also?


for i := range trackedLabels {
stats.logLinesBytesCustomTrackers[i].Bytes[tenantsRetention.RetentionPeriodFor(userID, lbs)] += int64(len(entry.Line))
stats.structuredMetadataBytesCustomTrackers[i].Bytes[tenantsRetention.RetentionPeriodFor(userID, lbs)] += int64(labelsSize(entry.StructuredMetadata) - resourceAttributesAsStructuredMetadataSize - scopeAttributesAsStructuredMetadataSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to include in this metric everything that is added to structured metadata

Suggested change
stats.structuredMetadataBytesCustomTrackers[i].Bytes[tenantsRetention.RetentionPeriodFor(userID, lbs)] += int64(labelsSize(entry.StructuredMetadata) - resourceAttributesAsStructuredMetadataSize - scopeAttributesAsStructuredMetadataSize)
stats.structuredMetadataBytesCustomTrackers[i].Bytes[tenantsRetention.RetentionPeriodFor(userID, lbs)] += int64(labelsSize(entry.StructuredMetadata))

if err != nil {
return nil, nil, fmt.Errorf("couldn't parse labels: %w", err)
}
}
trackedLabels := customTrackers.MatchTrackers(lbs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it trackers

Suggested change
trackedLabels := customTrackers.MatchTrackers(lbs)
trackers := customTrackers.MatchTrackers(lbs)

@jeschkies jeschkies changed the title feat: Support custom trackers for received and discarded bytes. feat: Support usage trackers for received and discarded bytes. Feb 23, 2024
@github-actions github-actions bot removed the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 23, 2024
@jeschkies
Copy link
Contributor Author

@vlad-diachenko and @cstyan I've changed the code to just inject a usage tracker. This is quite generic and allows us to encapsulate the labels filtering in summing in a few simple methods.

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I still think the additional CPU usage is worth measuring

Also, it looks like disitrbutors processing Push requests is roughly 10% of their CPU time, it would be good to know what kind of additional CPU cost we're incurring when we have some of these custom trackers for a tenant or two.

@jeschkies
Copy link
Contributor Author

Thanks @cstyan. I'll measure the impact when I'm adding a concrete tracker.

@jeschkies jeschkies merged commit 6578a00 into grafana:main Feb 24, 2024
11 checks passed
Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 💎

@jeschkies jeschkies deleted the karsten/custom-trackers branch February 27, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants