Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle backpressure earlier in pipeline #2371

Merged
merged 5 commits into from
Oct 30, 2024
Merged

Conversation

denrase
Copy link
Collaborator

@denrase denrase commented Oct 23, 2024

📜 Description

Move the task queue where we limit the number of consecutive un-awaited SDK tasks to the earliest possible point in the SDK. That way we don't run event processors on events/transactions that will get dropped eventually.

💡 Motivation and Context

Relates to #2360
Relates to #2368

💚 How did you test it?

CI

📝 Checklist

  • I reviewed submitted code
  • I added tests to verify changes
  • No new PII added or SDK only sends newly added PII if sendDefaultPii is enabled
  • I updated the docs if needed
  • All tests passing
  • No breaking changes

🔮 Next steps

@buenaflor I think we should also keep #2368, as it is debouncing based on time, and the task queue is about how many parallel tasks can be scheduled with sentry, independent from their duration. WDYT?

Copy link

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 3 lines in your changes missing coverage. Please review.

Project coverage is 84.72%. Comparing base (7954fb3) to head (2e3680c).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
dart/lib/src/transport/task_queue.dart 71.42% 2 Missing ⚠️
dart/lib/src/sentry.dart 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2371      +/-   ##
==========================================
- Coverage   84.72%   84.72%   -0.01%     
==========================================
  Files         253      253              
  Lines        9083     9097      +14     
==========================================
+ Hits         7696     7707      +11     
- Misses       1387     1390       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@denrase denrase marked this pull request as ready for review October 23, 2024 09:52
Copy link
Contributor

github-actions bot commented Oct 23, 2024

Android Performance metrics 🚀

  Plain With Sentry Diff
Startup time 653.86 ms 707.64 ms 53.78 ms
Size 6.49 MiB 7.57 MiB 1.08 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
8a111a9 496.27 ms 520.57 ms 24.31 ms
5f2f77b 429.06 ms 507.74 ms 78.68 ms
7ec9238 414.02 ms 513.94 ms 99.91 ms
abfcdb5 416.55 ms 498.88 ms 82.33 ms
26e955b 369.52 ms 458.60 ms 89.07 ms
7f75f32 347.36 ms 419.58 ms 72.22 ms
7d5e695 433.45 ms 485.11 ms 51.66 ms
cdf7172 348.54 ms 390.81 ms 42.27 ms
5aba417 355.78 ms 450.39 ms 94.61 ms
3334ac1 303.98 ms 366.65 ms 62.67 ms

App size

Revision Plain With Sentry Diff
8a111a9 6.49 MiB 7.56 MiB 1.07 MiB
5f2f77b 6.35 MiB 7.40 MiB 1.05 MiB
7ec9238 6.35 MiB 7.42 MiB 1.06 MiB
abfcdb5 6.35 MiB 7.40 MiB 1.05 MiB
26e955b 6.27 MiB 7.20 MiB 956.49 KiB
7f75f32 6.26 MiB 7.20 MiB 959.18 KiB
7d5e695 6.49 MiB 7.55 MiB 1.07 MiB
cdf7172 5.94 MiB 6.95 MiB 1.01 MiB
5aba417 5.94 MiB 6.96 MiB 1.02 MiB
3334ac1 6.06 MiB 7.03 MiB 993.54 KiB

Previous results on branch: enha/earlier-task-queue

Startup times

Revision Plain With Sentry Diff
42d6616 454.80 ms 486.77 ms 31.97 ms
05bc7c5 440.45 ms 477.56 ms 37.11 ms

App size

Revision Plain With Sentry Diff
42d6616 6.49 MiB 7.57 MiB 1.08 MiB
05bc7c5 6.49 MiB 7.57 MiB 1.08 MiB

Copy link
Contributor

@buenaflor buenaflor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denrase I'm seeing this only now but we don't handle client reports here do we?

We have a type for that in the develop docs: queue_overflow: a SDK internal queue (eg: transport queue) overflowed

Copy link
Contributor

github-actions bot commented Oct 23, 2024

iOS Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1267.69 ms 1287.02 ms 19.33 ms
Size 8.38 MiB 9.75 MiB 1.37 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
613760b 1263.10 ms 1277.27 ms 14.16 ms
2966d88 1251.76 ms 1270.21 ms 18.46 ms
f172c4d 1350.66 ms 1408.49 ms 57.83 ms
0a23f98 1252.98 ms 1276.76 ms 23.78 ms
6daa837 1250.42 ms 1265.60 ms 15.18 ms
f79eecf 1210.25 ms 1221.65 ms 11.40 ms
7954fb3 1247.20 ms 1272.15 ms 24.94 ms
5112c69 1272.76 ms 1293.37 ms 20.61 ms
bf4aed7 1274.63 ms 1286.48 ms 11.85 ms
24f71aa 1267.47 ms 1272.00 ms 4.53 ms

App size

Revision Plain With Sentry Diff
613760b 8.15 MiB 9.13 MiB 1000.46 KiB
2966d88 8.32 MiB 9.38 MiB 1.06 MiB
f172c4d 8.33 MiB 9.62 MiB 1.29 MiB
0a23f98 8.10 MiB 9.18 MiB 1.08 MiB
6daa837 8.33 MiB 9.40 MiB 1.07 MiB
f79eecf 8.29 MiB 9.36 MiB 1.07 MiB
7954fb3 8.38 MiB 9.75 MiB 1.37 MiB
5112c69 8.16 MiB 9.17 MiB 1.01 MiB
bf4aed7 8.10 MiB 9.17 MiB 1.08 MiB
24f71aa 8.10 MiB 9.16 MiB 1.07 MiB

Previous results on branch: enha/earlier-task-queue

Startup times

Revision Plain With Sentry Diff
42d6616 1247.06 ms 1255.36 ms 8.30 ms
05bc7c5 1241.57 ms 1271.11 ms 29.54 ms

App size

Revision Plain With Sentry Diff
42d6616 8.38 MiB 9.75 MiB 1.37 MiB
05bc7c5 8.38 MiB 9.75 MiB 1.37 MiB

@buenaflor
Copy link
Contributor

buenaflor commented Oct 23, 2024

I think we should also keep #2368, as it is debouncing based on time, and the task queue is about how many parallel tasks can be scheduled with sentry, independent from their duration. WDYT?

I just wanna make sure that the screenshot is still attached to the events that do go through the task queue, other than that I'm not against leaving the debouncing

I think it's also not a good experience if the screenshot is 'inconsistent', some errors have it, some don't

@denrase
Copy link
Collaborator Author

denrase commented Oct 23, 2024

I think it's also not a good experience if the screenshot is 'inconsistent', some errors have it, some don't

Fair point, this would indeed be an issue and potentially confusing. Also users can change the SDK behaviour by setting a different maxQueue value. Lets close the other PR then. 👍

@denrase denrase mentioned this pull request Oct 23, 2024
6 tasks
Copy link
Contributor

@buenaflor buenaflor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, got one question

image

so these are not dropped since we await each call but according to the user this example can somewhat reproduce watchdog killing the app on ios.

can you reproduce this?

@buenaflor
Copy link
Contributor

probably also makes sense to use the memory profiler of the flutter devtool to check what's allocating so much in a tight loop

@denrase
Copy link
Collaborator Author

denrase commented Oct 29, 2024

@buenaflor The app in the simulator hangs if session replay is enabled. Probably because there are too many ViewHierarchyElements being created/destroyed in a short amount of time (~64k objects). Without, it works. Memory usage increases while the loop is running.

Bildschirmfoto 2024-10-29 um 10 59 12 Bildschirmfoto 2024-10-29 um 10 57 41 Bildschirmfoto 2024-10-29 um 11 28 06

@buenaflor
Copy link
Contributor

@denrase thx for investigating so if only screenshot is enabled the memory usage spikes in a 'tight' loop with await but would not be enough for watchdogs to terminate the app?

the user reported this is happening more often on older devices

@denrase
Copy link
Collaborator Author

denrase commented Oct 29, 2024

@buenaflor Still need to test this on an actual iOS device. In the meantime, can the user in question use something like exponential backoff until we come up with a solution? Think this PR here is good in itself, no need to do processing for dropped tasks, but it will not solve the issue of too many awaited tasks in a loop. Here we are still attaching screenshots.

Come to think of it, what if we cache the screenshot for some time? That way every event would still get a screenshot attached, but we could probably also relieve a bit of CPU/Memory pressure.

@buenaflor
Copy link
Contributor

@denrase I don't think they can do an exponential back-off, they have a specific use case where they need to retry aggresively because their users are typically in areas where internet connection is quite bad (think of in an airplane or in the mountains etc...)

Caching the screenshot for a period of time sounds interesting, might be worth exploring as an opt-in option

@buenaflor
Copy link
Contributor

it's for sure a special use case, normally we'd discourage this anyway but it's a good opportunity to work on some memory/cpu expensive operations in the SDK and get some insight of the overhead of certain features

@buenaflor
Copy link
Contributor

that being said, let's explore if we could cache the screenshot reliably in some way

can you create a new issue for that pls

@denrase
Copy link
Collaborator Author

denrase commented Oct 29, 2024

Ok, working on iPhone 16 Pro. Short spike in memory usage (Xcode), but goes through without issues. Can test later on an iPhone X.

Bildschirmfoto 2024-10-29 um 16 39 55

Copy link
Contributor

@buenaflor buenaflor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, since this PR is about un-awaited tasks

regarding improving caling await captureException in a tight loop let's talk in another thread

@denrase denrase merged commit cecd4ed into main Oct 30, 2024
135 of 136 checks passed
@denrase denrase deleted the enha/earlier-task-queue branch October 30, 2024 09:18
martinhaintz added a commit that referenced this pull request Nov 11, 2024
* main:
  release: 8.10.1
  fix: android build error when compiling (#2397)
  release: 8.10.0
  chore: prepare changelog for `8.10.0` release (#2391)
  chore(deps): update Cocoa SDK to v8.40.1 (#2394)
  fix: cocoa sdk version updater (#2392)
  Send Less Client Reports When Rate Limited (#2380)
  build(deps): bump ruby/setup-ruby from 1.197.0 to 1.199.0 (#2386)
  chore(deps): update Native SDK to v0.7.12 (#2390)
  chore(deps): update Android SDK to v7.16.0 (#2373)
  fix build error for latest flutter beta (3.27.0) (#2385)
  Remove duplicate tests in sentry_client_test.dart (#2378)
  Handle backpressure earlier in pipeline (#2371)
  Add screenshot to `SentryFeedbackWidget` (#2369)

# Conflicts:
#	flutter/lib/src/event_processor/screenshot_event_processor.dart
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants