Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress calls to cloud server for a Recovery Period after request failures. #134

Draft
wants to merge 47 commits into
base: main
Choose a base branch
from

Conversation

drasmart
Copy link
Contributor

@drasmart drasmart commented Oct 11, 2024

Changes

  • Introduce PipelineTemporarilyUnavailableException.
  • Introduce IRecoveryStrategy -- defines for how long to suppress calls after certain amount of "logical failures".
    • InstantRecoveryStrategy -- always allow new calls (pre-change behavior or no recovery period).
    • NoRecoveryStrategy -- does not allow any new calls ever after first failure.
    • SimpleRecoveryStrategy -- disallow new calls for RecoverySeconds period after then most recent failure.
  • Introduce IFailHandler -- defines when to suppress calls.
    • SimpleFailHandler -- simply wraps provided IRecoveryStrategy.
    • WindowedFailHandler -- wait for certain amount (failuresToEnterRecovery) to accumulate within certain amount of time (failuresWindow) before forwarding the failure to the underling IRecoveryStrategy. Also calls Reset upon first success after exiting Recovery Period.
  • Add new CloudRequestEngine constructor.
    • Endpoint- and key- related string parameters grouped into a single EndpointsAndKeys struct (saved to read-only field).
    • Required IFailHandler parameter.
    • Old constructor (without throttling strategy) now forwards to this (new) one.
    • CloudRequestEngineBuilder now also uses this (new) constructor.
  • Use [Lazy] for initialization and storing values for [IFlowElement.EvidenceKeyFilter] and [ICloudRequestEngine.PublicProperties].
    • Simplifies update to AsyncLazyFailable planned for v4.5
  • Introduce CloudRequestEngineTemporarilyUnavailableException (subclass of PipelineTemporarilyUnavailableException).
    • Thrown by CloudRequestEngine when IFailHandler prevents (likely temporarily) sending requests to cloud server due to recent failures.
  • Expose SimpleThrottlingStrategy.RecoverySeconds on CloudRequestEngineBuilder.
    • If non-positive value provided (default: 60), NoThrottlingStrategy is used.
  • Expose WindowedFailHandler.FailuresToEnterRecovery (min: 1, max: 100, default: 10) and WindowedFailHandler.FailuresWindowSeconds (required to be positive, default: 100) on CloudRequestEngineBuilder.
  • Mark constructor-initialized fields (_loggerFactory, _dataLogger, _httpClient) on CloudRequestEngineBuilder as read-only.
  • Add safety checks to PipelineCapabilities
    • Set _availableElementProperties to [ImmutableDictionary<TKey,TValue>.Empty] singleton if flowData has errors.
    • (Short-circuit) forward to _baseCaps[key] from string this[string key] indexer (like done in explicitly implemented methods).
  • Dispose of (not-to-be-returned) flowData when WebPipeline.Process throws.
  • Add new unit tests
    • for implementations of IFailThrottlingStrategy.
    • for using CloudRequestEngine with SimpleThrottlingStrategy.
  • Also includes formatting changes from original patch file.

Why

Test results


Manual tests of Cloud/Framework-Web example (v.4.4.110-alpha.1+pi.wip)

with .nupkg from 📦 https://github.com/postindustria-tech/pipeline-dotnet/actions/runs/11297040712


Block-listing in Charles

Reloading the page within recovery period ( "RecoveryMilliseconds": 60000 )

Screenshot 2024-10-11 141024

^ did not produce new requests (only the initial 2 blocked ones from the first page load attempt can be seen)

image

After the recovery period

Screenshot 2024-10-11 141049

^ only 2 new requests were made.

image


Timeout due to latency by Charles

Conditions:

"TimeOutSeconds": 2
Round-trip latency (ms): [2500]

Screenshot 2024-10-11 145702

It takes around 6~10 second for the page to show error on [4.4.103]

Screenshot 2024-10-11 150837

While it still takes more than full awaiting of TimeOutSeconds (e.g. ~2.5s) for an error to propagate when outside Recovery Period on [4.4.110-alpha.1+pi.wip] :

Screenshot 2024-10-11 151233

Within Recovery Period, error page loading time is much shorter (under 200ms locally) :

Screenshot 2024-10-11 151447

^ and the page does not fail, as though SuppressProcessExceptions is true.


Manual tests of Cloud/Framework-Web example (v.4.4.113-alpha.1+pi.wip)

with .nupkg from 📦 https://github.com/postindustria-tech/pipeline-dotnet/actions/runs/11335248015


Default settings

"FailuresToEnterRecovery": 10,
"FailuresWindowSeconds": 60,
Time Description Image
0 ms 6 requests fail (timed out). Exception page shown. Screenshot 2024-10-15 103808
17.84 s 4 requests fail (timed out). Recovery started. Exception page shown. Screenshot 2024-10-15 103828
37.16 s (Recovery) No requests sent. Default.aspx rendered. Screenshot 2024-10-15 103851
59.21 s (Recovery) No requests sent. Default.aspx rendered. Screenshot 2024-10-15 103904
1.28 min (Recovery) No requests sent. Default.aspx rendered. Screenshot 2024-10-15 103921
1.46 min (Recovery Period ended) 2 requests sent. Device detected. Default.aspx rendered. Screenshot 2024-10-15 103941

Modified FailuresToEnterRecovery

"FailuresToEnterRecovery": 4,
Description Image
4 requests fail (timed out). Recovery started. Exception page shown. Screenshot 2024-10-15 104702
(Recovery) No requests sent. Default.aspx rendered. Screenshot 2024-10-15 104727

Modified FailuresToEnterRecovery and FailuresWindowSeconds

"FailuresToEnterRecovery": 7,
"FailuresWindowSeconds": 30,
Time Description Image
0 6 requests fail (timed out). Exception page shown. Screenshot 2024-10-15 105014
51.07 s 5 requests fail (timed out). Exception page shown. Screenshot 2024-10-15 105114
1.38 min 5 requests fail (timed out). Exception page shown. Screenshot 2024-10-15 105141
1.68 min 2 requests fail (timed out). Recovery started. Exception page shown. Screenshot 2024-10-15 105156
1.94 min (Recovery) No requests sent. Default.aspx rendered. Screenshot 2024-10-15 105209

Manual tests of Cloud/Framework-Web example (v.4.4.115-alpha.1+pi.wip)

with .nupkg from 📦 https://github.com/postindustria-tech/pipeline-dotnet/actions/runs/11349595554


Modified RecoverySeconds

"RecoverySeconds": 20,
Time Description Image
41.41 s Within recovery due to error 19.1968s ago. Default.aspx rendered. Screenshot 2024-10-15 122133
51.07 s Recovery Period ended. Requests timed out. Exception page shown. Screenshot 2024-10-15 122155

Manual tests of Cloud/Framework-Web example (v.4.4.121-alpha.1+pi.wip)

ASP.NET Core integration

with

Unsuppressed CloudRequestException due to timeout :

Screenshot 2024-10-16 123721

Suppressed CloudRequestEngineTemporarilyUnavailableException displayed by Index.cshtml :

Screenshot 2024-10-16 124903

…get }` (via `LoadAspectProperties(ICloudRequestEngine)`).
…xception` as if `SuppressProcessExceptions = true` on `WebPipeline` level.
…lementAvailableProperties` (like `PropertiesNotYetLoadedException`).
…xception` as if `SuppressProcessExceptions = true` on `WebPipeline` level. pt.2.
@drasmart drasmart changed the title [DRAFT] Suppress calls to cloud server for a Recovery Period after request failure. Suppress calls to cloud server for a Recovery Period after request failure. Oct 11, 2024
@drasmart drasmart changed the title Suppress calls to cloud server for a Recovery Period after request failure. [DRAFT] Suppress calls to cloud server for a Recovery Period after request failures. Oct 15, 2024
@drasmart drasmart changed the title [DRAFT] Suppress calls to cloud server for a Recovery Period after request failures. Suppress calls to cloud server for a Recovery Period after request failures. Oct 15, 2024
@drasmart drasmart changed the title Suppress calls to cloud server for a Recovery Period after request failures. [DRAFT] Suppress calls to cloud server for a Recovery Period after request failures. Oct 16, 2024
@drasmart drasmart changed the title [DRAFT] Suppress calls to cloud server for a Recovery Period after request failures. Suppress calls to cloud server for a Recovery Period after request failures. Oct 16, 2024
@justadreamer
Copy link
Contributor

Will need to also port this functionality to Java, Python, PHP, Node

@justadreamer
Copy link
Contributor

@BohdanVV Let's change the default timeout from 100 seconds to 2 seconds.

@jwrosewell
Copy link
Contributor

@justadreamer @BohdanVV Please can you ensure that the error message and exception will be generated when in recovery mode is clear in the documentation and in the related specifications before this is marked as ready for review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants