Merge pull request #313 from Microsoft/master
Master to develop
abaranch authored Sep 19, 2016
2 parents 910da0d + 170432f commit 37cec52
Showing 7 changed files with 157 additions and 0 deletions.
79 changes: 79 additions & 0 deletions docs/ServerTelemetryChannel error handling.md
@@ -0,0 +1,79 @@
This document was last updated 7/14/2016 and is applicable to SDK version 2.2-beta1.

# Server Telemetry Channel Error Handling

* [Introduction](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#introduction)
* [Supported Status codes](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#supported-status-codes)
* [Partial success (206) response format](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#partial-success-206-response-format)
* [ErrorHandlingTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#errorhandlingtransmissionpolicy)
* [ThrottlingTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#throttlingtransmissionpolicy)
* [PartialSuccessTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#partialsuccesstransmissionpolicy)
* [NetworkAvailabilityTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#networkavailabilitytransmissionpolicy)
* [ApplicationLifecycleTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/docs/ServerTelemetryChannel%20error%20handling.md#applicationlifecycletransmissionpolicy)


## Introduction

When the channel finishes sending a transmission (a serialized and compressed batch of telemetry items), it raises an event.
Several transmission policy classes subscribe to this event. These policies get the exception information and the response from the event's arguments. If a policy decides to modify the channel's behaviour, it sets the sender, buffer, or storage capacity to 0. (For example, if it sets the sender and buffer capacities to 0, all new data goes to storage until the disk size limit is reached.)

Another set of policies subscribes to different events and also changes the sender, buffer, and storage capacities to influence how the channel behaves.
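
The capacity mechanism described above can be sketched in a few lines. This is only an illustration in Python, not the SDK's C# code: `Channel`, `Capacities`, `TransmissionResult`, `route`, and `pause_on_server_error` are hypothetical names, and the routing rule (new data goes to the first stage whose capacity is not 0) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Capacities:
    """Hypothetical stand-in for the three stages; None means 'default capacity'."""
    sender: Optional[int] = None
    buffer: Optional[int] = None
    storage: Optional[int] = None


@dataclass
class TransmissionResult:
    """Hypothetical event arguments: what a policy sees after a send attempt."""
    status_code: Optional[int] = None
    exception: Optional[Exception] = None


class Channel:
    def __init__(self) -> None:
        self.capacities = Capacities()
        self._policies: List[Callable[["Channel", TransmissionResult], None]] = []

    def subscribe(self, policy: Callable[["Channel", TransmissionResult], None]) -> None:
        self._policies.append(policy)

    def transmission_sent(self, result: TransmissionResult) -> None:
        # Raise the "transmission finished" event for every subscribed policy.
        for policy in self._policies:
            policy(self, result)

    def route(self) -> str:
        # Assumption: new data goes to the first stage whose capacity is not 0.
        if self.capacities.sender != 0:
            return "sender"
        if self.capacities.buffer != 0:
            return "buffer"
        if self.capacities.storage != 0:
            return "storage"
        return "dropped"


def pause_on_server_error(channel: Channel, result: TransmissionResult) -> None:
    # Example policy: stop sending and buffering after a 503 response.
    if result.status_code == 503:
        channel.capacities.sender = 0
        channel.capacities.buffer = 0
```

With this model, a channel that has `pause_on_server_error` subscribed routes new data to `"sender"` until a 503 arrives and to `"storage"` afterwards, which mirrors the example in the paragraph above.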

### Supported Status codes

* 206 - partial success (some items from the batch were not accepted, response contains more details)
* 408 - request timeout
* 429 - too many requests
* 439 - too many requests over extended time
* 500 - server error
* 503 - service unavailable

### Partial success (206) response format

```
{
  "itemsReceived": 2,
  "itemsAccepted": 1,
  "errors": [
    {
      "index": 0,
      "statusCode": 400,
      "message": "109: Field 'startTime' on type 'RequestData' is required but missing or empty. Expected: string, Actual: undefined"
    }
  ]
}
```
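
As a rough illustration of how a client might read this response, the Python sketch below maps each entry in `errors` back to the item that was sent at that `index`. The function name `items_to_retry` and the set `RETRIABLE_ITEM_STATUS` are hypothetical, and the choice of which per-item status codes are worth resending is an assumption; this document does not state it.

```python
import json
from typing import List, Sequence

# Assumption: per-item status codes treated as transient (not stated in this document).
RETRIABLE_ITEM_STATUS = {408, 429, 439, 500, 503}


def items_to_retry(response_body: str, sent_items: Sequence[str]) -> List[str]:
    """Return the sent items whose per-item error looks transient."""
    response = json.loads(response_body)
    retry: List[str] = []
    for error in response.get("errors", []):
        index = error.get("index")
        # Ignore entries that do not point at an item from the original batch.
        if not isinstance(index, int) or not 0 <= index < len(sent_items):
            continue
        if error.get("statusCode") in RETRIABLE_ITEM_STATUS:
            retry.append(sent_items[index])
    return retry
```

Under these assumptions, the sample response above yields nothing to retry, because its only error has status code 400 (a malformed item that would fail again if resent).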

### [ErrorHandlingTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/src/TelemetryChannels/ServerTelemetryChannel/Shared/Implementation/ErrorHandlingTransmissionPolicy.cs)

Notes:
* This policy handles failures with status codes 408, 500, 503
* "Set timer to restore capacity using Retry-After or exponential backoff" means the following:
  * We check whether the Retry-After header is present. We expect the header to contain a TimeSpan, and the timer is set to restore capacity after that interval. (Note that with the current backend implementation Retry-After is never returned for 408, 500, or 503.)
  * If the Retry-After header is not present, we check how many consecutive errors have occurred so far and use the exponential backoff algorithm to set a timer to restore capacity. For a description of the algorithm, see http://en.wikipedia.org/wiki/Exponential_backoff
* We do not update the number of consecutive errors if it was updated recently (within the last 10 seconds), because there are multiple senders that will most likely fail at the same time during intermittent issues.

![Img](./images/ErrorHandlingPolicy.PNG)
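
The timer logic from the notes above can be sketched as follows. This is a Python illustration, not the SDK's implementation: `BackoffState` and `restore_delay` are hypothetical names, the 10-second slot and the one-hour cap are assumed values, and the random-slot formula is the textbook binary exponential backoff from the linked article. Only the 10-second guard on updating the error count comes from the flowchart metadata.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

SLOT = timedelta(seconds=10)                 # assumed base delay per backoff slot
MAX_DELAY = timedelta(hours=1)               # assumed upper bound on the delay
MIN_UPDATE_INTERVAL = timedelta(seconds=10)  # "updated in the last 10sec" guard


class BackoffState:
    def __init__(self) -> None:
        self.consecutive_errors = 0
        self._last_update: Optional[datetime] = None

    def record_error(self, now: datetime) -> None:
        # Several senders tend to fail together, so count at most one
        # error per MIN_UPDATE_INTERVAL.
        if self._last_update is None or now - self._last_update >= MIN_UPDATE_INTERVAL:
            self.consecutive_errors += 1
            self._last_update = now

    def reset(self) -> None:
        self.consecutive_errors = 0
        self._last_update = None


def restore_delay(state: BackoffState, retry_after: Optional[timedelta], rng=random) -> timedelta:
    """Return how long to wait before restoring capacity."""
    if retry_after is not None:
        return retry_after  # the server told us how long to wait
    # Binary exponential backoff: a random number of slots in [0, 2^c - 1].
    c = min(state.consecutive_errors, 10)
    slots = rng.randint(0, 2 ** c - 1)
    return min(SLOT * slots, MAX_DELAY)
```

Note that the textbook variant can return a zero delay for small error counts; a production policy would likely enforce a minimum wait.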

### [ThrottlingTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/src/TelemetryChannels/ServerTelemetryChannel/Shared/Implementation/ThrottlingTransmissionPolicy.cs)

Notes:
* This policy handles failures with status codes 429, 439
* With the current backend implementation, a 429 response includes the Retry-After header, and 439 is not used.

![Img](./images/ThrottlingPolicy.PNG)
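
The difference between the two throttling codes, as laid out in the flowchart metadata (`docs/images/ThrottlingPolicy.txt`), is that 439 also sets the storage capacity to 0. A minimal Python sketch of that decision follows; the function name `throttling_capacities` and the dictionary representation are hypothetical.

```python
from typing import Dict, Optional


def throttling_capacities(status_code: Optional[int]) -> Dict[str, int]:
    """Return the capacities to set to 0; an empty dict means 'not handled here'."""
    if status_code == 429:
        # Too many requests: stop sending and buffering, keep writing to storage.
        return {"sender": 0, "buffer": 0}
    if status_code == 439:
        # Too many requests over extended time: stop storing as well.
        return {"storage": 0, "sender": 0, "buffer": 0}
    return {}
```

In both handled cases the flowchart then sets the restore timer, updates the error count, and enqueues the failed transmission again; those steps are left out of this sketch.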

### [PartialSuccessTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/src/TelemetryChannels/ServerTelemetryChannel/Shared/Implementation/PartialSuccessTransmissionPolicy.cs)

Notes:
* This policy handles status code 206 and the success case, in which there is no failure and no response.

![Img](./images/PartialSuccessPolicy.PNG)

### [NetworkAvailabilityTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/src/TelemetryChannels/ServerTelemetryChannel/Shared/Implementation/NetworkAvailabilityTransmissionPolicy.cs)

This policy subscribes to the [network change event](https://msdn.microsoft.com/en-us/library/system.net.networkinformation.networkchange.networkaddresschanged%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396). When the network becomes unavailable, the sender and buffer capacities are set to 0. Note that the consecutive error count, which affects the exponential backoff logic, is not changed.
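
A minimal Python sketch of this behaviour is shown below. The class and method names are hypothetical, the injected `is_network_available` callable stands in for a platform availability check, and restoring the default capacities (`None`) once the network returns is an assumption that this document does not spell out.

```python
from typing import Callable, Optional


class NetworkAvailabilityPolicy:
    """Hypothetical sketch: zero the sender and buffer capacities while offline."""

    def __init__(self, is_network_available: Callable[[], bool]) -> None:
        self._is_network_available = is_network_available
        self.sender_capacity: Optional[int] = None  # None means "default capacity"
        self.buffer_capacity: Optional[int] = None
        self.consecutive_errors = 0  # deliberately never modified by this policy

    def on_network_address_changed(self) -> None:
        if self._is_network_available():
            # Assumption: capacities return to their defaults when the network is back.
            self.sender_capacity = None
            self.buffer_capacity = None
        else:
            self.sender_capacity = 0
            self.buffer_capacity = 0
```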

### [ApplicationLifecycleTransmissionPolicy](https://github.com/Microsoft/ApplicationInsights-dotnet/blob/master/src/TelemetryChannels/ServerTelemetryChannel/Shared/Implementation/ApplicationLifecycleTransmissionPolicy.cs)

This policy uses [IRegisteredObject](https://msdn.microsoft.com/en-us/library/system.web.hosting.iregisteredobject(v=vs.110).aspx) to get a notification when the application is stopping. When the application is stopping, the sender and buffer capacities are set to 0. Note that the consecutive error count, which affects the exponential backoff logic, is not changed.
Binary file added docs/images/ErrorHandlingPolicy.PNG
24 changes: 24 additions & 0 deletions docs/images/ErrorHandlingPolicy.txt
@@ -0,0 +1,24 @@
Image was created using http://flowchart.js.org/
Metadata:

st=>start: Transmission sending finished
e=>end: Finish
cond1=>condition: Failed
with WebException that has
HttpWebResponse
cond2=>condition: StatusCode
408/500/503
op1=>operation: Report Warning
op2=>operation: Buffer and sender capacity = 0
op3=>operation: Set timer to restore capacity. Interval
either from Retry-After or exponential backoff
op4=>operation: Number of errors +1 (used to
calculate exp. backoff) if it was not
updated in the last 10sec
op5=>operation: Enqueue failed transmission back

st->cond1
cond1(yes)->cond2
cond1(no)->op1->e
cond2(yes, bottom)->op2(right)->op3->op4->op5->e
cond2(no)->op1->e
Binary file added docs/images/PartialSuccessPolicy.PNG
26 changes: 26 additions & 0 deletions docs/images/PartialSuccessPolicy.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
Image was created using http://flowchart.js.org/
Metadata:

st=>start: Transmission sending finished
e=>end: Finish

cond1=>condition: No Exception, No Response
cond2=>condition: StatusCode 206 and
Response has list of errors

op1=>operation: Number of errors = 0 (used to
calculate exp. backoff)
op2=>operation: Buffer and sender capacity = 0
op3=>operation: Set timer to restore capacity. Interval
either from Retry-After or exponential backoff
op4=>operation: Number of errors +1 (used to
calculate exp. backoff) if it was not
updated in the last 10sec
op5=>operation: Enqueue new transmission
created from errors list

st->cond1
cond1(yes)->cond2
cond1(no)->op1->e
cond2(yes, bottom)->op2->op3->op4->op5->e
cond2(no)->op1->e
Binary file added docs/images/ThrottlingPolicy.PNG
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
28 changes: 28 additions & 0 deletions docs/images/ThrottlingPolicy.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
Image was created using http://flowchart.js.org/
Metadata:

st=>start: Transmission sending finished
e=>end: Finish

cond1=>condition: Failed
with WebException that has
HttpWebResponse
cond2=>condition: StatusCode 429
cond3=>condition: StatusCode 439

op1=>operation: Set storage capacity = 0
op2=>operation: Buffer and sender capacity = 0
op3=>operation: Set timer to restore capacity. Interval
either from Retry-After or exponential backoff
op4=>operation: Number of errors +1 (used to
calculate exp. backoff) if it was not
updated in the last 10sec
op5=>operation: Enqueue failed transmission back

st->cond1
cond1(yes)->cond2
cond1(no)->e
cond2(yes, bottom)->op2->op3->op4->op5->e
cond2(no)->cond3
cond3(yes)->op1->op2->op3->op4->op5->e
cond3(no)->e
