Coalesce poll events on queue eviction #33

Merged: 7 commits merged into main, Dec 15, 2024
Conversation

@vortigont (Collaborator) commented Dec 13, 2024

Coalesce poll events on queue eviction

Refer: mathieucarbou/ESPAsyncWebServer#165

Try to coalesce two (or more) consecutive poll events into one.
This usually happens with poorly implemented user callbacks that run too long and make poll events stack up in the queue: if consecutive user callbacks for the same connection run longer than the poll interval, they fill the queue with events until it deadlocks.
This is a workaround to mitigate such poor designs and keep other events/connections from being starved of task time.
It won't be effective if the user runs multiple simultaneous long-running callbacks, due to message interleaving.
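
For illustration, a minimal sketch of the coalescing idea, using hypothetical names and a per-connection "poll pending" flag rather than the actual AsyncTCP internals: a second LWIP_TCP_POLL for the same connection is dropped while one is already sitting in the queue.

```cpp
// Illustrative sketch only (hypothetical types and names, not the actual AsyncTCP code):
// coalesce poll events with a per-connection "poll pending" flag.

#include <freertos/FreeRTOS.h>
#include <freertos/queue.h>
#include <atomic>

enum lwip_event_t { LWIP_TCP_POLL /*, ...other lwIP callbacks */ };

struct conn_slot_t {
  std::atomic<bool> poll_pending{false};   // true while a poll event is queued
};

struct lwip_event_packet_t {
  lwip_event_t event;
  conn_slot_t* conn;
};

static QueueHandle_t s_async_queue;        // queue of lwip_event_packet_t*

// Called from the lwIP poll callback.
static void on_tcp_poll(conn_slot_t* conn) {
  // Coalesce: if a poll for this connection is already queued, don't add another.
  bool expected = false;
  if (!conn->poll_pending.compare_exchange_strong(expected, true)) {
    return;
  }
  auto* pkt = new lwip_event_packet_t{LWIP_TCP_POLL, conn};
  // Never block on poll events: they are periodic, a dropped one simply recurs.
  if (xQueueSend(s_async_queue, &pkt, 0) != pdPASS) {
    conn->poll_pending = false;
    delete pkt;
  }
}

// In the service task, clear the flag once the poll event is taken off the queue.
static void service_event(lwip_event_packet_t* pkt) {
  if (pkt->event == LWIP_TCP_POLL) {
    pkt->conn->poll_pending = false;
    // ...run the user poll callback...
  }
  delete pkt;
}
```

The compare-exchange keeps the check race-free between the lwIP callback and the service task, and the zero-tick xQueueSend guarantees the enqueue never blocks.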

queue deadlock and congestion avoidance

  • do not let the _remove_events_with_arg() call block on queue perturbation, otherwise it could deadlock

  • never block when adding an LWIP_TCP_POLL event to the queue; it does not make much sense anyway
    since poll events are repetitive

  • _get_async_event() will discard LWIP_TCP_POLL events when the queue fills up.
    This works in combination with throttling poll events as they are added to the queue.
    Poorly designed apps and multiple parallel connections could flood the queue with interleaved poll events
    that can't be properly throttled or coalesced, so we can also discard them in the eviction task after coalescing.
    It works as follows (see the sketch after this list):

    • queue up to 1/4 full - all events enter the queue and are serviced
    • queue from 1/4 up to 3/4 full - new poll events are throttled on enqueue with linearly increasing probability
    • queue from 3/4 up to full - all new poll events are dropped on enqueue, and poll events
      already in the queue are discarded on eviction with linearly increasing probability, giving priority to all other
      events to be serviced. It is expected that connection polls re-enter the queue on the next poll timer tick
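
A sketch of that fill-level policy, with assumed constants (queue length, percentages) and helper names that are not the actual AsyncTCP symbols:

```cpp
// Sketch of the fill-level policy described above (hypothetical constants and helpers;
// the real AsyncTCP code differs in detail). Fill level comes from the FreeRTOS queue
// itself; drop probabilities grow linearly with occupancy.

#include <freertos/FreeRTOS.h>
#include <freertos/queue.h>
#include <esp_random.h>

static constexpr UBaseType_t QUEUE_LEN = 64;          // assumed queue capacity
static QueueHandle_t s_async_queue;                   // queue of event packets

// Random number in [0, 100).
static inline uint32_t rand_pct() { return esp_random() % 100; }

// Decide whether a *new* poll event may enter the queue.
static bool admit_new_poll() {
  UBaseType_t used = uxQueueMessagesWaiting(s_async_queue);
  if (used < QUEUE_LEN / 4) {
    return true;                                      // plenty of room: accept all
  }
  if (used < 3 * QUEUE_LEN / 4) {
    // 1/4 .. 3/4 full: drop probability rises linearly from 0% to 100%.
    uint32_t drop_pct = 100 * (used - QUEUE_LEN / 4) / (QUEUE_LEN / 2);
    return rand_pct() >= drop_pct;
  }
  return false;                                       // 3/4 .. full: reject all new polls
}

// Decide whether a poll event already in the queue should be discarded on eviction
// instead of serviced, giving the remaining slots to other event types.
static bool discard_queued_poll() {
  UBaseType_t used = uxQueueMessagesWaiting(s_async_queue);
  if (used < 3 * QUEUE_LEN / 4) {
    return false;                                     // below 3/4: service it normally
  }
  // 3/4 .. full: discard probability rises linearly from 0% to 100%.
  uint32_t drop_pct = 100 * (used - 3 * QUEUE_LEN / 4) / (QUEUE_LEN / 4);
  return rand_pct() < drop_pct;
}
```

If a queued poll is discarded this way, nothing is lost permanently: the connection's next poll timer tick simply enqueues a fresh poll event.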

TODO: implement some kind of fair dequeuing or (better) simply punish the user for badly designed callbacks by resetting hog connections.
@vortigont vortigont marked this pull request as draft December 13, 2024 08:38
@mathieucarbou (Owner) commented Dec 13, 2024

For discussion: => mathieucarbou/ESPAsyncWebServer#165 (comment)

Repository owner locked and limited conversation to collaborators Dec 13, 2024
@vortigont vortigont marked this pull request as ready for review December 14, 2024 15:24
Review thread on src/AsyncTCP.cpp (outdated, resolved)
Repository owner unlocked this conversation Dec 14, 2024
@vortigont vortigont marked this pull request as draft December 14, 2024 16:34
@vortigont (Collaborator, Author) commented

I reworked it.
Now the WDT can simply be enabled or disabled regardless of core affinity. If enabled, it constantly guards the asynctcp task with a periodic feed on each event.
Actually, with the much improved queue control I'm no longer sure the WDT is of much use by default.

Also added more safeguards on queue operations. This should make a big difference in robustness and congestion handling.
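
For reference, a minimal sketch of what feeding the watchdog from the service task could look like, assuming the standard ESP-IDF task WDT API (esp_task_wdt_add / esp_task_wdt_reset); the queue and task names are placeholders, not the actual AsyncTCP symbols:

```cpp
// Minimal sketch (placeholder names): the event service task subscribes itself to
// the task watchdog and feeds it once per loop iteration, regardless of which core
// it is pinned to.

#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <freertos/queue.h>
#include <esp_task_wdt.h>

extern QueueHandle_t s_async_queue;     // queue of event packet pointers

static void async_service_task(void*) {
  // Subscribe this task to the task watchdog (works on whichever core it runs on).
  esp_task_wdt_add(NULL);
  for (;;) {
    void* pkt = nullptr;
    if (xQueueReceive(s_async_queue, &pkt, pdMS_TO_TICKS(1000)) == pdPASS) {
      // ...dispatch the event / run the user callback here...
    }
    // Feed the WDT once per iteration; if a user callback blocks longer than the
    // WDT timeout, the watchdog fires and reports the hung task.
    esp_task_wdt_reset();
  }
  // Not reached, but if the task ever exited, it should unsubscribe first.
  esp_task_wdt_delete(NULL);
}
```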

@vortigont vortigont marked this pull request as ready for review December 15, 2024 05:46
@mathieucarbou (Owner) commented

> I reworked it. Now the WDT can simply be enabled or disabled regardless of core affinity. If enabled, it constantly guards the asynctcp task with a periodic feed on each event. Actually, with the much improved queue control I'm no longer sure the WDT is of much use by default.
>
> Also added more safeguards on queue operations. This should make a big difference in robustness and congestion handling.

I think the WDT is mostly useful for the user ondata callback, to make sure the user code runs fast and does not do delays or long loops. I think it has value and should be activated by default.

@mathieucarbou mathieucarbou marked this pull request as draft December 15, 2024 16:45
@mathieucarbou (Owner) commented

moving to draft until testing is done (see mathieucarbou/ESPAsyncWebServer#171)

@mathieucarbou mathieucarbou merged commit 8e5c375 into main Dec 15, 2024
32 checks passed
@mathieucarbou mathieucarbou deleted the coalescedq branch December 15, 2024 17:15