Refactor leader.c to fix stack growth in handle_exec_sql #697
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is another attempt to fix the stack growth issue with handle_exec_sql (related to #679), this time in a more principled way that involves a significant refactor of leader.c. Details follow.
Recall that the issue arises with a call stack like this (callees at the top)
that is, indirect recursion that can generate a number of stack frames proportional to the number of
;
-separated statements in anEXEC_SQL
request. This happens because leader__barrier and leader__exec can invoke their callbacks synchronously when suspending is not required (respectively, because the FSM is up to date with the raft log and and becausesqlite3_step
generated no changed pages).This PR rewrites those two functions, establishing a new contract: the callback is only invoked if we yielded to the event loop at least once while processing the request; if the request was processed synchronously, a magic value
LEADER_NOT_ASYNC
is returned to indicate that the caller should invoke the callback. Existing callsites are updated to reflect this new contract.With these new implementations available, we can fix the original problem by rewriting
handle_exec_sql_next
to iteratively process as many statements as possible until one of them suspends or returns an error. The PR includes a regression test to validate this fix.The new implementations of the leader.c functions are intentionally different in style from the old ones. Effectively, each barrier request and exec request is a coroutine driven by a state machine. With this approach the
LEADER_NOT_ASYNC
feature falls out rather naturally, and we get the added benefits of more compact code and built-in observability.