fix: resolve indexer infinite loop #82
On very slow drives, or when run with limited resources, a node can have a delay between the block being saved and its block_results being saved. If the block exists but the block_results do not, an infinite loop occurs: the indexer repeatedly requests the block and block_results until both exist. Because there is no delay between attempts, this further constrains the node's resources and results in many calls for block_results before they are committed.

This commit updates the condition for waiting to include any error that occurs during indexing. Previously, if the indexer failed to find the block_results, it would bombard the node with requests without backing off; now errors trigger a wait. After waiting for either a new block or the timeout, the block results are more likely to exist.
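A minimal sketch of the loop change, assuming a simplified synchronous indexer. It is written in Python for brevity; `run_indexer`, `ResultsNotFound` and the `SlowResultsNode` stand-in are hypothetical names, not the project's actual code. The only thing it illustrates is that an indexing error now goes through the same wait as the caught-up case instead of retrying immediately:

```python
class ResultsNotFound(LookupError):
    """Raised when a block exists but its block_results are not saved yet."""


def run_indexer(block_results, latest_height, wait, start, stop):
    """Index heights start..stop, waiting when caught up *or* on error.

    `block_results`, `latest_height` and `wait` are hypothetical callables
    standing in for the node RPC and the indexer's wait-for-new-block helper.
    """
    height = start
    indexed = []
    while height <= stop:
        if height > latest_height():
            wait()  # caught up with the chain tip: wait (pre-existing behaviour)
            continue
        try:
            indexed.append(block_results(height))
        except ResultsNotFound:
            # The fix: an error also goes through the wait. Without this
            # branch the loop would re-request the same height immediately,
            # over and over, until the node finished saving the results.
            wait()
            continue
        height += 1
    return indexed


class SlowResultsNode:
    """Hypothetical node whose block_results for one height are only saved
    after the indexer has backed off once (modelling a slow disk)."""

    def __init__(self, tip, lagging_height):
        self.tip = tip
        self.lagging_height = lagging_height
        self.results_saved = False
        self.result_requests = 0
        self.waits = 0

    def latest_height(self):
        return self.tip

    def block_results(self, height):
        self.result_requests += 1
        if height == self.lagging_height and not self.results_saved:
            raise ResultsNotFound(height)
        return f"results@{height}"

    def wait(self):
        # Models a new block arriving, after which the earlier block's
        # results are guaranteed to have been saved.
        self.waits += 1
        self.tip += 1
        self.results_saved = True


node = SlowResultsNode(tip=5, lagging_height=3)
print(run_indexer(node.block_results, node.latest_height, node.wait, start=1, stop=5))
print(node.result_requests, node.waits)  # 6 1
```

With the `except` branch calling `wait()`, the lagging height costs one failed request and one wait. If that branch instead retried without waiting, then against this fake node (which only saves the results once the indexer backs off) the loop would never terminate.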
I discovered this bug while attempting to sync a node that had a very slow drive (unwarmed and restored from a snapshot). After the chain service started, the node output thousands of errors like
I first confirmed my understanding... stopping the node and starting it again, I saw block

When the drive is slow (saving block results has high latency) or the node has limited resources (the app's processing has high latency), there is a window of time in which the block is saved but its results are not. Before this commit, the EVM indexer would inundate the node with failing requests, further limiting the resources available for the app to process the block and for CometBFT to save the results. Now, if an error occurs during indexing, the indexer waits either until a new block is received (at which point the previous block's results are guaranteed to be saved) or until a timeout (1 minute) elapses. The already-existing wait-for-new-blocks loop is reused for the error condition.

I installed this on the box where I was experiencing the problem. Instead of looping forever on the failing queries, it failed once, waited, and then continued successfully from that point forward.
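The wait described above (new block or timeout) can be sketched as a blocking receive with a deadline. This assumes new-block notifications arrive on a `queue.Queue`; `wait_for_new_block_or_timeout` and `NEW_BLOCK_TIMEOUT_S` are hypothetical names, and only the 1 minute default comes from the description:

```python
import queue
import threading

# The PR describes a 1 minute timeout; the constant name is hypothetical.
NEW_BLOCK_TIMEOUT_S = 60.0


def wait_for_new_block_or_timeout(new_blocks, timeout=NEW_BLOCK_TIMEOUT_S):
    """Block until a new-block notification arrives on `new_blocks`
    (a queue.Queue of heights) or until `timeout` seconds elapse.

    Returns the new height, or None on timeout.
    """
    try:
        return new_blocks.get(timeout=timeout)
    except queue.Empty:
        return None


new_blocks = queue.Queue()
# Simulate the node announcing block 8 shortly after the indexer starts waiting.
threading.Timer(0.05, new_blocks.put, args=(8,)).start()
print(wait_for_new_block_or_timeout(new_blocks, timeout=1.0))   # 8
print(wait_for_new_block_or_timeout(new_blocks, timeout=0.05))  # None
```

Returning on either event keeps the retry rate bounded by the block time (or the timeout) rather than by how fast the node can answer failing queries.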
Thank you for this fix! Could we upstream it to the main branch as well?