-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed an error in use smartbft #5024
Conversation
…gning immediately before deletion Signed-off-by: Fedor Partanskiy <fredprtnsk@gmail.com>
d7c45ac
to
3cc8f0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the very first stage of consensus, the smartBFT library takes the previous commit with signatures and checks for correctness.
If I remember correctly, we only add the previous signatures, when we have leader rotation enabled, and in Fabric we don't have leader rotation enabled. If the leader doesn't send the previous signatures, then the followers won't verify them.
However in the tests, the decisions per leader is 0... it is therefore odd that the previous signatures verification is the cause this behavior.
Can you perhaps add a stack trace dump in the place where we verify the consenter signature?
Just to give some background - we introduced the previous commit signatures in order to know how to prune the blacklist. However, we don't use leader rotation nor the blacklist in Fabric, so before we go ahead and fix something, let's be 100% sure of it's cause.
My suggestion for a fix:
- keep the previous set of nodes in memory
- if we don't find a signature in the current set and the block is equal and/or less than the current block number, check in the previous set
Regarding your fix - I don't think we should leniently just check the previous set if we don't find a signature in the current set. Moreover, in your commit, you check the previous set if you don't find an identity, and not a signature. This function is used also to collect commit messages, so we should not carelessly change it like this.
I think there is a more "holistic" fix:
- When verifying a block, we inspect the Fabric header number and look at the block number. If the last block we have committed (we check the runtime config) is the previous block to the block we're verifying, then we proceed as usual.
- Else, if it is a block we have not committed yet (I know this is not possible, but for completeness, I am writing this case) we return an error.
- Else, it's a block from the past, so we fetch the block with this number from the ledger, and look at its last config block. We then fetch that config block, and proceed to use it, to verify the requested signature.
The above "fix" is very easy to test, as it only changes a single function and preserves the intended verification logic.
easily
|
error itself:
|
Total:
|
I have the block number check. Do you want me to get the config and participant set in the same place? |
I don't think we should be sending the previous commit signatures if we don't need them. Why do that? It's just excess CPU cycles, network bandwidth and disk storage. I think we should not attach them if leader rotation is disabled. |
What if at the time the 5th node is removed, the leader is the 5th node?
and we'll be in the same situation again |
If we're not attaching signatures of the previous block then how can we have a problem in verifying the signatures of the previous block which we do not attach? |
during the change of leadership? |
Or is it only needed when rotation is on? |
exactly, it's only used when periodical leader rotation is active. A regular leader failover has nothing to do with it. |
Okay. right? |
I don't understand how decisions in view is related to what we're talking about. Did you mean DecisionsPerLeader? |
DecisionsPerLeader > 0 - indicator that rotation is enabled DecisionsInView==0 - indicator that the leader has just been changed |
The former is a configuration parameter. So we need to consider the former, not the latter. |
take a look, please. |
fixed an error after deleting a node when it participated in block signing immediately before deletion
Recently the TestAddAndRemoveNodeWithoutStop test has been terminating with an error very often in unit tests.
It went like this:
My suggestion for a fix:
Perhaps there is a better and more beautiful solution.