Risk of Validator double signing due to inconsistent / wrong behavior of istanbul.start() and istanbul.stop()
#2056
Unanswered
erNail asked this question in Help (Q&A)
Hi @erNail! |
Problem/Motivation
When running a Celo validator, you can use the commands istanbul.start() and istanbul.stop() to make the validator start or stop validating. This is useful if you want to run redundant validators: if one validator is not running as expected, you stop it from validating and make your backup validator start validating.
We might have found an issue with this approach. istanbul.start() and istanbul.stop() don't always work as expected. In our case, istanbul.stop() did not stop the validator from validating. This can cause double signing issues when you execute istanbul.start() on another validator.
Expected Behavior
- After executing istanbul.stop(), a validator should stop validating. istanbul.replicaState.state should show "Replica".
- When istanbul.stop() throws an error, the validator should still be validating. istanbul.replicaState.state should show "Primary".
- If istanbul.replicaState.state showed "Primary" before restarting, it should show "Primary" after restarting.
Current Behavior
The following happened while trying to fix a validator that stopped validating. We are not sure why it stopped and the logs also didn't indicate an issue, so the first thing we did was restart the validator. This did not help, so we tried to activate our backup validator. We were able to see the following behavior:
- When we executed istanbul.stop() on the broken validator, an error was thrown (sadly, we no longer have the exact error message). istanbul.replicaState.state was showing "Primary".
- After restarting the validator, istanbul.replicaState.state was showing "Replica", even though we did not execute istanbul.stop() again.
- After some minutes, we checked istanbul.replicaState.state again. This time, it showed "Primary", even though we did not execute istanbul.start().
TL;DR: Our validator did not become a "Replica" when we executed istanbul.stop(). After a restart, however, it showed that it was a "Replica". After some minutes, it became a "Primary" again.
Steps to reproduce
We were not able to reproduce this behavior. Our guess is that we have an edge case on our hands, caused by the validator no longer working correctly. If you were to somehow reproduce it, it would look something like this:
1. Execute istanbul.stop() on a "Primary" validator that stopped working, receive an error, execute istanbul.replicaState.state and see the state "Primary".
2. Restart the validator, execute istanbul.replicaState.state and see the state "Replica".
3. Wait some minutes, execute istanbul.replicaState.state and see the state "Primary".
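To make the mismatch explicit, the rules from the Expected Behavior list can be written down as a small check and applied to the sequence we observed. This is only a sketch in plain JavaScript: the function names and the action labels ("stop", "start", "restart", "wait") are made up for illustration and are not part of the istanbul API. The states are the values we read from istanbul.replicaState.state.

```javascript
// Hypothetical helper: which state we would expect after an operator action,
// following the Expected Behavior list in this post.
function expectedState(previous, action, threw) {
  if (action === "stop" && !threw) return "Replica";
  if (action === "start" && !threw) return "Primary";
  // A call that threw, a restart, or simply waiting should not change the state.
  return previous;
}

// Hypothetical helper: compare each observed state with the expected one.
function findUnexpectedTransitions(initialState, observations) {
  const problems = [];
  let previous = initialState;
  for (const obs of observations) {
    const expected = expectedState(previous, obs.action, Boolean(obs.threw));
    if (obs.state !== expected) {
      problems.push({ action: obs.action, expected, observed: obs.state });
    }
    previous = obs.state; // continue from what the node actually reported
  }
  return problems;
}

// The sequence described in the steps above.
const reported = [
  { action: "stop", threw: true, state: "Primary" },
  { action: "restart", state: "Replica" },
  { action: "wait", state: "Primary" },
];

console.log(findUnexpectedTransitions("Primary", reported));
```

For the reported sequence, the failed istanbul.stop() itself is consistent with the expected behavior. The two entries that get flagged are the restart (state changed to "Replica") and the wait (state changed back to "Primary"), neither of which followed a command from us.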
Additional information
We acknowledge that it might be impossible to debug/analyze this issue, since it can't be easily reproduced and more detailed information about the errors and logs is missing. We decided to create this issue anyway to make validators and developers aware that there might be a bug that can increase the risk of double signing. We will add additional information if we see this issue again.
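Until the cause is understood, one cautious way to do the switch is to start the backup only after the old validator actually reports "Replica". The sketch below assumes hypothetical node handles with stop(), start() and getState() methods. How they are wired to istanbul.stop(), istanbul.start() and istanbul.replicaState.state on each node is left open, and makeFakeNode only exists so that the example runs on its own.

```javascript
// Hypothetical stand-in for a validator node, for illustration only.
function makeFakeNode(name, state, stopThrows) {
  return {
    name,
    getState() { return state; },
    stop() {
      if (stopThrows) throw new Error("stop failed");
      state = "Replica";
    },
    start() { state = "Primary"; },
  };
}

// Start the backup only if the old primary reports "Replica" after stop().
function guardedFailover(primary, backup) {
  let stopError = null;
  try {
    primary.stop();
  } catch (err) {
    stopError = err.message; // keep the message this time
  }
  const primaryState = primary.getState();
  if (primaryState !== "Replica") {
    // Starting the backup now would mean two validators in "Primary".
    return { switched: false, primaryState, stopError };
  }
  backup.start();
  return { switched: true, primaryState, stopError };
}

// Mirrors our case: stop() throws and the node keeps reporting "Primary".
const broken = makeFakeNode("validator-a", "Primary", true);
const standby = makeFakeNode("validator-b", "Replica", false);
console.log(guardedFailover(broken, standby));
```

Note that a one-time check like this would not have caught the later change back to "Primary" that we saw after the restart, so the old validator's state would still need to be watched after the switch.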