Test flake in follower package #4264

Closed
tock-ibm opened this issue Jun 6, 2023 · 2 comments

@tock-ibm
Contributor

tock-ibm commented Jun 6, 2023

Description

2023-06-06 06:07:57.555 UTC 0001 INFO [common.tools.configtxgen.localconfig] completeInitialization -> orderer type: etcdraft
2023-06-06 06:07:57.559 UTC 0002 INFO [common.tools.configtxgen.localconfig] Load -> Loaded configuration: /home/runner/work/fabric/fabric/sampleconfig/configtx.yaml
2023-06-06 06:07:57.659 UTC 0003 INFO [common.tools.configtxgen.localconfig] completeInitialization -> orderer type: etcdraft
2023-06-06 06:07:57.659 UTC 0004 INFO [common.tools.configtxgen.localconfig] Load -> Loaded configuration: /home/runner/work/fabric/fabric/sampleconfig/configtx.yaml
2023-06-06 06:07:57.807 UTC 0005 INFO [common.tools.configtxgen.localconfig] completeInitialization -> orderer type: etcdraft
2023-06-06 06:07:57.808 UTC 0006 INFO [common.tools.configtxgen.localconfig] Load -> Loaded configuration: /home/runner/work/fabric/fabric/sampleconfig/configtx.yaml
2023-06-06 06:07:57.949 UTC 0007 INFO [common.tools.configtxgen.localconfig] completeInitialization -> orderer type: etcdraft
2023-06-06 06:07:57.950 UTC 0008 INFO [common.tools.configtxgen.localconfig] Load -> Loaded configuration: /home/runner/work/fabric/fabric/sampleconfig/configtx.yaml
2023-06-06 06:07:57.987 UTC 0009 INFO [follower.test] NewChain -> Created with join-block number: 10, ledger height: 0 channel=my-channel
2023-06-06 06:07:57.987 UTC 000a INFO [follower.test] NewChain -> Created with join-block number: 10, ledger height: 0 channel=my-channel
2023-06-06 06:07:57.987 UTC 000b INFO [follower.test] Start -> Started channel=my-channel
2023-06-06 06:07:57.987 UTC 000c INFO [follower.test] halt -> Stopped channel=my-channel
2023-06-06 06:07:57.988 UTC 000d WARN [follower.test] run -> Pull failed, error: failed to pull up to join block: chain stopped channel=my-channel
2023-06-06 06:07:57.989 UTC 000e INFO [follower.test] NewChain -> Created with a nil join-block, ledger height: 5 channel=my-channel
2023-06-06 06:07:57.997 UTC 000f INFO [follower.test] NewChain -> Created with join-block number: 10, ledger height: 0 channel=my-channel
2023-06-06 06:07:57.997 UTC 0010 INFO [follower.test] Start -> Started channel=my-channel
2023-06-06 06:07:57.998 UTC 0011 INFO [follower.test] pullUpToJoin -> Pulled blocks from 0 until 10 channel=my-channel
2023-06-06 06:07:57.998 UTC 0012 INFO [follower.test] pull -> Onboarding finished successfully, pulled blocks up to join-block channel=my-channel
2023-06-06 06:07:57.998 UTC 0013 INFO [follower.test] pullAfterJoin -> Pulling after join channel=my-channel
2023-06-06 06:07:57.998 UTC 0014 INFO [follower.test] pullAfterJoin -> Pulled after join channel=my-channel
2023-06-06 06:07:57.998 UTC 0015 INFO [follower.test] pull -> Block pulling finished successfully, going to switch from follower to a consensus.Chain channel=my-channel
2023-06-06 06:07:57.998 UTC 0016 INFO [follower.test] halt -> Stopped channel=my-channel
2023-06-06 06:07:58.004 UTC 0017 INFO [follower.test] NewChain -> Created with join-block number: 10, ledger height: 5 channel=my-channel
2023-06-06 06:07:58.004 UTC 0018 INFO [follower.test] Start -> Started channel=my-channel
2023-06-06 06:07:58.005 UTC 0019 INFO [follower.test] pullUpToJoin -> Pulled blocks from 5 until 10 channel=my-channel
2023-06-06 06:07:58.005 UTC 001a INFO [follower.test] pull -> Onboarding finished successfully, pulled blocks up to join-block channel=my-channel
2023-06-06 06:07:58.005 UTC 001b PANI [follower.test] pull -> Join block (10) we pulled mismatches block we joined with channel=my-channel
panic: Join block (10) we pulled mismatches block we joined with

goroutine 67 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0004bed80, {0x0, 0x0, 0x0})
	/home/runner/work/fabric/fabric/vendor/go.uber.org/zap/zapcore/entry.go:232 +0x614
go.uber.org/zap.(*SugaredLogger).log(0xc0002c4018, 0x4, {0x11d2c6c, 0x39}, {0xc0004ede90, 0x1, 0x1}, {0x0, 0x0, 0x0})
	/home/runner/work/fabric/fabric/vendor/go.uber.org/zap/sugar.go:227 +0x13b
go.uber.org/zap.(*SugaredLogger).Panicf(...)
	/home/runner/work/fabric/fabric/vendor/go.uber.org/zap/sugar.go:159
github.com/hyperledger/fabric/common/flogging.(*FabricLogger).Panicf(...)
	/home/runner/work/fabric/fabric/common/flogging/zap.go:74
github.com/hyperledger/fabric/orderer/common/follower.(*Chain).pull(0xc000468120)
	/home/runner/work/fabric/fabric/orderer/common/follower/follower_chain.go:351 +0x478
github.com/hyperledger/fabric/orderer/common/follower.(*Chain).run(0xc000468120)
	/home/runner/work/fabric/fabric/orderer/common/follower/follower_chain.go:299 +0x156
created by github.com/hyperledger/fabric/orderer/common/follower.(*Chain).Start
	/home/runner/work/fabric/fabric/orderer/common/follower/follower_chain.go:221 +0x1de
FAIL	github.com/hyperledger/fabric/orderer/common/follower	1.559s

Steps to reproduce

The newly added check that the pulled block is identical to the join block causes test failures. The panic is raised here:

github.com/hyperledger/fabric/orderer/common/follower.(*Chain).pull(0xc000468120)
	/home/runner/work/fabric/fabric/orderer/common/follower/follower_chain.go:351 +0x478

This could be fixed by changing the tests.

However, there is a more serious problem: the orderer should not panic because of this; it should just fail to join or sync with the channel. In BFT, the right place for this check is the block puller, where we can retry fetching the block. Moreover, if we get a properly signed BFT block from the orderers, and it is different from the join-block delivered by the admin, maybe the admin got it wrong? Maybe he got it from one of the malicious orderers?

I am going to remove this check and revisit it once we integrate a BFT block puller into the orderer in #4240 and can implement a better policy. I added a comment there to remind us of this issue.

@yacovm
Contributor

yacovm commented Jun 6, 2023

> Moreover, if we get a properly signed BFT block from the orderers, and it is different from the join-block delivered by the admin, maybe the admin got it wrong? Maybe he got it from one of the malicious orderers?

If the join block is correct, then the block the orderer got is not properly signed; the orderer was duped into believing it was, because it replicated the wrong genesis block (remember, we cannot verify the genesis block).

If the join block is incorrect, then the admin didn't do their job properly, as they were supposed to retrieve the correct block.

I think it's more dangerous to have an orderer running with the wrong chain than to have it crash and not serve a fake chain.

@tock-ibm
Contributor Author

Done
