Snapshot save error #228

Open
uber42 opened this issue Feb 7, 2022 · 7 comments

@uber42
Contributor

uber42 commented Feb 7, 2022

panic: /home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3 doesn't exist when creating /home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3/snapshot-00000000000003E9-3.generating

goroutine 350 [running]:
github.com/lni/dragonboat/v3/internal/fileutil.Mkdir({0xc004358280, 0x97}, {0x1639918, 0x1d79520})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/fileutil/utils.go:122 +0x2dc
github.com/lni/dragonboat/v3/internal/server.(*SSEnv).createDir(0xc01f9486f0, {0xc004358280, 0x97})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/server/snapshotenv.go:251 +0x86
github.com/lni/dragonboat/v3/internal/server.(*SSEnv).CreateTempDir(0xc01f9486f0)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/server/snapshotenv.go:200 +0x45
github.com/lni/dragonboat/v3.(*snapshotter).Save(_, {_, _}, {0x3, 0x3e9, 0x169, 0x3e9, {0x0, 0x0, {0x0, ...}, ...}, ...})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/snapshotter.go:104 +0x125
github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).doSave(_, {0x3, 0x3e9, 0x169, 0x3e9, {0x0, 0x0, {0x0, 0x0}, 0x0, ...}, ...})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/statemachine.go:802 +0x193
github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).concurrentSave(_, {_, _, {_, _}, _, _})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/statemachine.go:758 +0x358
github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).Save(_, {_, _, {_, _}, _, _})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/statemachine.go:509 +0x2a5
github.com/lni/dragonboat/v3.(*node).doSave(0xc000420800, {0x0, 0x0, {0x0, 0x0}, 0x0, 0x0})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/node.go:705 +0x2d6
github.com/lni/dragonboat/v3.(*node).save(0xc000420800, {0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, ...})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/node.go:684 +0x7b
github.com/lni/dragonboat/v3.(*ssWorker).save(0xc0003a9f60, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, ...}, ...})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:296 +0x78
github.com/lni/dragonboat/v3.(*ssWorker).handle(0xc0003a9f60, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, ...}, ...})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:279 +0xba
github.com/lni/dragonboat/v3.(*ssWorker).workerMain(0xc0003a9f60)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:265 +0x1bb
github.com/lni/dragonboat/v3.newSSWorker.func1()
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:251 +0x25
github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1()
        /home/user/go/pkg/mod/github.com/lni/goutils@v1.3.0/syncutil/stopper.go:79 +0x173
created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
        /home/user/go/pkg/mod/github.com/lni/goutils@v1.3.0/syncutil/stopper.go:74 +0x133

Dragonboat version

v3.3.1

Steps to reproduce the behavior

Couldn't reproduce again

@lni
Owner

lni commented Feb 11, 2022

Hi @uber42, thanks for reporting the above issue.

Could you please confirm which filesystem was used? Is it a local file system or a networked file system like NFS?

@uber42
Contributor Author

uber42 commented Feb 12, 2022

hi, I use ext4

@lni
Owner

lni commented Feb 13, 2022

@uber42 thanks for the info.

As you can see from the error log -

/home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3 doesn't exist when creating /home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3/snapshot-00000000000003E9-3.generating

the dir "/home/user/repos/storage-69397425/3/raft/dev/00000000000000000001/snapshot-part-1/snapshot-1-3" is missing when a new snapshot is about to be created inside it.

this dir is created when the node is started in NodeHost.startCluster(). I don't think there is any code that would delete the dir.

any chance that it might be deleted by some of your code?

@uber42
Contributor Author

uber42 commented Feb 13, 2022

The root raft directory cannot be deleted by our code.
This result was obtained while testing our project with various fault injections, including network partitions between nodes.
Perhaps a leader change could trigger this behavior.
Unfortunately, the logs were lost :(

@lni
Owner

lni commented Feb 14, 2022

@uber42 thanks for the info.

I have the feeling that this issue is highly unlikely to be caused by Dragonboat's code. If you check the source code, the node's snapshot dir is never deleted; Dragonboat only deletes what's in the directory. Large-scale fault injection tests have been a part of Dragonboat's development process for years, and it was fine in all those tests.

Could you please try to re-run your tests and provide the full log when you can reproduce the issue? Really want to help you to get to the bottom of this. Thanks.

@uber42
Contributor Author

uber42 commented Feb 14, 2022

I will try to reproduce it, but so far this is an isolated case across a very large number of test runs.

@lni
Owner

lni commented Apr 5, 2022

@uber42 did you manage to get this reproduced?
