Journal #70
Replies: 7 comments 6 replies
-
@PatrickNicholas This is what we discussed offline earlier. Some details may be missing; I will add more gradually. You are welcome to help with this topic :)
-
I think the consensus algorithm from LogDevice looks great! LogDevice uses a decoupled architecture, so its design fits our journal very well.
-
"Boki: Stateful Serverless Computing with Shared Logs" is another paper about logs. I haven't read it carefully; I'm just putting it here as a reference.
-
https://bookkeeper.apache.org/docs/4.5.1/development/protocol/ Maybe the BookKeeper protocol can also serve as a reference.
-
How about binding one journal unit to one compute unit? In other words, a compute unit can only send updates to its corresponding journal unit. The journal unit is used only for persisting data, like a remote disk. I guess the only reason to use the method you mentioned is to save one network round trip. But if each compute unit is close to its corresponding journal unit, or the compute units are close to each other, the benefit of that optimization is no longer significant.
-
Do you mean the leader compute unit is elected by the journal units? Why don't we have all of the compute units elect the leader compute unit instead?
-
@huachaohuang I agree with @levy5307 that replication differs from leader election in some ways. If there is a metadata service in our system (the Kernel, perhaps), we can adopt quorum-based replication. Apache BookKeeper takes this approach, as @zojw mentioned at #70 (comment). Here are more references:
... where ZK/etcd can act as the metadata service.
You can check out the talk "How We Build Firebolt" to see how Firebolt employs a metadata service in the cloud. We don't have to reimplement everything, since we aim to cooperate with the cloud ;-) (in the first place, at least) cc @w41ter-l
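One way to picture the metadata-service approach: the service hands out a strictly increasing epoch to whichever compute unit becomes the writer, and journal units refuse appends that carry an older epoch. Below is a minimal Python sketch of that idea. `MetadataService` and `FencedJournalUnit` are hypothetical names standing in for ZK/etcd and a journal unit; this is an illustration under those assumptions, not BookKeeper's actual protocol.

```python
class MetadataService:
    """Hypothetical stand-in for ZK/etcd: tracks the current writer epoch."""

    def __init__(self):
        self.epoch = 0
        self.writer = None

    def become_writer(self, compute_unit: str) -> int:
        # Each new writer receives a strictly larger epoch than the last one.
        self.epoch += 1
        self.writer = compute_unit
        return self.epoch


class FencedJournalUnit:
    """Hypothetical journal unit that rejects appends from stale epochs."""

    def __init__(self):
        self.highest_epoch = 0
        self.entries = []

    def append(self, epoch: int, update: bytes) -> bool:
        if epoch < self.highest_epoch:
            return False  # The writer was superseded; refuse the write.
        self.highest_epoch = epoch
        self.entries.append((epoch, update))
        return True
```

With this shape, a compute unit that lost its writer role cannot corrupt the journal even if it has not yet noticed the change, because its appends carry an epoch the journal units have already moved past.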
-
This discussion is about the design and implementation of Journal.
Quorum Journal
We are going to need a strongly-consistent journal to persist recent updates.
One possible implementation is a QuorumJournal. Let's say we have three compute units and three journal units. One of the compute units should be elected as the leader. The algorithm looks like this:
Some more details:
However, many details still need to be figured out. For example, how do we handle membership changes?
We cannot simply apply the Raft consensus algorithm here because we disaggregate compute and journal.
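To make the quorum idea concrete, here is a minimal Python sketch. It is not the algorithm from the original post; it only illustrates the majority-acknowledgement rule under simple assumptions. `JournalUnit` and `quorum_append` are hypothetical names, the journal units are in-memory stand-ins, and leader election and membership changes are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class JournalUnit:
    """Hypothetical in-memory stand-in for one journal unit."""
    available: bool = True
    entries: list = field(default_factory=list)

    def append(self, index: int, update: bytes) -> bool:
        # An unavailable unit never acknowledges.
        if not self.available:
            return False
        self.entries.append((index, update))
        return True


def quorum_append(units, index, update) -> bool:
    """Send the update to every unit; report success if a strict majority acknowledged."""
    acks = sum(1 for unit in units if unit.append(index, update))
    return acks >= len(units) // 2 + 1
```

With three journal units, the leader compute unit can tolerate one unavailable unit: two acknowledgements still form a majority, while a single acknowledgement does not.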