Skip to content

Manifest Sibling Resolution

Scott Lystig Fritchie edited this page Feb 12, 2013 · 9 revisions

Manifest Resolution

I'm proposing a bit of a change in storage (a change from the previous proposal, either way is a change from what we currently do), where we store manifests in a dict instead of a set. The keys in the dict will be the manifest UUIDs.

The workflow is always:

  1. read

  2. resolve

    1. use dict:merge/3 with a fun that merges two manifests with the same UUID together.

      1. The merge function is detailed in the second half of this page, under the "merge algorithm heading"
  3. perform any pruning

    1. pruning is done when there is a UUID that has been marked as deleted and is still around after N seconds, where N is some configurable time. This also means that when manifests are marked as deleted, a delete timestamp is written as well. Note that there are two scenarios that can happen when two actors have the same view of a deleted manifest. Both are OK.

      1. Both actors decide that the manifest is old enough to be pruned. There are no issues here because the manifest will simply be gone.

      2. One actors clock says it time for the manifest to be pruned, and the other says it should still stick around. This is OK because if the manifest "comes back" after being deleted by one actor, we simply delete it again when our clock says it's time for it to be deleted.

  4. spawn any procs based on info in manifests (ie. cleanup after delete procs that crashed)

    1. Sometimes a deletion process will die, and we want to start the delete over again later once we realize this has happened. When we're performing a delete, the first thing we do is write to the manifest taking out a "lease" on deleting the manifest. We check-in occasionaly (every couple seconds) to show that we're still alive and deleting. So when we are examining the dict of manifests after doing a read and resolve, we might find that a deletion process has taken out the lease but hasn't checked in after N seconds. When this happens, we'll launch a deletion process ourselves and take out a new lease (maybe leases are a list?). There is a scenario here when two or more actors can decide to perform a delete. This isn't ideal from a resource point of view, but since deleting blocks is idempotent, it doesn't matter from a correctness point of view.
  5. add new information (that was stored in-memory), this new information also needs to be resolved with the manifest that was retrieved from Riak.

  6. write back to Riak

Merge Algorithm

Legend:

W = Writing A = Active P = Pending-Delete D = Deleted

{W, W} = W

BlocksRemainingToWrite = sets:intersection(Rem1, Rem2).
LastWrittenTime = helper:latest(Time1, Time2).

{W, A} = A

Simply choose the A manifest, as A must causally happen after W.

{W, P} = P

Simply choose the P manifest, as P must causally happen after W.

{W, D} = D

Simply choose the D manifest, as D must causally happen after W.

{A, A} = A

These must be equivalent, because only a single actor can take a given manifest from the W to A state. We should assert this.

Multipart uploads: There is one additional detail here. There is effectively a new manifest state (call it a "substate") during the transition from writing state to the active state. It is possible for an upload ID to have 5 parts (call them 1-5), but the completed object would only use parts 1, 4, and 5. In such a case, we must somehow garbage collect parts 2 and 3.

For merging purposes, if manifest M1 contains the multipart_clean property in the props list, but manifest M2 does not, then M1 is a newer manifest.

See also: Object-Chunking-and-Garbage-Collection, the "Multipart Upload Support" section.

{A, P} = P

Simply choose the P manifest, as P must causally happen after W.

{A, D} = D

Simply choose the D manifest, as D must causally happen after W.

{P, P} = P

This case happens in two scenarios, which we treat the same.

  1. Two actors are deleting the same manifest

  2. The last write from the deleting process didn't update every replica.

BlocksRemainingToDelete = sets:intersection(Rem1, Rem2).
LastDeletedTime = helper:latest(Time1, Time2).

{P, D} = D

Simply choose the D manifest, as D must causally happen after W.

{D, D} = D

Either the manifests are identical, or two actors have successfully deleted the same manifest. In this case, choose the earliest delete completion time.

EarliestDeletedTime = helper:earliest(Time1, Time2).