Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs-storage: Conflicts resolution (Monoid/CRDTs) #19

Open
kirillt opened this issue Mar 16, 2024 · 11 comments
Open

fs-storage: Conflicts resolution (Monoid/CRDTs) #19

kirillt opened this issue Mar 16, 2024 · 11 comments
Assignees

Comments

@kirillt
Copy link
Member

kirillt commented Mar 16, 2024

Port Monoid structure used during conflicts resolution. It's naive approach but it's simpler than CRDT.

@twitu
Copy link
Collaborator

twitu commented Mar 25, 2024

https://github.com/ARK-Builders/arklib-android/blob/main/lib/src/main/java/dev/arkbuilders/arklib/data/storage/Monoid.kt#L3

The kotlin monoid implementation is functionally equivalent to what modern CRDT libraries are offering. While we can go for a custom implementation it is error prone and require extra dev effort for implementing and maintaining. I can look into existing CRDT rust implementations for a feature set that suits us.

@twitu
Copy link
Collaborator

twitu commented Mar 25, 2024

Rust has two popular libraries for CRDTs - crdts and automerge. It appears that automerge is the more popular, well-maintained and well-featured library.

Now for the actual structures I looked at the actual structures that implement monoids in the Kotlin library

Class Structure
Tags Set
Properties Set
Score Int

So it appears that for the current structures we don't need to pull in a CRDT implementation. In fact HashSet already implement a union function, in fact Kotlin too uses the standard library implementation for union. So only a special function for integers is needed. CRDTs can be considered later when we need to add structures that need more powerful combine semantics.

@kirillt
Copy link
Member Author

kirillt commented Apr 1, 2024

@twitu yes, if we can use a better structure we could throw Monoids away. You are right that HashSet is a monoid itself, I need to refresh my memory about how exactly Monoids been used in Kotlin implementation.

But I remember that there was a problem with Monoid-like structures that we cannot remove data:

  1. User has phone P and laptop L connected via Syncthing, so storages are synced as plain files.
  2. Both P and L run the same app, listening the storages on their disks.
  3. User adds entry X into the app on P, it is persisted, synced to L's disk and loaded into L's instance of the app.
  4. Now, the user deletes X from the app on P, we remove it from P's disk, app on L detects the change but overwrites the storage with older version because union(S+X, S)=S+X for any storage S and entry X.

Can we solve this problem using CRDTs?

@kirillt kirillt moved this from In Progress to Todo in Development Apr 3, 2024
@kirillt kirillt changed the title Port Monoid-based conflicts resolution from Kotlin fs-storage: Port Monoid-based conflicts resolution from Kotlin Apr 3, 2024
@kirillt
Copy link
Member Author

kirillt commented Apr 3, 2024

Btw, am I right that automerge has Rust core and JS wrapper? I just see that there are much more JS code in their repo.

tareknaser pushed a commit to tareknaser/ark-rust that referenced this issue Apr 6, 2024
tareknaser pushed a commit to tareknaser/ark-rust that referenced this issue Apr 6, 2024
tareknaser pushed a commit to tareknaser/ark-rust that referenced this issue Apr 6, 2024
tareknaser pushed a commit to tareknaser/ark-rust that referenced this issue Apr 6, 2024
tareknaser pushed a commit that referenced this issue Apr 8, 2024
tareknaser pushed a commit to tareknaser/ark-rust that referenced this issue Apr 9, 2024
tareknaser pushed a commit to tareknaser/ark-rust that referenced this issue Apr 9, 2024
kirillt pushed a commit that referenced this issue Apr 10, 2024
@twitu
Copy link
Collaborator

twitu commented Apr 10, 2024

Monoids cannot handle the delete case directly. There are ways to use monoids for delete by adding soft delete flags for records. That is a pretty good solution for most use cases. Monoids will be a good solution for the uses cases (HashSet/Int) that we have in the Kotlin app.

Yes automerge has a rust core and JS bindings for WASM. I guess they've strongly positioned for use with client side web apps so it has some JS to make it ergonomic to use on the client side. It's still a pretty well featured rust library we can use if needed.

My concern however is that it uses it's own data structures for Maps and Lists and the likes. So there will be a conversion overhead from the standard library BTreeMap/HashMap/HashSet to automerge's data structures so that different versions can be merged. I would put off pulling in such a complex dependencies until we really need it.

@kirillt kirillt moved this from Todo to In Progress in Development Apr 20, 2024
@kirillt
Copy link
Member Author

kirillt commented Apr 25, 2024

We'll be able to define a single "sync with disk" function, similarly to how it was in BaseStorage.kt, after the conflicts resolution is implemented.

@kirillt
Copy link
Member Author

kirillt commented Apr 28, 2024

Ishan's idea: no need to implement CRDT if we just do "soft-delete" instead of regular deletion. Adding second Monoid instance would help us in syncing deletions between devices.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Development May 25, 2024
@kirillt kirillt moved this from Done to In Progress in Development May 25, 2024
@kirillt
Copy link
Member Author

kirillt commented May 25, 2024

Merged:

We also need:

  • implement a function for updating the storage with the disk (extract new mapping from fs, call merge_from)
  • discuss writing strategies:
    • how often we persist the updated model?
    • do we persist the updated model immediately after the sync?
    • do we allow manual syncing?
    • we might need something like a PersistenceStrategy:
      • struct with boolean fields corresponding to these questions
      • trait with hooks handling sync events, this one could also be used for other scenarios

@kirillt kirillt reopened this May 25, 2024
@kirillt kirillt changed the title fs-storage: Port Monoid-based conflicts resolution from Kotlin fs-storage: Conflicts resolution (Monoid/CRDTs) Jun 29, 2024
@kirillt
Copy link
Member Author

kirillt commented Jun 29, 2024

Basing on this article, it seems that we can use the following data types:

  • PN-counters for scores
  • U-sets for tags
  • U-sets for properties
  • U-sets for the storage tables themselves

It seems that U-set can be used for maps, not only sets, by interpreting a map as a set of entries.

@tareknaser @twitu

@twitu
Copy link
Collaborator

twitu commented Jul 4, 2024

One concern I see with U-set is that it won't preserve the order of insertion property of BTreeMaps. In #63 we implemented something very similar to a last writer wins (LWW) CRDT by merging the BTreeMap values based on the most recent timestamp.

The main question to consider is how we intend to use insertion order?

Perhaps, if we can relax the requirement for maintaining insertion order we can consider using the U-set or maybe the LWW-Element-Set which records the timestamp for each element which can be used as an approximation for insertion order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

2 participants