Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Remove the child trie feature #12461

Open
tomaka opened this issue Oct 10, 2022 · 8 comments
Open

Remove the child trie feature #12461

tomaka opened this issue Oct 10, 2022 · 8 comments

Comments

@tomaka
Copy link
Contributor

tomaka commented Oct 10, 2022

I would suggest to remove child tries before they make it in production.

Conceptually this feature doesn't bring any advantage over a single trie.
It is overly complicated, with many different host functions, corner cases (w3f/polkadot-spec#540), and a weird concept of a "default" child trie.
It also pollutes many different other aspects of the client (e.g. the JSON-RPC API needs to know about them, the chain specs format needs to know about them).

It seems that its only advantage is that it is easier to do it this way than implement retrieving the hash of trie node. However the trade-off in terms of complexity is IMO hilariously in favor of removing child trie.

@cheme
Copy link
Contributor

cheme commented Oct 10, 2022

Child tries are already in production.

Removing them is still somehow partially doable: migrate some data, personally I think it makes sense to have them iff we want to have different kind of child trie in the future. But the host function are already quite called and would need to remain.

Conceptually over a single unbalance trie it forces what would otherwise be a branch to be at a given location (but this could be enforced by setting null value where we want to ensure there is a branch to be able to use its root).
I agree that it is somehow a weak advantage (no smaller proof from it, maybe a bit clearer if we want to split proof, but I wrote iff in my previous sentence).

"Default child trie" should be name differently as it seems confusing, this is the default kind of child trie :the child trie where the state is using the same merkle structure as its parent state.

@arkpar
Copy link
Member

arkpar commented Oct 10, 2022

We are going to eventually support amortized O(1) deletion of child tries at the database level.

@tomaka What do you mean by "retrieving the hash of trie node. " and how would that replace the child tree functionality?

@tomaka
Copy link
Contributor Author

tomaka commented Oct 11, 2022

What do you mean by "retrieving the hash of trie node. " and how would that replace the child tree functionality?

Correct me if I'm wrong, but when the runtime wants to write key K of child trie C, it could instead simply write the key concat(C, K).
The only feature that child tries provide and that can't be easily replicated right now is the ability to determine their trie root hash, but this could be replicated by adding a host function that returns the node value of a certain trie node.

As far as I know, child tries are used only for contracts, so that contracts are sandboxed. I don't exactly know how the sandboxing works, but I assume that we restrict the contract so that it can only read/write from/to a specific child trie. However I don't see how that couldn't be easily changed to restrict the contract to a certain trie prefix instead.

@bkchr
Copy link
Member

bkchr commented Oct 11, 2022

Child tries are already used by Polkadot for crowdloans. All contributions per crowdloan are stored in one child trie.

@cheme
Copy link
Contributor

cheme commented Oct 11, 2022

Also used by crowdloan in polkadot and kusama. I remember parachains team asking question about the api, so I would not be surprise if used by some parachain.
I also know that regularly people team are looking at using some specialized child trie for their projects, and I really support the idea of allowing different kind of child trie/state (which current CT implementation show the foundation).
But making it generic is quite challenging (I remember trying it a while back but not delivering).

IMO, out of the fact that CT are already used, CT should not be removed and quite the opposite: it should be improved to allow adding custom or additional states.

One could argue that what is needed is just for substrate to allow one customizable state, and let the users compose their state by themself only if strictly needed, not false, and pretty much well aligned with polkadot verification model. But having this top standard state ensure a common api for pallets (but yes you can always reduce the core code and move a lot for users to implement themselves, could indeed also make sense for child state).

The only feature that child tries provide and that can't be easily replicated right now is the ability to determine their trie root hash, but this could be replicated by adding a host function that returns the node value of a certain trie node.

and forcing a parent node to be at C - 1 nibble so the partial key at the start of node K is in the root of K. Or having a changing key C return with root. (just mentioning, not something that should be taken in consideration, the question is more api related I think)

@tomaka
Copy link
Contributor Author

tomaka commented Oct 11, 2022

it should be improved to allow adding custom or additional states.

Why should it? What's an example use case?

Personally I find it completely insane that we even consider that kind of insanely complex design.
Can we not declare the trie as "done", remove child tries as too weird and complex, and move on to something else? Do we really have to dig into the trie further and further just for the sake of justifying our jobs as engineers?

@cheme
Copy link
Contributor

cheme commented Oct 11, 2022

Personally I find it completely insane that we even consider that kind of insanely complex design.

On paper the design is just writing merkle proof of another state in a parent state and allow composing both state proof.
As people use to say api design is hard, maybe this is the issue. Or it is just the thing in place that are too complex and would
require an overhaul, or removal as you propose.
I really don't see composing two merkle state as 'insanely' complex, it is just put the merkle root of a second state in the parent one.

Can we not declare the trie as "done", remove child tries as too weird and complex, and move on to something else?

Sounds doable.

Do we really have to dig into the trie further and further just for the sake of justifying our jobs as engineers?

Sorry if I frighten you, I was just saying that improving child state could be good, I was not implying a massive task force was in place, for instance I think the last time someone look at this problematic was me and shortly to make a example branch for proposing an exercise for academy. Last change to child trie is really old I think (except the removal api change which is merely applying same constraint you have with remove prefix in trie).

What's an example use case?

Could attach some small in memory trie with different cost/weight model, could add mmr for indexed based api, could add state with zk proof (I know some are looking in this direction).
But as I mention you can always try to entirely switch the substrate state to something else, having child trie only allow to compose state in a easier way (could be fully under user responsability).

@burdges
Copy link

burdges commented Oct 11, 2022

Afaik we should do O(1) deletion below any node, not merely at child tries.

We discussed trie write-locks for runtime multi-threading too, but they too should happen anywhere, not just at child trie boundaries.

I think it makes sense to have them iff we want to have different kind of child trie in the future.

We do want alternative storage proof types, like MMR, KZG, and zk proof friendly Merkle trees, especially for zkUTXOs ala ZCash sapling, ZEXE, etc., and non-membership proof friendly nullifier sets. Ain't clear how these fit in practice though, like do we make them easily accessible from substrate code, or do we make people handle them manually. In particular, we've had trouble using trie proofs into older past state, which maybe fixed now, but likely not.. I think older past state shows up way more alongside alternative proofs, so maybe they wind up manually anyways. ¯_ (ツ)_/¯

Is the mental model that child tries work like mount points? If so, one could even attach them dynamically during initialization or whatever. At least conceptually, each parachain shares a small trie with the relay chain, into which we'll place things like XMP roots, and maybe simpler if you lean into the mount points model. I suppose the mount points model also simplifies one parachain sharing a subtrie with another parachain, just by attaching merkle proofs to a message.

Conceptually over a single unbalance trie it forces what would otherwise be a branch to be at a given location (but this could be enforced by setting null value where we want to ensure there is a branch to be able to use its root).

At least to me, this sounds like "domain separation" in hashing, which simply means "users cannot touch this part of the hash input". As as example, if you've a binary tree, then you could index via a slice of enum { Left, Right, Leaf, DomainSeparator } where user bits encode only to Left or Right, but only the application itself can encode a DomainSeparator (or a `Leaf).

You can always do domain separators manually by altering user inputs, but this brings mental costs elsewhere, like downstream standards efforts, or insecurity from broken domain separation. It's likely child tries wind up more complex than what one wants from domain seperation, but the domain seperation functionality likely matters.

tl;dr It sounds like child tries maybe conflating the role of "mount points" and "domain separators".

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants