This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

Proposal: Changes to IPLD interface to support chunked nodes. #41

Closed
mikeal opened this issue Jun 26, 2018 · 15 comments
Labels
api-review Tackle during the API review

Comments

@mikeal
Contributor

mikeal commented Jun 26, 2018

Here's the problem I've got: I need nodes that are larger than 2MB.

This is not a "sharding" problem as it has typically been categorized. I still have a max size of 10MB; I just need to support nodes that are larger than the bitswap max block size. A 5MB CBOR binary is a relatively small object that easily fits into a program's memory.

In order to do this today you basically have to implement sharding semantics at every layer of a graph that might ever have just a few thousand links. That's not very reasonable. I have a data set with millions of keys at the top level, and I'm doing sharding at that layer; but closer to the leaf nodes there are typically fewer than 100 links, with only the occasional node holding a few thousand.

However, once you go down the rabbit hole of "how do I chunk a single dag node into a couple blocks" you reach some limitations in the current IPLD interface.

  • You need a way for the implementation to request additional blocks in order to deserialize.
  • The current interfaces that return a single block for serialization need to be able to return many. The same goes for the cid interface: it needs to be able to return multiple CIDs for all the blocks necessary to construct the node.

For efficiency, there are a few other things we should be doing as well.

  • Interfaces that return many things (like tree) should return iterables instead of arrays. Arrays are still fine, since they are iterable, but the spec should allow any iterable. If the items are promises, they should be resolved.

Once you're doing all this, you realize that you can also offload more of the path resolution to the implementation, since it now has a way to request additional blocks.

Putting all this together, you end up with something very different than the current spec.

I went ahead and documented all of this and wrote an implementation of dag-cbor that will chunk nodes larger than 1MB but still reject nodes larger than 10MB (although that is configurable). Please don't get caught up in the async/await aspects of the proposal and implementation. I'm not trying to have a big callbacks vs promises debate, this was just the most expedient way to write the reference implementation. However, without async iterables some of the interfaces would get pretty hairy to implement with something like streams :(
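As a rough sketch of the size limits described above (the option names `chunkSize` and `maxSize` are hypothetical, not taken from the reference implementation):

```javascript
// Hypothetical size-limit check: chunk nodes above a soft limit, reject
// nodes above a hard limit. Both option names are illustrative only.
const defaults = { chunkSize: 1024 * 1024, maxSize: 10 * 1024 * 1024 }

const needsChunking = (buffer, opts = {}) => {
  const { chunkSize, maxSize } = { ...defaults, ...opts }
  if (buffer.length > maxSize) {
    throw new Error(`node of ${buffer.length} bytes exceeds maxSize ${maxSize}`)
  }
  return buffer.length > chunkSize // true: split into multiple blocks
}

console.log(needsChunking(Buffer.alloc(5 * 1024 * 1024))) // 5MB node: chunk it
console.log(needsChunking(Buffer.alloc(512)))             // small node: single block
```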

I'd like to treat this proposal as a discussion rather than a PR. I'm not strongly tied to any particular names or structure. I wrote a full implementation because my ideas needed to be fleshed out a bit, and it's much easier to show something working than to just describe a bunch of ideas.

Proposal for alternative interface for IPLD.

ipld(getBlock)

Must return the IPLD interface.

getBlock is an async function that accepts a CID and returns a promise for
the binary blob of that CID.
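A minimal sketch of this factory shape (all method bodies here are illustrative stubs, not the real implementation):

```javascript
// Sketch: ipld(getBlock) returns the interface; every method closes over
// getBlock so it can request additional blocks on demand.
const ipld = getBlock => ({
  serialize: async function * (obj) { /* yield Block instances, root last */ },
  deserialize: async buffer => { /* may call getBlock for part blocks */ },
  tree: function * (buffer) { /* yield path strings */ },
  resolve: async (buffer, path) => { /* return { value, remaining } */ },
  cids: function * (buffer) { /* yield CIDs required to deserialize */ }
})

// the caller supplies getBlock, e.g. backed by a block store or the network
const IPLD = ipld(async cid => Buffer.alloc(0))
console.log(typeof IPLD.deserialize, typeof IPLD.resolve)
```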

IPLD.serialize(native object)

Takes a native object that can be serialized.

Returns an iterable. All items in the iterable must be instances of Block or
promises that resolve to instances of Block.

When returning multiple blocks the last block must be the root block.
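For illustration, a serialize that yields part blocks followed by the root could look like this. The `Block` class, the JSON encoding, and the tiny 4-byte chunk size are stand-ins, not the real dag-cbor implementation:

```javascript
// Illustrative chunked serialize: yield raw part blocks first, root last.
const CHUNK_SIZE = 4 // tiny for demonstration; the real soft limit is ~1MB

class Block {
  constructor (data, isRoot = false) {
    this.data = data
    this.isRoot = isRoot
  }
}

function * serialize (obj) {
  const buffer = Buffer.from(JSON.stringify(obj))
  if (buffer.length <= CHUNK_SIZE) {
    yield new Block(buffer, true) // small node: a single root block
    return
  }
  // split the oversized encoding into raw part blocks
  for (let i = 0; i < buffer.length; i += CHUNK_SIZE) {
    yield new Block(buffer.subarray(i, i + CHUNK_SIZE))
  }
  // the root block comes last, describing the parts
  yield new Block(Buffer.from('root'), true)
}

const blocks = [...serialize({ hello: 'world' })]
console.log(blocks.length, blocks[blocks.length - 1].isRoot)
```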

IPLD.deserialize(buffer)

Takes a binary blob to be deserialized.

Returns a promise to a native object.
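This is where the getBlock closure earns its keep: when the root encoding names part CIDs, the implementation can fetch and reassemble them. A sketch, where the in-memory store, the string CIDs, and the `parts` field are all assumptions for illustration:

```javascript
// Sketch: deserialize requests additional blocks through getBlock when the
// root block describes a chunked node.
const store = new Map([
  ['part-1', Buffer.from('{"hello":')],
  ['part-2', Buffer.from('"world"}')]
])
const getBlock = async cid => store.get(cid)

const deserialize = async buffer => {
  const root = JSON.parse(buffer.toString())
  if (!root.parts) return root // ordinary single-block node
  // fetch every part block, concatenate, then decode the full node
  const parts = await Promise.all(root.parts.map(getBlock))
  return JSON.parse(Buffer.concat(parts).toString())
}

deserialize(Buffer.from('{"parts":["part-1","part-2"]}'))
  .then(node => console.log(node.hello))
```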

IPLD.tree(buffer)

Takes a binary blob of a serialized node.

Returns an iterable. All items in the iterable must be either strings or promises that resolve to strings.
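A generator-based tree() might look like the sketch below, assuming a JSON encoding for simplicity (the real codec is dag-cbor). Paths are produced lazily, one at a time, rather than materializing the whole array:

```javascript
// Sketch: tree() as a generator of path strings over a decoded node.
function * tree (buffer, obj = JSON.parse(buffer.toString()), prefix = '') {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}/${key}` : key
    yield path
    if (value && typeof value === 'object') {
      yield * tree(buffer, value, path) // recurse into nested objects
    }
  }
}

const buf = Buffer.from(JSON.stringify({ a: { b: 1 }, c: 2 }))
console.log([...tree(buf)])
```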

IPLD.resolve(buffer, path)

Takes a binary blob of a serialized node and a path to child links.

Returns a promise to an object with two properties: value and remaining.

value must be either a deserialized node or a CID instance.

remaining must be a string of the remaining path.

Throws an Error() when the path cannot be resolved. The Error instance should
have a .code attribute set to 404.
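A sketch of these semantics, assuming a JSON encoding and using a `cid:`-prefixed string as a stand-in for a real CID instance:

```javascript
// Sketch: walk the path inside this node, stopping at a link and handing
// back the unresolved remainder of the path.
const isLink = v => typeof v === 'string' && v.startsWith('cid:') // CID stand-in

const resolve = async (buffer, path) => {
  let value = JSON.parse(buffer.toString())
  const parts = path.split('/').filter(Boolean)
  while (parts.length) {
    if (isLink(value)) break // stop at a link; the caller follows the CID
    const key = parts[0]
    if (value == null || typeof value !== 'object' || !(key in value)) {
      const err = new Error(`Cannot resolve "${key}"`)
      err.code = 404
      throw err
    }
    value = value[key]
    parts.shift()
  }
  return { value, remaining: parts.join('/') }
}

const buf = Buffer.from(JSON.stringify({ a: { b: 42 }, link: 'cid:xyz' }))
resolve(buf, 'a/b').then(r => console.log(r.value, r.remaining === ''))
resolve(buf, 'link/deep/path').then(r => console.log(r.value, r.remaining))
resolve(buf, 'a/nope').catch(err => console.log(err.code))
```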

IPLD.cids(buffer)

Takes a binary blob of a serialized node.

Returns an iterator. All items in the iterator must be instances of CID or promises that resolve to instances of CID.

Returns only the CIDs required to deserialize this node. Must not contain the CIDs of named links.
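The distinction between required parts and named links could be sketched like this, where the `parts` field and string CIDs are hypothetical conventions, not the real wire format:

```javascript
// Sketch: cids() yields only the CIDs needed to reassemble a chunked node.
function * cids (buffer) {
  const node = JSON.parse(buffer.toString())
  if (Array.isArray(node.parts)) {
    for (const cid of node.parts) yield cid // blocks required to deserialize
  }
  // named links (e.g. node.link) are intentionally NOT yielded
}

const root = Buffer.from(JSON.stringify({ parts: ['cid-1', 'cid-2'], link: 'cid-x' }))
console.log([...cids(root)])
```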

@mikeal
Contributor Author

mikeal commented Jun 26, 2018

A few more thoughts on "sharding vs chunking."

There is clearly a case where, when someone has millions of links, or even just thousands of links in a node they routinely need to update, the user will need some form of sharding. That case clearly needs to be handled in a layer above the IPLD dag implementation.

But, keep something in mind: a user has no way of knowing how large a new node they are creating will be until they try to create it.

  • Different serialization formats have different sizes of binary representation.
  • The internal memory a program uses for a native version of the object also varies wildly by language.
  • Theoretically, implementations could convert alternative native objects to these formats as well (imagine a JS dag-cbor implementation that used Map instead of object).

There is a threshold where an implementation will want to say, or potentially ask the user to configure, "I'm not going to load a node this size into memory." But that threshold varies wildly between different peers and in most cases is larger than the current 2MB limit we need to maintain for bitswap.

If we don't address this at this layer, if we lump all of these cases into the same higher level sharding bucket, we're making the lower level primitives unusable for most cases. I'm working with the github archive data, and there's just a few places where the edge nodes need to hold a few thousand links. Because of these few edge cases, among millions of edge nodes, I would have to implement sharding for all edge nodes in order to create predictable paths.

We would end up pushing a lot of applications onto higher order abstractions, like MFS, or potentially just push them into other stacks outside of IPLD altogether.

@vmx
Member

vmx commented Jun 27, 2018

Thanks for the proposal @mikeal.

The 2MB limitation from Bitswap is kind of arbitrary and not really related to IPLD. It's on a different level, the transport. So from the IPLD API perspective there should always be a 1:1 mapping between a CID and a block, no matter how big that block is.

So if you store a Git object that is bigger than 2MB, that needs to be possible without changing its hash. If you chunked it at the IPLD level, the hash would be different.

It might even make sense to chunk the data at the storage level. But before putting thought into that, I would rather spend time solving the transport layer (Bitswap) problem of how to transfer large IPLD objects in a chunked manner.

@mikeal
Contributor Author

mikeal commented Jun 27, 2018

That's fair, although I will note that the larger the block, the less efficiently any transport can retrieve it. Since there is only one hash for the entire block, the smallest unit you can validate is the whole block. If you try to grab different parts of the block from different peers, you can't validate anything until you have the whole block, and if a peer gave you invalid data you don't know which peer did it. That's why bittorrent clients only get one part per peer, and if they decide to get a part from another peer they just request the whole part.

Aside from the "splitting node into parts" use case, is there a performance case for altering the tree() API to be a generator rather than having to load all the keys into memory?

@vmx
Member

vmx commented Jun 27, 2018

Good points about the splitting. Storing objects of any size while keeping their hash needs to be supported, e.g. for Git. But I agree there should be some layer for automatic splitting. And you raise valid points about where this splitting should be: within IPLD or in a layer above. I need to put more thought into this.

Having tree() be a generator is a good idea.

@Stebalien

So, we thought long and hard about adding sharding and chunking to IPLD but decided against it, to try to keep the layer as simple as possible (it's already pretty complex).

However, we would like a tiny layer on-top-of IPLD that implements automatic sharding.

Since there is only one hash for the entire block, the smallest unit you can validate is the whole block,

I've discussed some solutions here: https://discuss.ipfs.io/t/git-on-ipfs-links-and-references/730/4

However, for new systems, my ideal solution would be a second layer that transforms a sharded DAG into an unsharded DAG.

@mikeal
Contributor Author

mikeal commented Jun 27, 2018

I think I'm getting a little confused about what "in IPLD" vs "on-top-of IPLD" means in this context.

To me, "a tiny layer on-top-of IPLD that implements automatic sharding" is basically what https://github.com/mikeal/alt-ipld-interface/blob/master/cbor.js#L187 does. It creates a valid CBOR node with a special property that points to a list of parts for a large CBOR node. The sharding is implemented transparently via the interface, but at a spec/protocol level it is a bunch of valid IPLD nodes (one dag-cbor node and many raw nodes).

You have to understand the convention for describing the larger node in order to know what it truly means, so other implementations of dag-cbor would have to support this "feature."

I interpreted this repo/project as "the way you describe an IPLD codec implementation for other libraries to consume." If the sharding is to be automated, at what layer would that automation occur? If we only implement it in IPFS, the paths, which are defined at the IPLD layer, won't be consistent between IPFS and other implementations of IPLD that use the codec implementations.

@mikeal
Contributor Author

mikeal commented Jun 27, 2018

Also, I'll just add that this is just a POC and there is probably a better way to describe a node that should be constructed out of parts of other CIDs, similar to what we're doing with tagged CIDs when describing links in dag-cbor today.

@Stebalien

I think I'm getting a little confused about what "in IPLD" vs "on-top-of IPLD" means in this context.

"in IPLD" means that it's part of the IPLD spec. "on-top-of IPLD" means not in the IPLD spec. At the moment, the IPLD spec assumes that IPLD nodes are atomic.

Basically, it would be an entirely new system (kind of like unixfs) built on-top of IPLD. This would leave IPLD as a building block for describing arbitrary DAGs and allow users to build more complicated systems on-top of it.


Note: this may change slightly when/if we introduce a type system as we may want to put the type description in a separate object and we may need the type description to fully decode the object. However, the type system is just a vague idea for the moment.

@mikeal
Contributor Author

mikeal commented Jun 27, 2018

@Stebalien got it.

"the IPLD spec assumes that IPLD nodes are atomic"

Where does that spec live and should we start a thread about this topic?

At the moment, we're sort of stuck.

  • It sounds like we want to constrain the interface to the assumptions in the spec (IPLD nodes are atomic).
  • We can't put this feature in the codec implementation because it conforms to the interface.
  • If the feature isn't in the codec implementation there isn't a way for others to easily support it.

I do see a moral hazard in supporting something like this, though. Where does it end? Do we also start supporting a bunch of other random sharding semantics in all the codec implementations? On the other hand, I just don't see a place we can do it yet.

@vmx
Member

vmx commented Jun 28, 2018

@Stebalien wrote:

I've discussed some solutions here: https://discuss.ipfs.io/t/git-on-ipfs-links-and-references/730/4

For me the discussion reads like there isn't really any practical solution, yet.

However, for new systems, my ideal solution would be a second layer that transforms a sharded DAG into an unsharded DAG.

I don't understand what you mean, can you please explain this a bit more? By "new systems", do you mean "something that we can't do, but should have done", or "now is the chance to do it that way"?

@ivan386

ivan386 commented Jun 29, 2018

@mikeal
For blocks larger than 2MB, or to allow validating parts of a block, you need to use a tree hash. To allow bitswap to exchange block parts, a small protocol upgrade is needed.

@Stebalien

@mikeal

Where does that spec live and should we start a thread about this topic?

Fair enough... There is no written down spec. At the moment, it's a bunch of implementations and a bunch of discussion.

Unfortunately, I can't find the relevant discussions.

On the other hand, I just don't see a place we can do it yet.

The idea is that sharding would be up to the application. We'd come up with a standard set of tools/types that applications could use to shard up their large datastructures.

For me the discussion reads like there isn't really any practical solution, yet.

Correct. Just a conclusion that it can be done at a layer above IPLD.

I don't understand what you mean, can you please explain this a bit more? By "new systems", do you mean "something that we can't do, but should have done", or "now is the chance to do it that way"?

By "new systems" I mean "not git". That is, we can't go back and shard git objects but we can come up with a thin sharding layer and build new systems (unixfsv2, etc.) on top of these sharding primitives.

@mikeal
Contributor Author

mikeal commented Jul 23, 2018

By "new systems" I mean "not git". That is, we can't go back and shard git objects but we can come up with a thin sharding layer and build new systems (unixfsv2, etc.) on top of these sharding primitives.

Ah, this makes a lot more sense now. Thanks!

@vmx
Member

vmx commented Nov 21, 2018

@mikeal This issue certainly doesn't belong in the interface-ipld-format spec, but rather in the js-ipld one, though I still think it's a layer on top. What do you think?

@vmx vmx added the api-review Tackle during the API review label Nov 21, 2018
@mikeal
Contributor Author

mikeal commented Nov 21, 2018

Let's close this out. I think that we're all agreed that this interface layer should be atomic at the block level and that what we're discussing here should be a layer above.

@vmx vmx closed this as completed Nov 21, 2018
@ghost ghost removed the backlog label Nov 21, 2018

5 participants