draft: LSP21 Metadata Discovery (Zero Data Key) #194

Draft
wants to merge 4 commits into
base: main

Conversation

CJ42
Member

@CJ42 CJ42 commented Mar 17, 2023

What does this PR introduce?

Proposal for a new standard to make ERC725Y metadata publicly discoverable.

Title: LSP21 Metadata Discovery (Zero Data Key)
Authors: @CJ42 @samuel-videau @Hugoo @CallumGrindle

Background

Background of the discussion can be found in the #standards channel of the LUKSO public Discord server.

This includes a discussion raised by Samuel, with feedback from Hugo, on the current state of this issue in the ecosystem.

Link: https://discord.com/channels/359064931246538762/620552532602912769/930749248365015100

@CJ42 CJ42 changed the title from "draft LSP21 Metadata Discovery" to "draft: LSP21 Metadata Discovery (Zero Data Key)" on Mar 17, 2023

Despite the benefits that LSP2 provides, a problem around metadata remains:

> _how does someone that does not know the set of ERC725Y JSON schemas used by a smart contract can read the data from the contract storage in the first place?_
Contributor

Suggested change
> _how does someone that does not know the set of ERC725Y JSON schemas used by a smart contract can read the data from the contract storage in the first place?_
> _how does someone that does not know the set of ERC725Y JSON schemas used by a smart contract read the data from the contract storage in the first place?_


### Existing Solutions

Currently, the JSON schemas can be obtained through various ways, including public/private Github repositories, documentation websites, README, packages or Gist. There is no standard "rules" or recommendations on where and how these schemas should be shared, which leads to a need for a "LSP2 JSON sharing" model, a way to store the link of the JSON Metadat where the schemas can be retrieved from.
Contributor

Suggested change
Currently, the JSON schemas can be obtained through various ways, including public/private Github repositories, documentation websites, README, packages or Gist. There is no standard "rules" or recommendations on where and how these schemas should be shared, which leads to a need for a "LSP2 JSON sharing" model, a way to store the link of the JSON Metadat where the schemas can be retrieved from.
Currently, the JSON schemas can be obtained through various ways, including public/private Github repositories, documentation websites, README, packages or Gist. There is no standard "rules" or recommendations on where and how these schemas should be shared, which leads to a need for a "LSP2 JSON sharing" model, a way to store the link of the JSON Metadata where the schemas can be retrieved from.

Member

@YamenMerhi YamenMerhi left a comment

Nice! Cool standard!

I have a few questions:

  • I am not sure about the use of the Zero Data Key. I agree that it makes the key unique and easy to remember, but I am not sure the "easy to remember" argument is strong enough to reserve this data key for this use: when developers want to query the storage, it won't be hard for them to produce the hash of the LSP21MetadataDiscovery string.

  • Another point is that LSP2 no longer applies to this standard in terms of keyType, as data keys with the Singleton keyType should have their name hashed to produce the bytes32 data key.

  • I suggest adding another option for valueContent: URL. While verifiability is important for some use cases, there could be cases where I want to add a data key to my ERC725Y contract storage and update the metadata entry off-chain. JSONURL and JSON don't allow this: the first contains a reference to the hash of the actual JSON content, and the second is the JSON itself, so neither value content suits updating data keys off-chain.

  • It's worth thinking about whether some use cases need the Zero Data Key more than this standard does.

Let's discuss 🚀

@CJ42
Member Author

CJ42 commented Mar 17, 2023

> I am not sure about the use of Zero Data Key, I agree that it makes it unique and easy to remember but I am not sure if the "easy to remember" argument is strong enough to reserve this data key for this use because when the developers want to query the storage it won't be hard for them to produce the hash of LSP21MetadataDiscovery string.

One of the additional motivations for the zero data key is that, although it is easy to generate a data key by hashing a string, doing so means relying on keccak256 and therefore depending on a library.

With the zero data key, you don't have to battle the pitfalls that libraries' hashing functions might have.

In particular, this one that web3.js had in the past, which people used to be confused about (there are two sha3 functions in web3.js, and one of them does not mimic the behaviour of Solidity's sha3).

image

https://web3js.readthedocs.io/en/v1.8.2/web3-utils.html#sha3

https://ethereum.stackexchange.com/questions/96697/soliditys-keccak256-hash-doesnt-match-web3-keccak-hash
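To illustrate the kind of library pitfall meant here, below is a minimal Python sketch. It shows a related but different trap from the web3.js one: Python's standard `hashlib.sha3_256` implements the finalized NIST SHA-3, not the Keccak-256 variant that Solidity's `keccak256` uses, so the two digests differ even for empty input. The constant names are illustrative only.

```python
import hashlib

# Well-known Keccak-256 digest of the empty byte string
# (the value Solidity's keccak256 returns for empty input).
KECCAK256_EMPTY = "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"

# hashlib.sha3_256 is NIST SHA-3, which pads its input differently
# from the original Keccak used on Ethereum, so the digests differ.
nist_sha3_empty = hashlib.sha3_256(b"").hexdigest()
print(nist_sha3_empty == KECCAK256_EMPTY)

# The zero data key, by contrast, needs no hashing library at all.
ZERO_DATA_KEY = "0x" + "00" * 32
print(ZERO_DATA_KEY)
```

Someone deriving a hashed data key with the wrong function would silently query the wrong storage slot, whereas the zero data key can be written out by hand.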


> Another point is that now the LSP2 doesn't apply to this standard in terms of keyType, as dataKeys with Singleton keyType should have their name hashed to produce the bytes32 data Key.

This should be discussed, along with how it is described in the LSP2 standard. If I remember @frozeman's point, the data key of a Singleton is not necessarily the hash of the key name.

For instance you can have the following:

| data key name | data key |
|---|---|
| MyStampsCollection:StampNb1 | 0x4d795374616d7073436f6c6c656374696f6e3a5374616d704e62310000000000 |
| MyStampsCollection:StampNb2 | 0x4d795374616d7073436f6c6c656374696f6e3a5374616d704e62320000000000 |
| MyStampsCollection:StampNb3 | 0x4d795374616d7073436f6c6c656374696f6e3a5374616d704e62330000000000 |

Where the data key is simply the UTF-8 encoding of the key name, right-padded with zero bytes to 32 bytes.
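A minimal Python sketch of that derivation, assuming the name is UTF-8 encoded and right-padded with zero bytes to 32 bytes (which is what the hex values listed above show). `utf8_data_key` is a hypothetical helper name, not part of LSP2 or any library.

```python
def utf8_data_key(name: str) -> str:
    """Build a bytes32 data key from a key name without hashing (hypothetical helper)."""
    raw = name.encode("utf-8")
    if len(raw) > 32:
        # A bytes32 key cannot hold more than 32 bytes of name.
        raise ValueError("key name does not fit in 32 bytes")
    # Right-pad with zero bytes so the key is exactly 32 bytes long.
    return "0x" + raw.ljust(32, b"\x00").hex()


print(utf8_data_key("MyStampsCollection:StampNb1"))
```

Unlike a keccak256-derived key, this mapping is reversible: stripping the trailing zero bytes and decoding as UTF-8 gives back the key name.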

I think this is, for instance, a valid use case. Using keccak256 to generate the data key should not be enforced if some use cases prefer alternative ways. Especially because, since keccak256 is pseudo-random, you could never map a specific subset of data keys to a key name.

So for instance, you could never use a data key that starts with 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa3cc0c943ed772c6af1254b06ec5f3c if using the keccak256 hash of the key name were enforced.

Nor could we use this other "pseudo example" given by Watchpug:

image

https://github.com/lukso-network/LIPs/blob/main/LSPs/LSP-2-ERC725YJSONSchema.md#data-key-hash

image

So we should clarify the rules for the key name <> key hash/identifier relationship in LSP2 and document them accordingly.


> I suggest having another choice for valueContent which is URL, while verifiability is important for some use cases, there could be cases where I want to add a dataKey to my ERC725Y contract storage and update the Metadata entry off-chain. JSONURL and JSON don't allow this case because, in the first one, there is a reference to the hash of the actual content of the JSON, and the second one is the JSON itself, so updating dataKeys off-chain will not suit these 2 value Contents.

I agree with this. URL is actually more relevant here than JSONURL.


> It's worth thinking if there are some use cases that require the Zero Data Key more than this standard.

This is hard to predict, and probably unpredictable imo. Implementations that want to use the zero data key for other purposes can simply decide not to adopt this standard, use something else, and document it in their project repo.

We could think further about the potential use cases of the zero data key. I proposed it because I think it is a good candidate for the problem @samuel-videau brought up, and it came naturally as a solution (discovering the metadata of an ERC725Y smart contract is something very generic that everyone will do, so they will need something as generic and easy to use as possible).

https://discord.com/channels/359064931246538762/620552532602912769/930749248365015100


The data stored under the **zero data key** can be one of the following two options:
- **on-chain**: a `JSON` file as utf8 encoded string.
Contributor

Really like this idea of on-chain encoded JSON containing the schema. It could be a bit expensive, though since it covers a full contract, it might be worth it in some use cases. Even though I'm not a big fan of that, a lot of people are "on-chain maximalists" ahah, and would probably appreciate this "feature". On Ethereum, I saw a lot of on-chain NFT projects where even the NFT visuals were stored on-chain (either as pixels or even as vectors).

Member Author

@CJ42 CJ42 Mar 18, 2023

At the moment, JSON does not exist as a valueContent in LSP2.

@frozeman

I think we should add JSON as a valueContent in the LSP2 standard because of this proposal + @samuel-videau's points.

With the requirement that a JSON valueContent is stringified JSON data.

Member

About adding new value content: since serialized JSON is just a JSON string represented as UTF-8 bytes I think we can add both JSON and String. The former will just hint that the content can be decoded as a JSON object.

@samuel-videau
Contributor

> It's worth thinking if there are some use cases that require the Zero Data Key more than this standard.

> It's a hard to predict this and probably unpredictable imo. Implementations that want to use the zero data key for other purposes can decide to simply not adopt this standard, use something and document it in their project repo.
>
> We could think further about the potential use case of the zero data key. I propose it because I think it is a good candidate for the problem @samuel-videau brought up and it came as a solution naturally (Discovering the metadata of an ERC725Y smart contract is something very generic that every one will do, so they will need something as generic and easy to use as possible).
>
> https://discord.com/channels/359064931246538762/620552532602912769/930749248365015100

For me, the vision is quite clear: the blockchain should be public, autonomous, and decentralized, meaning it should not depend on centralized platforms. However, if the only way to read data from an ERC725Y contract is to search for the schema on a centralized platform like Github (where the standards are also described), we lose some of that decentralization. Apart from this vision, standardization is crucial because people will create custom contracts and use custom keys on top of ERC725Y. If there is no standardized way to get a schema, imagine the nightmare each time you try to fetch information from a contract.

Let me provide you with a simple example use case: LUKSO blockchain explorer. This is necessary, and it is what we are working on at LOOKSO. If you search for a contract on that explorer, it would be great if, for an ERC725Y contract, the explorer could fetch all the information stored on the contract, even if it is a custom contract using custom keys.


## Motivation
<!--The motivation is critical for LIPs that want to change the Lukso protocol. It should clearly explain why the existing protocol specification is inadequate to address the problem that the LIP solves. LIP submissions without sufficient motivation may be rejected outright.-->
The LSP2 standard provides a schema that enables to read and interpret the metadata of an ERC725Y smart contract in a human friendly. This is also useful for tools to automate encoding and decoding of standard entries in the storage of a ERC725Y smart contract.
Member

In the first sentence, "in a human friendly." sounds strange.
Maybe it should be "as human friendly" or even:

The LSP2 standard provides a JSON schema that makes the metadata of an ERC725Y smart contract human-readable. 
  • Changed schema to JSON schema as the first one is too broad of a term.
  • Changed human friendly to human-readable. I think it's more appropriate as friendly sounds like the metadata was previously aggressive 😄

In the second sentence.
(1)
... useful for tools to automate encoding and decoding ...
->
... useful for automation of encoding and decoding ...

(2)

  • What do you mean by standard entries?

(3)
a ERC725Y -> an ERC725Y

Contributor

Actually I would simply get rid of human friendly as it's not ahah, it just allows programs to interpret the data. So I would change to:
The LSP2 standard introduces a schema format that facilitates the definition, reading, and interpretation of data stored on an [ERC725Y](https://github.com/ERC725Alliance/ERC725/blob/develop/implementations/contracts/ERC725YCore.sol) smart contract.


> _how does someone that does not know the set of ERC725Y JSON schemas used by a smart contract can read the data from the contract storage in the first place?_

With no prior knowledge of the schemas, the contract metadata cannot be fetched as the schema helps to construct the `bytes32` data key, so that the contract can be queried to fetch data from it.
Member

(1)
.. the contract metadata .. -> .. the contract's metadata ..

(2)
.. the schema helps to construct .. -> .. the schema is required to construct ..

(3)
.. the bytes32 data key, so that the contract can be queried to fetch data from it.
->
.. the bytes32 data key used to fetch the data.
or
.. the bytes32 data key used to fetch the data from the contract.

The final result looks like this:

With no prior knowledge of the schemas, the contract's metadata cannot be fetched as the schema is required to construct the bytes32 data key used to fetch the data.


### Existing Solutions

Currently, the JSON schemas can be obtained through various ways, including public/private Github repositories, documentation websites, README, packages or Gist. There is no standard "rules" or recommendations on where and how these schemas should be shared, which leads to a need for a "LSP2 JSON sharing" model, a way to store the link of the JSON Metadat where the schemas can be retrieved from.
Member

There is no standard "rules" -> change is to are.

Member

"LSP2 JSON sharing" -> "LSP2 JSON schema sharing"

Because we are sharing schemas not just JSON (though, technically it's just JSON).


Currently, the JSON schemas can be obtained through various ways, including public/private Github repositories, documentation websites, README, packages or Gist. There is no standard "rules" or recommendations on where and how these schemas should be shared, which leads to a need for a "LSP2 JSON sharing" model, a way to store the link of the JSON Metadat where the schemas can be retrieved from.

The only way to be able to read all the data of an ERC725Y contract without prior knowledge of it is to be aware of all the schemas available. In the previous "link sharing model", users and participants are aware of the schemas through third party services, where the schemas are hosted and published. To accomplish this without a trusted party, the schemas must be publicly discoverable, and we need a system for participants to agree on a single method to retrieve these schemas and the metadata.
Member

... and we need a system for participants to agree on a single method to retrieve these schemas and the metadata.
->
... and we need a system for participants to agree on that provides a single method to retrieve schemas and metadata.

Comment on lines +41 to +43
One approach can be to store the external URL inside the smart contract on-chain.

A common solution is to introduce a state variable inside the smart contract that can be publicly queried. However, using this method creates several limitations and inconsistencies:
Member

Since you are talking here about Existing solutions I suggest the following rephrasing:

One of the approaches is to store the external URL inside the smart contract on-chain.
A common implementation is the introduction of a state variable inside the smart contract that can be publicly queried. ...


The data stored under the **zero data key** can be one of the following two options:
- **on-chain**: a `JSON` file as utf8 encoded string.
- **off-chain**: a `JSONURL` linking to
Member

Just a suggestion: ... linking to a JSON file.


_Requirements_

Whether the Schemas are stored on or off-chains, the JSON data MUST adhere to the following requirements:
Member

I guess the capitalized "Schemas" is a typo. I think it should be lowercase.

Comment on lines +97 to +98
<!--The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale may also provide evidence of consensus within the community, and should discuss important objections or concerns raised during discussion.-->
The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale may also provide evidence of consensus within the community, and should discuss important objections or concerns raised during discussion.-->
Member

The comment is duplicated.

Comment on lines +92 to +93
- _What are the additional requirements_
- _Put an example here_
Member

Probably obvious but still I'll write it down:

This method is cheaper but less secure. Recommended for cases when a smart contract changes very frequently.

Your responsibility is:

  • To make sure the link is publicly available;
  • To make sure the link leads to a JSON file, not a directory of JSON files or anything else;
  • To find the best hosting service you can that will store this file;
  • To regularly check if the file is still available. Some automation might help here.

Comment on lines +65 to +73
```json
{
"name": "LSP21MetadataDiscovery",
"key": "0x0000000000000000000000000000000000000000000000000000000000000000",
"keyType": "Singleton",
"valueType": "string",
"valueContent": "<JSON|JSONURL>"
}
```
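As a rough sketch of how a reader might tell the two proposed options apart, the Python snippet below decodes the stored value as a UTF-8 string and checks whether it is inline JSON or a link. Everything here is an assumption for illustration: `classify_zero_key_value` is a hypothetical helper, the accepted URL schemes are arbitrary, and the hash-prefixed encoding that LSP2 defines for `JSONURL` values is not handled.

```python
import json
from urllib.parse import urlparse


def classify_zero_key_value(raw: bytes):
    """Guess whether the stored value is inline JSON or a link (hypothetical helper)."""
    text = raw.decode("utf-8")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Inline schemas: the value itself is a JSON object or array.
    if isinstance(parsed, (dict, list)):
        return ("on-chain", parsed)
    # Otherwise accept only values that look like a URL with an expected scheme.
    if urlparse(text).scheme in ("https", "ipfs"):
        return ("off-chain", text)
    raise ValueError("value is neither JSON nor a recognised URL")


kind, _ = classify_zero_key_value(b'[{"name": "LSP21MetadataDiscovery"}]')
print(kind)
```

A client would first fetch the raw bytes stored under the zero data key, then branch on the result: parse the schemas directly in the on-chain case, or download them from the link in the off-chain case.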
Member

Thinking about the requirements for the case where we store a link to a JSON file gave me an idea: it could actually be useful to have 2 keys:

  • 1 for storing raw JSON;
  • 1 for storing a JSONURL link.

You may change the external file frequently and, once in a while, post updates to the first key, which holds the JSON on-chain.
The standardised JSON file may have the following format that is used by both on and off-chain files:

{
    "date": 1679149644213,
    "schemas": [
        ...
    ]
}

The date key will allow you to guess which one is more relevant and pick schemas from the more recent one.
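A minimal Python sketch of that selection, assuming both documents follow the `date` / `schemas` format shown above and that `date` is a millisecond timestamp. `pick_latest_schemas` is a hypothetical helper and the sample schema entries are made up.

```python
import json


def pick_latest_schemas(on_chain_json: str, off_chain_json: str) -> list:
    """Return the schemas from whichever document has the more recent date."""
    documents = [json.loads(on_chain_json), json.loads(off_chain_json)]
    # On a tie, max() keeps the first document, which is the on-chain copy.
    latest = max(documents, key=lambda doc: doc["date"])
    return latest["schemas"]


on_chain = '{"date": 1679149644213, "schemas": [{"name": "A"}]}'
off_chain = '{"date": 1679149700000, "schemas": [{"name": "A"}, {"name": "B"}]}'
print(pick_latest_schemas(on_chain, off_chain))
```

Preferring the on-chain copy on a tie is a design choice of this sketch, not something the proposal specifies.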

Member

As an alternative we can write a suggestion that says the following:

To ensure the best experience and reduce dependency on 3rd parties, you should replace the JSONURL with raw JSON once you are certain it is no longer going to change, or won't change anytime soon. This way you'll make sure that others will be able to read your contract's metadata.

But I'm not sure yet how that will be done by non-tech people like some future users of UP.


## Simple Summary
<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the LIP.-->
This standard defines the **zero data key** `0x0000000000000000000000000000000000000000000000000000000000000000` as an entry point for an ERC725Y smart to make its metadata publicly discoverable and retrievable.
Contributor

Suggested change
This standard defines the **zero data key** `0x0000000000000000000000000000000000000000000000000000000000000000` as an entry point for an ERC725Y smart to make its metadata publicly discoverable and retrievable.
This standard defines the **zero data key** `0x0000000000000000000000000000000000000000000000000000000000000000` as an entry point for an ERC725Y smart contract to make its metadata publicly discoverable and retrievable.


## Abstract
<!--A short (~200 word) description of the technical issue being addressed.-->
This standard addresses the issue of making the different schemas used by an ERC725Y contract discoverable for users or applications that interact with the contract for the first time. This is useful for applications that have no prior knowledge of the different JSON schemas used for the metadata, and that do not know where this schema can be obtained off-chain.
Contributor

Suggested change
This standard addresses the issue of making the different schemas used by an ERC725Y contract discoverable for users or applications that interact with the contract for the first time. This is useful for applications that have no prior knowledge of the different JSON schemas used for the metadata, and that do not know where this schema can be obtained off-chain.
This standard addresses the issue of making the different schemas used by an ERC725Y contract discoverable for users or applications that interact with the contract for the first time. This is useful for applications that have no prior knowledge of the different JSON schemas used for the metadata, and do not know where these schemas can be obtained off-chain.


### Proposed Solution

For our purpose, we use a single unique and easy to remember `bytes32` data key: the **zero data key**: `0x0000000000000000000000000000000000000000000000000000000000000000`.
Contributor

@samuel-videau samuel-videau Mar 24, 2023

Suggested change
For our purpose, we use a single unique and easy to remember `bytes32` data key: the **zero data key**: `0x0000000000000000000000000000000000000000000000000000000000000000`.
For our purpose, we use a standardized, unique and easy to remember `bytes32` data key: the **zero data key**: `0x0000000000000000000000000000000000000000000000000000000000000000`.

@samuel-videau
Contributor

@CJ42 we might want to add something about the fact that we don't have to set all the schemas if we want semi-private data, such as:

The **zero data key** is used to set only the publicly discoverable data from the ERC725Y contract, allowing for semi-private data to be stored within the contract without being referenced in the zero data key JSON, thereby maintaining a level of privacy.


### When the Schemas are stored on-chain

- _What are the requirements_
Contributor

Suggested change
- _What are the requirements_
- Requirements: The JSON data should be stored as a utf8 encoded string within the smart contract itself, ensuring the data is permanently available on-chain.


### When the Schemas are stored off-chain

- _What are the additional requirements_
Contributor

Suggested change
- _What are the additional requirements_
- Requirements: The smart contract should store a `JSONURL` pointing to the off-chain location where the JSON data can be found. This off-chain location should be accessible and reliable.

@skimaharvey
Member

Cool standard.
I also agree with @YamenMerhi's point: I don't think we should use the Zero Data Key for it.
It might also be expensive to implement if you need to update this key every time you change the layout of your storage. If you decide to update this key "automatically" at the smart contract level, it would noticeably increase the cost of setData, as you would always need to read from storage first and then potentially update it.
Btw, I think this is a feature any indexer could easily implement, but for blockchain maximalists it may make sense to have it.

6 participants