Text, raw varint, bytecode distinction #76

Closed
ivan386 opened this issue May 24, 2018 · 7 comments · Fixed by #90

ivan386 commented May 24, 2018

miscellaneous,,
raw, raw binary, 0x55

OK. This is a valid raw varint code.

bases encodings,,
identity, raw binary, NUL

Empty string or 0x00 or what?

base1, unary, "1"

Only when I open the raw table.csv do I see the " in there.

multiaddrs
...
udp 0x0111

This is not a valid raw varint code.

Can the same codes appear in different sections?

In multiformats/multihash#55 (comment) I use a raw varint prefix:

0x84... - Varint Prefix for Merkle Hash Tree Root

Is there a conflict with sctp?

sctp, , 0x84

@Stebalien
Member

I ran into that same concern and added the following to the README. Does this help?

A multicodec identifier may either be a varint (in a byte string) or a symbol (in a text string).

(if not, we need to improve the readme)

When the table uses codes in the form 0x..., those codes are expected to appear in a binary (byte string) context as varints. Otherwise, the codes are "symbols".

identity, raw binary, NUL

Empty string or 0x00 or what?

NUL means the NUL character. Text encodings usually encode this as 0x00.

base1, unary, "1"

Only when I open the raw table.csv do I see the " in there.

Hm. Yes, we should consider escaping those quotes. This case is the same as NUL.
It means the symbol "1" (however your chosen text encoding encodes "1").

udp 0x0111

This is not a valid raw varint code.

The table shows them as numbers. When encoded in a byte string, they must be
converted to varints.

Can the same codes appear in different sections?

Nope. That's why we made this table.

In multiformats/multihash#55 (comment) I use a raw varint prefix:

0x84... - Varint Prefix for Merkle Hash Tree Root

Is there a conflict with sctp?

sctp, , 0x84

As 0x84 is just a number, no (I'll comment on your other PR separately).
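
For illustration, here is a minimal Python sketch of the number-to-varint conversion, assuming the unsigned varint used by multiformats (7 payload bits per byte, low-order group first, high bit set on every byte except the last). The function names `encode_uvarint` and `decode_uvarint` are hypothetical and not taken from any multiformats library.

```python
def encode_uvarint(n):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, low bits first)."""
    if n < 0:
        raise ValueError("varints here are unsigned")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_uvarint(data):
    """Return (value, bytes_consumed) for the varint at the start of data."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")


# Table codes quoted in this thread: raw, sctp, udp.
for code in (0x55, 0x84, 0x0111):
    print(hex(code), "->", encode_uvarint(code).hex())
```

Under that assumption, 0x55 stays the single byte 55, while the numbers 0x84 and 0x0111 become the byte strings 84 01 and 91 02. A lone byte 0x84 has its continuation bit set, so `decode_uvarint(b"\x84")` raises instead of returning a value.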

@fluency03
Contributor

@Stebalien

A multicodec identifier may either be a varint (in a byte string) or a symbol (in a text string).

This is indeed very confusing. I really don't understand what you mean by symbol.

If you check my #89, you will understand my problem.

For example,

multihash 0x31 - base1 '1', which is 0x31
multicodec 0x30 - base2 '0', which is 0x30
dns6 0x37 - base8 '7', which is 0x37
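
A small Python sketch of the overlap described above, assuming the base symbols are written out as their ASCII bytes. The two dictionaries contain only the entries quoted in this comment, not the full table, and the helper name `ascii_collisions` is hypothetical.

```python
# Binary codes and base symbols quoted in this comment (excerpt only).
binary_codes = {"multihash": 0x31, "multicodec": 0x30, "dns6": 0x37}
base_symbols = {"base1": "1", "base2": "0", "base8": "7"}


def ascii_collisions(symbols, codes):
    """Map each base name to the binary entry whose code equals the symbol's ASCII value."""
    by_code = {code: name for name, code in codes.items()}
    return {
        base: by_code[ord(symbol)]
        for base, symbol in symbols.items()
        if ord(symbol) in by_code
    }


print(ascii_collisions(base_symbols, binary_codes))
```

With these entries, every base symbol lands on an existing binary code once it is read as an ASCII byte: base1 on multihash, base2 on multicodec, and base8 on dns6.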

@fluency03
Contributor

@Stebalien

If you check the JS implementation here base-table.js#L11:

exports['base1'] = Buffer.from('01', 'hex')
exports['base2'] = Buffer.from('00', 'hex')
exports['base8'] = Buffer.from('07', 'hex')
exports['base10'] = Buffer.from('09', 'hex')

All of them are treated as the bytes 0x01, 0x00 (which conflicts with identity), 0x07, and 0x09.
They are supposed to be symbols (in a text string), as you call them.

@Stebalien
Member

The JS implementation is wrong, thanks for catching that: multiformats/js-multicodec#29

This is indeed very confusing. I really don't understand what you mean by symbol.

For example, binary is composed of two symbols 0 and 1 (or true and false). Bytes are defined to each be a string of 8 binary symbols but are also, themselves, symbols (there are 256 of them).

Every character is also a symbol. On a computer, these symbols may be encoded into bits/bytes but there are often several ways to encode a single symbol into bits/bytes and the symbol exists apart from these encodings (an '1' on paper is a '1', not 0x31).
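
To illustrate the symbol-versus-encoding point, a short Python sketch that encodes the same symbol "1" under three text encodings and compares the results with the raw byte 0x01. The choice of encodings is an assumption made for the example (`cp037` is Python's name for one EBCDIC code page), and the variable names are hypothetical.

```python
symbol = "1"

# One symbol, three different byte encodings.
encoded = {name: symbol.encode(name) for name in ("ascii", "utf-16-le", "cp037")}
for name, raw in encoded.items():
    print(f"{name:10} {raw.hex()}")

# The byte value one is a different thing from the symbol "1".
raw_one = bytes.fromhex("01")
print(raw_one in encoded.values())
```

ASCII gives 31, UTF-16LE gives 31 00, and EBCDIC (cp037) gives f1, while none of them produces the byte 01. That is the difference between the symbol "1" in the table and a value such as `Buffer.from('01', 'hex')`.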

@fluency03
Contributor

fluency03 commented Nov 15, 2018

@Stebalien

In this case, the python implementation is also wrong: https://github.com/multiformats/py-multicodec/blob/master/multicodec/constants.py

    # miscellaneous
    # disabling bin because its prefix collides with base2
    'bin':                  {'prefix': 0x55, },

    # bases encodings
    'base1':                {'prefix': 0x01, },
    # 'base2':                {'prefix': 0x55, },
    'base8':                {'prefix': 0x07, },
    'base10':               {'prefix': 0x09, },

@Stebalien
Member

Gah...

@Stebalien
Member

Let's continue all discussion on your new issue (#89) so we don't split it.

Stebalien added a commit that referenced this issue Nov 15, 2018
Resolution from a discussion with Juan and the discussion on the following
issues:

fixes #89
fixes #76
ghost assigned Stebalien Nov 15, 2018
ghost added the in progress label Nov 15, 2018
vmx closed this as completed in #90 Dec 18, 2018
vmx pushed a commit that referenced this issue Dec 18, 2018
Resolution from a discussion with Juan and the discussion on the following
issues:

fixes #89
fixes #76
ghost removed the in progress label Dec 18, 2018