© 2022 Blockchain Commons
Authors: Wolf McNally, Christopher Allen, Shannon Appelcline
Date: Aug 10, 2022
Updated: Sep 3, 2023
Information systems use many kinds of identifiers for many purposes. The main purpose of an identifier is to uniquely point to an object, or referent, within a given domain. An identifier that is universally unique can be associated with any object or concept in all of existence and be relied on to be unique because it contains sufficient entropy (randomness) to ensure that it will, for every conceivable practical purpose, never collide with another such identifier.
Universally unique identifiers have precedent in (for example) UUIDs, URIs, and cryptographic digests.
UUIDs are 128 bits in length and come in several different versions. Each version specifies several bitfields and their semantics. Version 4 is specified to be random, but is still not completely random because it does not require that cryptographically strong randomness always be used, and it reserves 6 bits (4 for the version and 2 for the variant) to identify it as a version 4 UUID, leaving 122 bits of actual randomness.
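The reserved bits can be inspected directly. The following is a minimal sketch using Python's standard `uuid` module; the helper name `fixed_fields` is hypothetical and exists only for illustration.

```python
import uuid
from typing import Tuple


def fixed_fields(u: uuid.UUID) -> Tuple[int, int]:
    """Return the (version, variant) bit fields of a UUID as integers."""
    value = u.int
    version = (value >> 76) & 0xF  # 4 version bits: bits 79..76 of the 128-bit value
    variant = (value >> 62) & 0x3  # 2 variant bits: bits 63..62 of the 128-bit value
    return version, variant


u = uuid.uuid4()
version, variant = fixed_fields(u)

# Every version 4 UUID carries the same 6 fixed bits, so only the rest can vary.
random_bits = 128 - 4 - 2
```

Because these six bits are identical in every version 4 UUID, an observer can always recognize the identifier's type, which is exactly the kind of inherent type information an ARID avoids.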
URIs are (more or less) human-readable text. URI specifications usually focus on human-understandable semantics and are frequently hierarchical, starting with the `scheme` field, which describes a namespace within which the remainder of the URI is considered to point to a referent.
A cryptographic hash algorithm such as SHA-256 or BLAKE3 maps a block of data of arbitrary size to a fixed-length "digest." This digest reveals nothing about the source image by itself, but can only be computed by applying the same algorithm to the same image. A digest can thereby be considered a "pointer" to a particular binary referent.
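As a brief illustration of a digest acting as a pointer, the sketch below uses SHA-256 from Python's standard `hashlib` module (BLAKE3 is not in the standard library, so it is not shown). The helper name `digest_of` is hypothetical.

```python
import hashlib


def digest_of(image: bytes) -> bytes:
    """Map an input of arbitrary size to a fixed-length 32-byte SHA-256 digest."""
    return hashlib.sha256(image).digest()


same_1 = digest_of(b"example referent")
same_2 = digest_of(b"example referent")
other = digest_of(b"example referent, modified")
large = digest_of(b"x" * 1_000_000)  # the digest length does not depend on input size
```

Identical images always yield identical digests, which is what makes a digest usable as a pointer. It is also why a digest is correlatable with its referent by anyone who holds the same image.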
We propose herein a standard for a cryptographically strong, universally unique identifier known as an Apparently Random Identifier, or ARID.
The goals for this form of identifier are:
- Non-correlatability
- Neutral semantics
- Open generation
- Minimum strength
- Cryptographic suitability
To be an ARID, the sequence of bits that comprise it MUST NOT be correlatable with its referent, nor any other ARID. Therefore, it MUST NOT be a hash or digest of another object.
The sequence of bits in an ARID MUST be statistically indistinguishable from pure entropy. Therefore one method of generating an ARID is to use a cryptographically strong random number generator.
However, the source of entropy for an ARID does not itself have to actually be random; it simply has to be indistinguishable from randomness without additional hidden information. One example would be when a sequence of ARIDs is generated from a ratcheting key generation algorithm. Knowing the current state of the ratchet and the current ARID would give one the ability to ratchet the key to the next state and generate the next ARID in the sequence. A third-party observer would be unable to correlate the next ARID with the previous ARID without access to the secret ratchet state.
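Both generation approaches can be sketched in a few lines of Python. The first function draws from the operating system's cryptographically strong generator via the standard `secrets` module. The second is an illustrative ratchet built from HMAC-SHA256; this document does not specify a ratchet construction, so the labels `b"arid"` and `b"ratchet"` and the function names are assumptions, and the sketch is not a vetted design.

```python
import hashlib
import hmac
import secrets
from typing import List, Tuple

ARID_SIZE = 32  # 256 bits


def random_arid() -> bytes:
    """Generate an ARID from a cryptographically strong random number generator."""
    return secrets.token_bytes(ARID_SIZE)


def ratchet_step(state: bytes) -> Tuple[bytes, bytes]:
    """Derive (next_state, arid) from the current secret state.

    The ARID and the next state use different labels, so seeing the ARID
    does not hand an observer the next state (assuming HMAC behaves as a PRF).
    """
    arid = hmac.new(state, b"arid", hashlib.sha256).digest()
    next_state = hmac.new(state, b"ratchet", hashlib.sha256).digest()
    return next_state, arid


def arid_sequence(initial_state: bytes, count: int) -> List[bytes]:
    """Produce `count` successive ARIDs starting from a secret initial state."""
    arids = []
    state = initial_state
    for _ in range(count):
        state, arid = ratchet_step(state)
        arids.append(arid)
    return arids
```

Two parties holding the same secret initial state produce the same sequence, while each output is derived from secret state rather than from any referent.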
Existing identifiers frequently contain inherent type information (UUID version 4 identifies itself as such) and frequently specify the type of referent (URIs specify the `scheme` and often specify a referent type such as `.jpg` in their path).
ARIDs contain no type information. Statistically, they are uniformly random sequences of bits. If you merely encoded an ARID as a sequence of binary or hexadecimal digits, it would appear to be a random sequence.
Type information can be added at higher levels. When encoded as CBOR, an ARID is tagged with #6.40012. Tagged this way, the receiver of an ARID can still only determine that it is an ARID, and nothing about the type or nature of its referent.
In particular, this construct provides no information about the lifetime of the referent. The referent could exist persistently for all time, such as in a blockchain, or it could exist for milliseconds, as in a distributed function call.
This construct also provides no information as to the source of its bit sequence. Since the sequence is statistically random, it could have been generated by a cryptographic random number generator or a sequence of ratcheting keys, and either case would be indistinguishable to a third-party observer.
Higher-level semantics are provided by how an ARID is further tagged, or by how it is positioned in a larger structure, or both. For instance, a distributed function call could have a header that includes the construct `request(ARID(XXX))`, where `request` is a CBOR tag indicating that the remainder of the structure specifies which function to call and with what parameters, and `ARID` specifies its tagged contents as conforming to the other requirements of this document. Positional information would include, for example, the position of the ARID within a header, or which field an ARID populates, such as `person: ARID(XXX)`. In this example, being the value of the `person` field is sufficient to use the ARID as a "person identifier" unless there is more than one distinct kind of "person", in which case another tag would be needed to disambiguate this.
As mentioned above, any method of generating an ARID is allowed as long as it fulfills the other requirements of this document, chiefly:
- statistically random bits, and
- universal uniqueness.
ARIDs must be a minimum of 256 bits (32 bytes) in length. At this time, there is no perceived need for ARIDs to be longer, and thus conformant processes that receive ARIDs MAY reject ARIDs that are longer or shorter than 256 bits, while processes that generate ARIDs SHOULD only generate ARIDs that are exactly 256 bits in length.
The foregoing notwithstanding, ARIDs MAY be used as inputs to cryptographic constructs such as ratcheting key algorithms, or used as additional entropy for random number generators, or salt for hashing algorithms, as long as the output of such algorithms is necessarily related to the ARID's referent.
For example, in the distributed call scenario, a caller might transmit a structure including `request(ARID(A))`, where `A` is an ARID generated from an iteration of a ratcheting key algorithm. The receiver compares `A` to its own internal state, rejecting the call if it does not match, and advancing the state of its ratchet if it does. The receiver computes the result of the call and returns a structure including `response(ARID(B))`, where `B` is generated from the new state of the ratchet. The caller receives the response and uses the algorithm to correlate `B` in the response to its call `A`, and if further exchanges are needed, uses the ratchet to produce the next expected transaction ID, `C`. Third parties viewing the exchange cannot correlate `A`, `B`, or `C`, and in particular, they cannot correlate a specific response to its call.
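The exchange can be sketched as follows, assuming both parties share a secret initial state and that each transmitted ARID consumes one ratchet state. The class `RatchetEndpoint`, its method names, and the HMAC-SHA256 construction are hypothetical choices made for illustration; this document does not prescribe a particular ratchet.

```python
import hashlib
import hmac
import secrets


class RatchetEndpoint:
    """One side of the exchange; both sides start from the same secret state."""

    def __init__(self, shared_secret: bytes) -> None:
        self._state = shared_secret

    def _current_arid(self) -> bytes:
        return hmac.new(self._state, b"arid", hashlib.sha256).digest()

    def _advance(self) -> None:
        self._state = hmac.new(self._state, b"ratchet", hashlib.sha256).digest()

    def next_arid(self) -> bytes:
        """Emit the ARID for the current state, then advance the ratchet."""
        arid = self._current_arid()
        self._advance()
        return arid

    def expect(self, received: bytes) -> bool:
        """Accept a received ARID only if it matches the current state.

        The ratchet advances on a match and is left unchanged otherwise.
        """
        if not hmac.compare_digest(received, self._current_arid()):
            return False
        self._advance()
        return True


secret = secrets.token_bytes(32)
caller = RatchetEndpoint(secret)
receiver = RatchetEndpoint(secret)

arid_a = caller.next_arid()                 # sent as request(ARID(A))
call_accepted = receiver.expect(arid_a)     # receiver matches A and advances
arid_b = receiver.next_arid()               # returned as response(ARID(B))
response_matched = caller.expect(arid_b)    # caller correlates B with its call A
arid_c = caller.next_arid()                 # next expected transaction ID, C
```

In this sketch a replayed `A` is rejected because the receiver's ratchet has already moved past the state that produced it.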
ARIDs MUST NOT be confused with any other sort of identifier or sequence of random or pseudorandom numbers.
- ARIDs MUST NOT be cast to or from other identifier types such as UUIDs, nor should they be considered isomorphic to any other type.
- ARIDs MUST NOT be cast from digests (hashes) or similar structures.
- ARIDs are not nonces. Unlike nonces, ARIDs always have a referent. ARIDs MUST NOT be used as nonces, and MUST NOT be created by casting from a nonce used anywhere else.
- ARIDs are not keys and MUST NOT be used as keys.
- ARIDs are not cryptographic seeds. They are generally not considered secret, and MUST NOT be used as secret key material from which keys or other secret constructs are derived.
This document defines the following UR types along with their corresponding CBOR tags:
UR type | CBOR Tag |
---|---|
ur:arid | #6.40012 |
These tags have been registered in the IANA Registry of CBOR Tags.
arid = #6.40012(arid-data)
arid-data = bytes .size 32
Hashes identify a fixed, immutable state of data. If the data changes, the hash changes. ARIDs, on the other hand, can serve as a stable identifier for mutable data structures. They provide universal uniqueness without tying them to a specific data snapshot, making them more versatile for identifying evolving or mutable referents.
Casting ARIDs to or from other identifier types compromises their neutral semantics and could introduce correlation with their referent or with other ARIDs. It undermines the fundamental aim of being universally unique while remaining completely opaque regarding their origin or the data they reference.
Hashes like SHA-256 are deterministic and directly tied to the data they represent. This compromises the non-correlatability requirement of ARIDs. If you use a hash, anyone with the same input data could generate the same ARID, making it possible to correlate the identifier with its referent. This runs counter to the primary goals of ARIDs, which aim for complete opacity regarding their generation method and the data they are linked to.
ARIDs are designed to be universally unique identifiers tied to a referent, whereas nonces are often ephemeral and context-dependent. Using an ARID as a nonce could mislead implementers into thinking it is meant to be associated with a specific object or event long-term. This discrepancy in purpose could cause semantic confusion and potential security risks.
ARIDs are set at 256 bits to meet a minimum threshold for cryptographic strength and universal uniqueness. Shorter lengths compromise these properties. Longer lengths don't offer proportionate benefits but increase computational and storage costs. Therefore, the 256-bit length is both sufficient and efficient.
ARIDs are not designed to be secret; their primary role is to serve as identifiers that are uncorrelated with their referents. Using them as secret key material would be a misuse of the structure and could compromise the security of cryptographic systems where actual secret keys are needed.
The "universal uniqueness" of ARIDs comes from adhering to stringent entropy requirements. Regardless of the generation method—be it a cryptographically secure random number generator or a ratcheting key algorithm—the resulting ARID must be statistically indistinguishable from pure entropy and at least 256 bits long. The sheer scale of the entropy space for a 256-bit identifier effectively guarantees that the chance of collision, even when using multiple methods, is astronomically low. Therefore, as long as the entropy requirements are rigorously met, the "universal uniqueness" is practically assured.
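The scale of that guarantee can be estimated with the standard birthday bound. As a worked example, assume $n$ ARIDs are drawn independently and uniformly from the $2^{256}$ possible values; the figure $n = 2^{64}$ below is an illustrative assumption, not a number from this document.

```latex
p_{\text{collision}}(n) \;\approx\; 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{256}}\right)
\;\approx\; \frac{n^{2}}{2^{257}} \qquad \text{for } n \ll 2^{128}

n = 2^{64}: \qquad p_{\text{collision}} \;\approx\; \frac{2^{128}}{2^{257}} \;=\; 2^{-129} \;\approx\; 1.5 \times 10^{-39}
```

Even with roughly eighteen quintillion identifiers in circulation, the probability that any two collide remains negligible, provided every generator meets the entropy requirement.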
Using ARIDs as inputs to cryptographic constructs doesn't violate their non-correlatability or neutral semantics. It doesn't reveal information about the ARID or its referent, maintaining their core attributes. It simply utilizes their high-entropy nature for cryptographic operations.