CBOR::Simple - Simple codec for the CBOR serialization format
use CBOR::Simple;
# Encode a Raku value to CBOR, or vice-versa
my $cbor = cbor-encode($value);
my $val1 = cbor-decode($cbor); # Fails if more data past first decoded value
my $val2 = cbor-decode($cbor, my $pos = 0); # Updates $pos after decoding first value
# By default, cbor-decode() marks partially corrupt parsed structures with
# Failure nodes at the point of corruption
my $bad = cbor-decode(buf8.new(0x81 xx 3)); # [[[Failure]]]
# Callers can instead force throwing exceptions on any error
my $*CBOR_SIMPLE_FATAL_ERRORS = True;
my $bad = cbor-decode(buf8.new(0x81 xx 3)); # BOOM!
# Decode CBOR into diagnostic text, used for checking encodings and complex structures
my $diag = cbor-diagnostic($cbor);
# Force the encoder to tag a value with a particular tag number
my $tagged = CBOR::Simple::Tagged.new(:$tag-number, :$value);
my $cbor = cbor-encode($tagged);
CBOR::Simple is an easy-to-use implementation of the core functionality of the CBOR serialization format, implementing the standard as of RFC 8949, plus a collection of common tag extensions as described below in TAG IMPLEMENTATION STATUS.
CBOR::Simple is one of the fastest data structure serialization codecs available for Raku. It is comparable in round-trip speed to JSON::Fast for data structures that are the most JSON-friendly. For all other cases tested, CBOR::Simple produces smaller, higher fidelity encodings, faster. For more detail, and comparison with other Raku serialization codecs, see serializer-perf.
Currently known NOT to work:
- Any tag marked '✘' (valid but not yet supported) or 'D' (deprecated spec) in the ENCODE or DECODE column of the Tag Status Details table below, or any tag not explicitly listed therein, will be treated as an opaque tagged value rather than treated as a native type.
- Packed arrays of 128-bit floats (num128); these are not supported in Rakudo yet.
- Encoding finite 16-bit floats (num16); encoding 16-bit NaN and ±Inf, as well as decoding any num16, all work. This is a performance tradeoff rather than a technical limitation; detecting whether a finite num32 can be shrunk to 16 bits without losing information is costly and rarely results in space savings except in trivial cases (e.g. Nums containing only small integers).
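
To make the cost of that check concrete, here is a minimal sketch of one way to test whether a finite Num is exactly representable as an IEEE 754 binary16 value. The sub name `fits-in-num16` is hypothetical and is not part of CBOR::Simple, which deliberately skips this check when encoding.

```raku
# Hypothetical helper, NOT part of CBOR::Simple: does a Num survive
# a round trip through a 16-bit float without losing information?
sub fits-in-num16(Num:D $n --> Bool:D) {
    # Non-finite values are handled separately by the encoder
    return False if $n != $n or $n.abs == Inf;
    return True  if $n == 0e0;

    # Largest finite binary16 value
    return False if $n.abs > 65504e0;

    # Every binary16 value is a whole multiple of 2**-24
    my $scaled = $n.abs * 16777216e0;
    return False unless $scaled == $scaled.floor;

    # After stripping trailing zero bits, at most 11 significant bits may remain
    my Int $k = $scaled.Int;
    $k div= 2 while $k %% 2;
    $k < 2048
}

say fits-in-num16(1.5e0);    # True
say fits-in-num16(0.1e0);    # False
say fits-in-num16(2049e0);   # False
```

Even this simple version needs several comparisons, a multiplication, and a loop per value, which illustrates why the check is a poor trade when it rarely saves space.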
When encoding, CBOR::Simple makes every attempt to encode tagged content strictly within the tag standards as written, always producing spec-compliant encoded values.
When decoding, CBOR::Simple will often slightly relax the allowed content types in tagged content, especially when later tag proposals made no change other than to extend the allowed content types and allocate a new tag number for that. In the extension case CBOR::Simple is likely to allow both the old and new tag to accept the same content domain when decoding.
For example, when encoding CBOR::Simple will always encode Instant or DateTime as a CBOR epoch-based date/time (tag 1), using standard integer or floating point content data. But when decoding, CBOR::Simple will accept any content that decodes properly as a Raku Real value, and in particular will handle a CBOR Rational (tag 30) as another valid content type.
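
The relaxed decoding of tag 1 is possible because core Raku can build an Instant from any Real POSIX timestamp. The sketch below uses only core Raku; the sub name `epoch-to-datetime` is hypothetical and only stands in for whatever the decoder does once it has the tag content in hand.

```raku
# Hypothetical helper: turn tag 1 content (any Real) into a DateTime
sub epoch-to-datetime(Real:D $content --> DateTime:D) {
    Instant.from-posix($content).DateTime
}

# Int, Num, and Rat content all work the same way
for 1_000_000_000, 1e9, 2_000_000_001/2 -> $content {
    say "{$content.^name}: {epoch-to-datetime($content)}";
}
```

The Rat case corresponds to tag 1 content that arrived as a CBOR Rational (tag 30) and keeps the half second exactly.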
Raku's builtin time handling is richer than the default CBOR data model (though certain tag extensions improve this), so the following mappings apply:
- Encoding
  - Instant and DateTime are both written as tag 1 (epoch-based date/time) with integer (if lossless) or floating point content.
  - Other Dateish are written as tag 100 (RFC 8943 days since 1970-01-01).
- Decoding
  - Tag 0 (date/time string) is parsed as a DateTime.
  - Tag 1 (epoch-based date/time) is parsed via Instant.from-posix(), and handles any Real type in the tag content.
  - Tag 100 (days since 1970-01-01) is parsed via Date.new-from-daycount().
  - Tag 1004 (date string) is parsed as a Date.
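
One detail worth spelling out for tag 100: Raku's `Date.daycount` counts days from 1858-11-17 (the Modified Julian Day epoch), while RFC 8943 counts days from 1970-01-01, so an offset is needed between the two. The following is a minimal core-Raku sketch of that conversion; the constant and sub names are hypothetical, and how CBOR::Simple applies the offset internally is not shown here.

```raku
# Raku daycount of the Unix epoch date (Modified Julian Day 40587)
constant UNIX-EPOCH-DAYCOUNT = Date.new(1970, 1, 1).daycount;

# Hypothetical helpers converting between tag 100 content and Date
sub date-from-epoch-days(Int:D $days --> Date:D) {
    Date.new-from-daycount(UNIX-EPOCH-DAYCOUNT + $days)
}
sub epoch-days-from-date(Date:D $date --> Int:D) {
    $date.daycount - UNIX-EPOCH-DAYCOUNT
}

say date-from-epoch-days(10957);                    # 2000-01-01
say epoch-days-from-date(Date.new('1980-12-08'));   # 3994
say epoch-days-from-date(Date.new('1940-10-09'));   # -10676
```

The last two values match the examples given in RFC 8943.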
- CBOR's null is translated as Any in Raku.
- CBOR's undefined is translated as Mu in Raku.
- A real Nil in an array (which must be bound, not assigned) is encoded as a CBOR Absent tag (31). Absent values will be recognized on decode as well, but since array contents are assigned into their parent array during decoding, a Nil in an array will be translated to Any by Raku's array assignment semantics.
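
The difference between binding and assigning a Nil can be seen in core Raku alone, without involving the codec. A minimal sketch:

```raku
my @assigned = 1, 2, 3;
@assigned[1] = Nil;        # assignment resets the element to its default, Any
say @assigned[1].raku;     # Any

my @bound = 1, 2, 3;
@bound[1] := Nil;          # binding stores a real Nil in the array
say @bound[1].raku;        # Nil
```

Only the second form leaves a real Nil for the encoder to find, which is why the Absent tag requires binding.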
- To mark a substructure for lazy decoding (treating it as an opaque Blob until explicitly decoded), use the tagged value idiom in the SYNOPSIS with :tag-number(24) (encoded CBOR value) or :tag-number(63) (encoded CBOR Sequence).
- CBOR strings claiming to be longer than 2⁶³-1 are treated as malformed.
- Bigfloats and decimal fractions (tags 4, 5, 264, 265) with very large exponents may result in numeric overflow when decoded.
- Keys for Associative types are sorted using Raku's internal sort method rather than the RFC 8949 default sort, because the latter is much slower.
- cbor-diagnostic() always adds encoding indicators for float values.
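
To illustrate how the two key orders can differ, the sketch below compares Raku's default sort with the bytewise ordering of encoded keys that RFC 8949 describes for deterministic encoding. It uses only core Raku; `encoded-key` is a hypothetical helper that handles only text keys shorter than 24 bytes, where the CBOR header is a single byte of 0x60 plus the length.

```raku
# Hypothetical helper: bytes of the CBOR encoding of a short text-string key
sub encoded-key(Str:D $key --> List:D) {
    my $utf8 = $key.encode('utf-8');
    die "sketch only handles keys shorter than 24 bytes" if $utf8.bytes >= 24;
    (0x60 + $utf8.bytes, |$utf8.list).List
}

my @keys = <b aa c ab>;

# Raku's default sort compares the strings themselves
my @raku-order = @keys.sort;

# Bytewise order of the encoded keys: shorter keys come first,
# because the length is part of the leading header byte
my @rfc-order = @keys.sort({ encoded-key($^a) cmp encoded-key($^b) });

say @raku-order;   # [aa ab b c]
say @rfc-order;    # [b c aa ab]
```

Both orders produce valid CBOR maps; the difference matters only to consumers that expect byte-identical deterministic encodings.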
Note that unrecognized tags will decode to their contents wrapped with a CBOR::Simple::Tagged object that records its tag-number; check marks in the details table indicate conversion to/from an appropriate native Raku type rather than this default behavior.
GROUP | SUPPORT | NOTES |
---|---|---|
Core | Good | Core RFC 8949 CBOR data model and syntax |
Collections | Good | Sets, maps with only object or only string keys |
Graph | NONE | Cyclic, indirected, and self-referential structures |
Numbers | Good | Rational/BigInt/BigFloat support except non-finite triplets |
Packed Arrays | Partial | Packed num16/32/64 arrays supported; packed int arrays not |
Special Arrays | NONE | Explicit multi-dim/homogenous arrays |
Tag Fallbacks | Good | Round tripping of unknown tagged content |
Date/Time | Partial | All but tagged time (tags 1001-1003) supported |
GROUP | SUPPORT | NOTES |
---|---|---|
Encodings | NONE | baseN, MIME, YANG, BER, non-UTF-8 strings |
Geo | NONE | Geographic coordinates and shapes |
Identifiers | NONE | URI, IRI, UUID, IPLD CID, general identifiers |
Networking | NONE | IPv4/IPv6 addresses, subnets, and masks |
Security | NONE | COSE and CWT |
Specialty | NONE | IoT data, Openswan, PlatformV, DOTS, ERIS, RAINS |
String Hints | NONE | JSON conversions, language tags, regex |
SPEC | TAGS | ENCODE | DECODE | NOTES |
---|---|---|---|---|
RFC 8949 | 0 | → | ✓ | DateTime strings → Encoded as tag 1 |
RFC 8949 | 1 | ✓ | ✓ | DateTime/Instant |
RFC 8949 | 2,3 | ✓ | ✓ | (Big) Int |
RFC 8949 | 4,5 | → | ✓ | Big fractions → Encoded as tag 30 |
unassigned | 6-15 | |||
COSE | 16-18 | ✘ | ✘ | MAC/Signatures |
unassigned | 19-20 | |||
RFC 8949 | 21-23 | ✘ | ✘ | Expected JSON conversion to baseN |
RFC 8949 | 24 | T | ✓ | Encoded CBOR data item |
[Lehmann] | 25 | ✘ | ✘ | String backrefs |
[Lehmann] | 26,27 | ✘ | ✘ | General serialized objects |
[Lehmann] | 28,29 | ✘ | ✘ | Shareable referenced values |
[Occil] | 30 | ✓ | ✓ | Rational numbers |
[Vaarala] | 31 | ✓ | * | Absent values |
RFC 8949 | 32-34 | ✘ | ✘ | URIs and base64 encoding |
RFC 7094 | 35 | D | D | PCRE/ECMA 262 regex (DEPRECATED) |
RFC 8949 | 36 | ✘ | ✘ | Text-based MIME message |
[Clemente] | 37 | ✘ | ✘ | Binary UUID |
[Occil] | 38 | ✘ | ✘ | Language-tagged string |
[Clemente] | 39 | ✘ | ✘ | Identifier semantics |
RFC 8746 | 40 | ✘ | ✘ | Row-major multidim array |
RFC 8746 | 41 | ✘ | ✘ | Homogenous array |
[Mische] | 42 | ✘ | ✘ | IPLD content identifier |
[YANG] | 43-47 | ✘ | ✘ | YANG datatypes |
unassigned | 48-51 | |||
draft | 52 | D | D | IPv4 address/network (DEPRECATED) |
unassigned | 53 | |||
draft | 54 | D | D | IPv6 address/network (DEPRECATED) |
unassigned | 55-60 | |||
RFC 8392 | 61 | ✘ | ✘ | CBOR Web Token (CWT) |
unassigned | 62 | |||
[Bormann] | 63 | T | ✓ | Encoded CBOR Sequence |
RFC 8746 | 64-79 | ✘! | ✘! | Packed int arrays |
RFC 8746 | 80-87 | ✓ | ✓ | Packed num arrays (except 128-bit) |
unassigned | 88-95 | |||
COSE | 96-98 | ✘ | ✘ | Encryption/MAC/Signatures |
unassigned | 99 | |||
RFC 8943 | 100 | ✓ | ✓ | Date |
unassigned | 101-102 | |||
[Vidovic] | 103 | ✘ | ✘ | Geo coords |
[Clarke] | 104 | ✘ | ✘ | Geo coords ref system WKT/EPSG |
unassigned | 105-109 | |||
RFC 9090 | 110-112 | ✘ | ✘ | BER-encoded object ID |
unassigned | 113-119 | |||
[Vidovic] | 120 | ✘ | ✘ | IoT data point |
unassigned | 121-255 | |||
[Lehmann] | 256 | ✘ | ✘ | String backrefs (see tag 25) |
[Occil] | 257 | ✘ | ✘ | Binary MIME message |
[Napoli] | 258 | ✓ | ✓ | Set |
[Holloway] | 259 | T | ✓ | Map with object keys |
[Raju] | 260-261 | ✘ | ✘ | IPv4/IPv6/MAC address/network |
[Raju] | 262-263 | ✘ | ✘ | Embedded JSON/hex strings |
[Occil] | 264-265 | → | * | Extended fractions → Encoded as tag 30 |
[Occil] | 266-267 | ✘ | ✘ | IRI/IRI reference |
[Occil] | 268-270 | ✘✘ | ✘✘ | Triplet non-finite numerics |
RFC 9132 | 271 | ✘✘ | ✘✘ | DDoS Open Threat Signaling (DOTS) |
[Vaarala] | 272-274 | ✘ | ✘ | Non-UTF-8 strings |
[Cormier] | 275 | T | ✓ | Map with only string keys |
[ERIS] | 276 | ✘ | ✘ | ERIS binary read capability |
[Meins] | 277-278 | ✘ | ✘ | Geo area shape/velocity |
unassigned | 279-1000 | |||
[Bormann] | 1001-1003 | ✘ | ✘ | Extended time representations |
RFC 8943 | 1004 | → | ✓ | → Encoded as tag 100 |
unassigned | 1005-1039 | |||
RFC 8746 | 1040 | ✘ | ✘ | Column-major multidim array |
unassigned | 1041-22097 | |||
[Lehmann] | 22098 | ✘ | ✘ | Hint for additional indirection |
unassigned | 22099-25440 | |||
[Broadwell] | 25441 | ✓ | ✓ | Capture: reference implementation |
unassigned | 25442-49999 | |||
[Tongzhou] | 50000-50011 | ✘✘ | ✘✘ | PlatformV |
unassigned | 50012-55798 | |||
RFC 8949 | 55799 | ✓ | ✓ | Self-described CBOR |
[Richardson] | 55800 | ✓ | ✓ | Self-described CBOR Sequence |
unassigned | 55801-65534 | |||
invalid | 65535 | | ✓ | Invalid tag detected |
unassigned | 65536-15309735 | |||
[Trammell] | 15309736 | ✘✘ | ✘✘ | RAINS message |
unassigned | 15309737-1330664269 | |||
[Hussain] | 1330664270 | ✘✘ | ✘✘ | CBOR-encoded Openswan config file |
unassigned | 1330664271-4294967294 | |||
invalid | 4294967295 | | ✓ | Invalid tag detected |
unassigned | ... | |||
invalid | 18446744073709551615 | | ✓ | Invalid tag detected |
SYMBOL | MEANING |
---|---|
✓ | Fully supported |
* | Supported, but see notes below |
T | Encoding supported by explicitly tagging contents |
→ | Raku values will be encoded using a different tag |
D | Deprecated and unsupported tag spec; may eventually be decodable |
✘ | Not yet implemented |
✘! | Not yet implemented, but already requested |
✘? | Not yet implemented, but may be easy to add |
✘✘ | Probably won't be implemented in CBOR::Simple |
Geoffrey Broadwell gjb@sonic.net
Copyright 2021 Geoffrey Broadwell
This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.