Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cbor/unstable): Introduce @std/cbor #5909

Open
wants to merge 53 commits into
base: main
Choose a base branch
from

Conversation

BlackAsLight
Copy link
Contributor

@BlackAsLight BlackAsLight commented Sep 4, 2024

Closes #5479

This pull request introduces a CBOR implementation based off the RFC 8949 spec from scratch. It introduces functions like encodeCbor, encodeCborSequence, decodeCbor, and decodeCborSequence, and a CborTag class to provide additional semantic information.

It also introduces streaming versions called:

  • CborSequenceEncoderStream
  • CborByteEncoderStream
  • CborTextEncoderStream
  • CborArrayEncoderStream
  • CborMapEncoderStream
  • CborSequenceDecoderStream
  • CborByteDecodedStream
  • CborTextDecodedStream
  • CborArrayDecodedStream
  • CborMapDecodedStream

It should be noted the different naming convention used between the "encoder", "decoder" and "decoded". The "encoder" and "decoder" classes are TransformStreams, while the "decoded" classes are ReadableStreams and act merely as a way for the user to figure out what type they have.

Due to the way streams work, if one of the "decoded" streams are yielded then they'll need to either be entirely consumed or cancelled before the next value will be yielded. This should be noted when using things like Array.fromAsync. Such a function would work only if you can guarantee that no value yielded will be one of these "decoded" streams. If such a value is yield in such a function then it will hang indefinitely.

Example 1

import { assert, assertEquals } from "@std/assert";
import { decodeCbor, encodeCbor } from "@std/cbor";

const rawMessage = [
  "Hello World",
  35,
  0.5,
  false,
  -1,
  null,
  Uint8Array.from([0, 1, 2, 3]),
];

const encodedMessage = encodeCbor(rawMessage);
const decodedMessage = decodeCbor(encodedMessage);

assert(decodedMessage instanceof Array);
assertEquals(decodedMessage, rawMessage);

Example 2

import { assert, assertEquals } from "@std/assert";
import { CborTextEncoderStream, CborTextDecodedStream, CborSequenceDecoderStream } from "@std/cbor";

const reader = CborTextEncoderStream.from(["Hello World!"])
  .readable
  .pipeThrough(new CborSequenceDecoderStream()).getReader();

const { done, value } = await reader.read();
assert(done === false);
assert(value instanceof CborTextDecodedStream);
assertEquals((await Array.fromAsync(value)).join(''), "Hello World!")

assert((await reader.read()).done === true);
reader.releaseLock();

Example 3

import { encodeBase64Url } from "@std/encoding";
import {
  CborArrayDecodedStream,
  CborArrayEncoderStream,
  CborByteDecodedStream,
  CborByteEncoderStream,
  CborMapDecodedStream,
  CborMapEncoderStream,
  type CborOutputStream,
  CborSequenceDecoderStream,
  CborSequenceEncoderStream,
  CborTag,
  CborTextDecodedStream,
  CborTextEncoderStream,
} from "@std/cbor";

const rawMessage = [
  undefined,
  null,
  true,
  false,
  3.14,
  5,
  2n ** 32n,
  "Hello World",
  new Uint8Array(25),
  new Date(),
  new CborTag(33, encodeBase64Url(new Uint8Array(7))),
  ["cake", "carrot"],
  { a: 3, b: "d" },
  CborByteEncoderStream.from([new Uint8Array(7)]),
  CborTextEncoderStream.from(["Bye!"]),
  CborArrayEncoderStream.from([
    "Hey!",
    CborByteEncoderStream.from([new Uint8Array(18)]),
  ]),
  CborMapEncoderStream.from([
    ["a", 0],
    ["b", "potato"],
  ]),
];

async function logValue(value: CborOutputStream) {
  if (
    value instanceof CborByteDecodedStream ||
    value instanceof CborTextDecodedStream
  ) {
    for await (const x of value) console.log(x);
  } else if (value instanceof CborArrayDecodedStream) {
    for await (const x of value) logValue(x);
  } else if (value instanceof CborMapDecodedStream) {
    for await (const [k, v] of value) {
      console.log(k);
      logValue(v);
    }
  } else if (value instanceof CborTag) {
    console.log(value);
    logValue(value.tagContent);
  } else console.log(value);
}

for await (
  const value of ReadableStream.from(rawMessage)
    .pipeThrough(new CborSequenceEncoderStream())
    .pipeThrough(new CborSequenceDecoderStream())
) {
  logValue(value);
}

Copy link

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 91.20623% with 113 lines in your changes missing coverage. Please review.

Project coverage is 96.40%. Comparing base (91c108e) to head (8b3afc6).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cbor/sequence_decoder_stream.ts 77.80% 77 Missing and 2 partials ⚠️
cbor/sequence_encoder_stream.ts 86.50% 17 Missing ⚠️
cbor/_common_decode.ts 96.65% 8 Missing ⚠️
cbor/_common.ts 93.65% 4 Missing ⚠️
cbor/_common_encode.ts 97.47% 3 Missing ⚠️
cbor/byte_encoder_stream.ts 97.95% 1 Missing ⚠️
cbor/text_encoder_stream.ts 97.95% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5909      +/-   ##
==========================================
- Coverage   96.56%   96.40%   -0.17%     
==========================================
  Files         516      536      +20     
  Lines       39771    41056    +1285     
  Branches     5883     6155     +272     
==========================================
+ Hits        38406    39578    +1172     
- Misses       1323     1434     +111     
- Partials       42       44       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the cbor label Sep 5, 2024
@kt3k
Copy link
Member

kt3k commented Sep 5, 2024

I wonder if these need to be classes when they both don't have any instance states.

How about making them just deocde and encode functions? That mirrors the structure of std/msgpack

@BlackAsLight
Copy link
Contributor Author

I wonder if these need to be classes when they both don't have any instance states.

How about making them just deocde and encode functions? That mirrors the structure of std/msgpack

Sure. I was trying to mirror the structure of TextEncoder/TextDecoder, but this other structure seems more common in the std.

@kt3k
Copy link
Member

kt3k commented Sep 5, 2024

Side note: TextDecoder stores some chunk from the previous decode when stream option is specified. Probably that is why it's implemented as a class

@iuioiua
Copy link
Collaborator

iuioiua commented Sep 9, 2024

@BlackAsLight, could you please draft this PR and un-draft once ready for us to review?

@BlackAsLight BlackAsLight reopened this Sep 9, 2024
@BlackAsLight BlackAsLight marked this pull request as ready for review September 22, 2024 01:57
@BlackAsLight
Copy link
Contributor Author

Is there a way I can get the Sentry bot to update or create a new report on what lines are missing?

cbor/deno.json Outdated Show resolved Hide resolved
cbor/deno.json Outdated Show resolved Hide resolved
@kt3k
Copy link
Member

kt3k commented Sep 24, 2024

@BlackAsLight

Coverage diff of this PR should be visible in this page https://app.codecov.io/gh/denoland/std/pull/5909

CborSequenceDecoderStream
CborByteDecodedStream
CborTextDecodedStream
CborArrayDecodedStream
CborMapDecodedStream

There are mixed naming styles, FooDecoderStream and FooDecodedStream. Are these differences intentional?


We'd like to export a single API from a single file. Can you split files based on exported APIs

  • encode_cbor.ts exports encodeCbor
  • encode_cbor_sequence.ts exports encodeCborSequence
  • tag.ts exports CborTag
  • sequence_encoder.ts exports CborSequenceEncoderStream
  • byte_encoder_stream.ts exports CborByteEncoderStream
  • text_encoder_stream.ts exports CborTextEncoderStream
  • array_encoder_stream.ts exports CborArrayEncoderStream
  • map_encoder_stream.ts exports CborMapEncoderStream
  • decode_cbor.ts exports decodeCbor
  • decode_cbor_sequence.ts exports decodeCborSequence
  • sequence_decoder_stream.ts exports CborSequenceDecoderStream
  • byte_decoded_stream.ts exports CborByteDecodedStream
  • text_decoded_stream exports CborTextDecodedStream
  • array_decoded_stream exports CborArrayDecodedStream
  • map_decoded_stream exports CborMapDecodedStream

@BlackAsLight
Copy link
Contributor Author

@BlackAsLight

Coverage diff of this PR should be visible in this page https://app.codecov.io/gh/denoland/std/pull/5909

That page hasn't gotten updated in like 5 days now, which is why I asked.

CborSequenceDecoderStream
CborByteDecodedStream
CborTextDecodedStream
CborArrayDecodedStream
CborMapDecodedStream

There are mixed naming styles, FooDecoderStream and FooDecodedStream. Are these differences intentional?

The difference in naming style is intentional. The "Encoder" and "Decoder" ones are TransformStreams that actually do the work of converting, while the "Decoded" ones are ReadableStreams that act as a simple wrapper so the user of the lib is able to know what type of chunks to expect.

I did explore the idea of having the "Decoded" ones also be TransformStreams and handling the logic of conversion, but the logic there seemed too complicated as you don't know how far to go until you've actually decoded it.

I also explored the idea of having the other "Encoder" ones (i.e. CborByteEncodedStream) act as a mere wrapper for CborSequenceEncoderStream, but that then seemed redundant if all you wanted to send for example a byte string, as more checks would essentially need to be done for no reason.

We'd like to export a single API from a single file. Can you split files based on exported APIs

* `encode_cbor.ts` exports `encodeCbor`

* `encode_cbor_sequence.ts` exports `encodeCborSequence`

* `tag.ts` exports `CborTag`

* `sequence_encoder.ts` exports `CborSequenceEncoderStream`

* `byte_encoder_stream.ts` exports `CborByteEncoderStream`

* `text_encoder_stream.ts` exports `CborTextEncoderStream`

* `array_encoder_stream.ts` exports `CborArrayEncoderStream`

* `map_encoder_stream.ts` exports `CborMapEncoderStream`

* `decode_cbor.ts` exports `decodeCbor`

* `decode_cbor_sequence.ts` exports `decodeCborSequence`

* `sequence_decoder_stream.ts` exports `CborSequenceDecoderStream`

* `byte_decoded_stream.ts` exports `CborByteDecodedStream`

* `text_decoded_stream` exports `CborTextDecodedStream`

* `array_decoded_stream` exports `CborArrayDecodedStream`

* `map_decoded_stream` exports `CborMapDecodedStream`

That is a lot of files, but I can do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

suggestion: add CBOR encoding/decoding
3 participants