From 51fd30451413caeb33e8ab91b9249db598f99c1d Mon Sep 17 00:00:00 2001
From: Christoph Hagen
Date: Sun, 10 Jul 2022 01:46:21 +0200
Subject: [PATCH] Separate protobuf encoding description

---
 ProtobufSupport.md | 205 +++++++++++++++++++++++++++++++++++++++++++++
 README.md          |  47 +----------
 2 files changed, 208 insertions(+), 44 deletions(-)
 create mode 100644 ProtobufSupport.md

diff --git a/ProtobufSupport.md b/ProtobufSupport.md
new file mode 100644
index 0000000..ce8c6b0
--- /dev/null
+++ b/ProtobufSupport.md
@@ -0,0 +1,205 @@
+# Protocol Buffer Compatibility
+
+`BinaryCodable` provides limited compatibility with [Google Protocol Buffers](https://developers.google.com/protocol-buffers). Certain Swift types can be encoded to protobuf-compatible binary data, and vice versa. The standard [binary format](BinaryFormat.md) is similar to protobuf, but includes some deviations to support all Swift types and features. There are additional `ProtobufEncoder` and `ProtobufDecoder` types which change the encoding format to be protobuf-compatible, at the expense of errors for unsupported features.
+
+For a description of the Protocol Buffer format, see the [official documentation](https://developers.google.com/protocol-buffers).
+
+**Important notes**
+- Advanced protobuf features like message concatenation are not supported.
+- Unsupported features of Protobuf *may* cause the encoding to fail with a `ProtobufEncodingError`. Interoperability should be thoroughly checked through testing.
+
+## Usage
+
+The conversion process is equivalent to the `BinaryEncoder` and `BinaryDecoder` types.
+
+```swift
+import BinaryCodable
+```
+
+### Encoding
+
+Construct an encoder when converting instances to binary data, and feed the message(s) into it:
+
+```swift
+let message: Message = ...
+
+let encoder = ProtobufEncoder()
+let data = try encoder.encode(message)
+```
+
+### Decoding
+
+Decoding instances from binary data works much the same way:
+
+```swift
+let decoder = ProtobufDecoder()
+let message = try decoder.decode(Message.self, from: data)
+```
+
+Alternatively, the type can be inferred:
+
+```swift
+let message: Message = try decoder.decode(from: data)
+```
+
+### Errors
+
+It is possible for both encoding and decoding to fail.
+All possible errors occurring during encoding produce `BinaryEncodingError` or `ProtobufEncodingError` errors, while unsuccessful decoding produces `BinaryDecodingError` or `ProtobufDecodingError`.
+All are enums with several cases describing the nature of the error.
+See the documentation of the types to learn more about the different error conditions.
+
+## Message definition
+
+Protobuf organizes data into messages, which are structures with keyed fields. Compatible Swift types must be similar, so either `struct` or `class`. Simple types like `Int` or `Bool` are not supported on the root level, and neither are `Dictionary`, `Array` or `enum`. It's best to think of the proto definitions and construct Swift types in the same way. Let's look at an example from the [Protocol Buffer documentation](https://developers.google.com/protocol-buffers/docs/proto3#simple):
+
+```proto
+message SearchRequest {
+  string query = 1;
+  int32 page_number = 2;
+  int32 result_per_page = 3;
+}
+```
+
+The corresponding Swift definition would be:
+
+```swift
+struct SearchRequest: Codable {
+
+    var query: String
+
+    var pageNumber: Int32
+
+    var resultPerPage: Int32
+
+    enum CodingKeys: Int, CodingKey {
+        case query = 1
+        case pageNumber = 2
+        case resultPerPage = 3
+    }
+}
+```
+
+The general structure of the messages is very similar, with the proto field numbers specified as integer coding keys.
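+
+To illustrate how the integer coding keys end up in the binary data, the following sketch builds the standard Protocol Buffer wire format for a `SearchRequest` by hand. It is a minimal, self-contained illustration of the protobuf specification and not the implementation used by `ProtobufEncoder`; the names `WireType`, `varint` and `tag` are hypothetical helpers introduced only for this example.
+
+```swift
+import Foundation
+
+/// Protobuf wire types needed here (hypothetical helper, not library API).
+enum WireType: UInt64 {
+    case varint = 0
+    case lengthDelimited = 2
+}
+
+/// Encodes an unsigned integer as a Base 128 varint.
+func varint(_ value: UInt64) -> [UInt8] {
+    var remaining = value
+    var bytes: [UInt8] = []
+    while remaining >= 0x80 {
+        // Lower 7 bits, with the continuation bit set
+        bytes.append(UInt8(remaining & 0x7F) | 0x80)
+        remaining >>= 7
+    }
+    bytes.append(UInt8(remaining))
+    return bytes
+}
+
+/// A field tag is the field number shifted left by 3 bits, combined with the wire type.
+func tag(_ field: UInt64, _ type: WireType) -> [UInt8] {
+    varint((field << 3) | type.rawValue)
+}
+
+let query = Array("swift".utf8)
+var message: [UInt8] = []
+message += tag(1, .lengthDelimited) + varint(UInt64(query.count)) + query // query = "swift"
+message += tag(2, .varint) + varint(1)   // pageNumber = 1
+message += tag(3, .varint) + varint(10)  // resultPerPage = 10
+
+print(message.map { String(format: "%02x", $0) }.joined(separator: " "))
+// prints "0a 05 73 77 69 66 74 10 01 18 0a"
+```
+
+Because only 29 bits of the tag remain for the field number in this scheme, the integer raw values of `CodingKeys` take the place of the field numbers from the `.proto` file.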
+
+### Assigning integer keys
+
+The assignment of integer keys follows the same [rules](https://developers.google.com/protocol-buffers/docs/proto3#assigning_field_numbers) as for field numbers, just written out as an `enum` with `RawValue == Int` on the type conforming to `CodingKey`. The smallest field number you can specify is `1`, and the largest is `2^29 - 1`, or `536,870,911`. Codable types without (or with invalid) integer keys can't be encoded using `ProtobufEncoder` and will throw an error.
+
+### Scalar value types
+
+There are several [scalar types](https://developers.google.com/protocol-buffers/docs/proto3#scalar) defined for Protocol Buffers, which are the basic building blocks of messages. `BinaryCodable` provides Swift equivalents for each of them:
+
+| Protobuf primitive | Swift equivalent | Comment |
+| :-- | :-- | :-- |
+`double` | `Double` | Always 8 bytes
+`float` | `Float` | Always 4 bytes
+`int32` | `Int32` | Uses variable-length encoding
+`int64` | `Int64` | Uses variable-length encoding
+`uint32` | `UInt32` | Uses variable-length encoding
+`uint64` | `UInt64` | Uses variable-length encoding
+`sint32` | `SignedInteger` | Uses ZigZag encoding, see [`SignedInteger` wrapper](#signed-integers)
+`sint64` | `SignedInteger` | Uses ZigZag encoding, see [`SignedInteger` wrapper](#signed-integers)
+`fixed32` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
+`fixed64` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
+`sfixed32` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
+`sfixed64` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
+`bool` | `Bool` | Always 1 byte
+`string` | `String` | Encoded using UTF-8
+`bytes` | `Data` | Encoded as-is
+`message` | `struct` | Nested messages are also supported.
+`repeated`| `Array` | Scalar values must always be `packed` (the proto3 default)
+`enum` | `Enum` | See [Enums](#enums)
+`oneof` | N/A | No `Codable` equivalent available
+
+The Swift types `Int8`, `UInt8`, `Int16`, and `UInt16` are **not** supported, and will result in an error.
+
+Note: `Int` and `UInt` values are always encoded as 64-bit numbers, despite the fact that they might be 32-bit values on some systems. Decoding a 64-bit value on a 32-bit system will result in an error.
+
+### Property wrappers
+
+The Protocol Buffer format provides several different encoding strategies for integers to minimize the binary size depending on the encoded values. By default, all integers are encoded using [Base 128 Varints](https://developers.google.com/protocol-buffers/docs/encoding#varints), but this can be changed using Swift `PropertyWrappers`. The following encoding options exist:
+
+| Swift type | [Varint encoding](https://developers.google.com/protocol-buffers/docs/encoding#varints) | [ZigZag Encoding](https://developers.google.com/protocol-buffers/docs/encoding#signed-ints) | [Fixed-size encoding](https://developers.google.com/protocol-buffers/docs/encoding#non-varint_numbers) |
+| :-- | :-- | :-- | :-- |
+`Int32` | `Int32` | `SignedInteger` | `FixedSize`
+`Int64` | `Int64` | `SignedInteger` | `FixedSize`
+`UInt32` | `UInt32` | - | `FixedSize`
+`UInt64` | `UInt64` | - | `FixedSize`
+
+#### Fixed size integers
+
+While varints are efficient for small numbers, their encoding introduces a storage and computation penalty when the integers are often large, e.g. for random numbers. `BinaryCodable` provides the `FixedSize` wrapper, which forces integers to be encoded using their little-endian binary representations. This means that e.g. an `Int32` is always encoded as 4 bytes (instead of 1-5 bytes using Varint encoding). This makes 32-bit `FixedSize` types more efficient than `Varint` if values are often larger than `2^28` (`2^56` for 64-bit types).
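+
+The break-even points mentioned above can be checked with a short, self-contained sketch. The helper `varintLength` is a hypothetical name introduced for this example and is not part of `BinaryCodable`; it only counts the bytes that a Base 128 varint needs for an unsigned value.
+
+```swift
+/// Number of bytes needed to encode `value` as a Base 128 varint (7 payload bits per byte).
+func varintLength(_ value: UInt64) -> Int {
+    var remaining = value
+    var count = 1
+    while remaining >= 0x80 {
+        remaining >>= 7
+        count += 1
+    }
+    return count
+}
+
+print(varintLength(127))            // 1
+print(varintLength((1 << 28) - 1))  // 4, same size as a fixed 32-bit value
+print(varintLength(1 << 28))        // 5, one byte more than a fixed 32-bit value
+print(varintLength((1 << 56) - 1))  // 8, same size as a fixed 64-bit value
+print(varintLength(1 << 56))        // 9, one byte more than a fixed 64-bit value
+```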
+
+Use the property wrapper within a `Codable` definition to enforce fixed-width encoding for a property:
+```swift
+struct MyStruct: Codable {
+
+    /// Always encoded as 4 bytes
+    @FixedSize
+    var largeInteger: Int32
+}
+```
+
+The `FixedSize` wrapper is available to all `Varint` types: `Int`, `UInt`, `Int32`, `UInt32`, `Int64`, and `UInt64`.
+
+#### Signed integers
+
+Integers are by default [encoded as `Varint` values](BinaryFormat.md#integer-encoding), which is efficient while numbers are small and positive. For numbers that are mostly or often negative, it is more efficient to store them using `Zig-Zag` encoding. `BinaryCodable` offers the `SignedValue` wrapper that can be applied to `Int`, `Int32` and `Int64` properties to increase the efficiency for negative values.
+
+If your integers are expected to be negative, apply the wrapper:
+```swift
+struct MyStruct: Codable {
+
+    /// More efficiently encodes negative numbers
+    @SignedValue
+    var count: Int
+}
+```
+
+### Enums
+
+Protocol Buffer [enumerations](https://developers.google.com/protocol-buffers/docs/proto3#enum) are supported, with a few notable caveats. Here is the example from the official documentation:
+```proto
+message SearchRequest {
+
+    ...
+
+    enum Corpus {
+        UNIVERSAL = 0;
+        WEB = 1;
+        IMAGES = 2;
+        LOCAL = 3;
+        NEWS = 4;
+        PRODUCTS = 5;
+        VIDEO = 6;
+    }
+    Corpus corpus = 4;
+}
+```
+The `BinaryCodable` Swift equivalent would be:
+
+```swift
+struct SearchRequest: Codable {
+
+    ...
+
+    enum Corpus: Int, Codable {
+        case universal = 0
+        case web = 1
+        case images = 2
+        case local = 3
+        case news = 4
+        case products = 5
+        case video = 6
+    }
+
+    var corpus: Corpus
+
+    enum CodingKeys: Int, CodingKey {
+        case corpus = 4
+    }
+}
+```
+
+Note that protobuf enums require a default key `0`.
diff --git a/README.md b/README.md
index a9f037a..e572b94 100644
--- a/README.md
+++ b/README.md
@@ -146,7 +146,7 @@ While varints are efficient for small numbers, their encoding introduces a stora
 
 #### Other property wrappers
 
-There are additional wrappers that can be applied to properties, but these are only useful when encoding in [protobuf-compatible format](#protobuf-compatibility).
+There is an additional `SignedValue` wrapper, which is only useful when encoding in [protobuf-compatible format](ProtobufSupport.md#signed-integers).
 
 ### Options
 
@@ -156,50 +156,9 @@ Sorting the binary data does not influence decoding, but introduces a computatio
 
 **Note:** The `sortKeysDuringEncoding` option does not guarantee deterministic binary data, and should be used with care.
 
-### Protobuf compatibility
-
-Both `BinaryEncoder` and `BinaryDecoder` offer the property `forceProtobufCompatibility`, which changes the binary data encoding/decoding to be compatible with Google's Protocol Buffer format. This compatibility is limited to basic protobuf functionality, and should be used with care.
The following features are currently supported:
-
-| Protobuf primitive | Swift equivalent | Comment |
-| :-- | :-- | :-- |
-`double` | `Double` | Always 8 byte
-`float` | `Float` | Always 4 byte
-`int32` | `Int32` | Uses variable-length encoding
-`int64` | `Int64` | Uses variable-length encoding
-`uint32` | `UInt32` | Uses variable-length encoding
-`uint64` | `UInt64` | Uses variable-length encoding
-`sint32` | `Int32` | Uses ZigZag encoding, see [`SignedInteger` wrapper](#signed-integers)
-`sint64` | `Int64` | Uses ZigZag encodingsee [`SignedInteger` wrapper](#signed-integers)
-`fixed32` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
-`fixed64` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
-`sfixed32` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
-`sfixed64` | `FixedSize` | See [`FixedSize` wrapper](#fixed-size-integers)
-`bool` | `Bool` | Always 1 byte
-`string` | `String` | Encoded using UTF-8
-`bytes` | `Data` | Encoded as-is
-`message` | `struct` | Nested messages should also be supported.
-`repeated`| `Array` | Scalar values must always be `packed` (the proto3 default)
-
-
-**Important notes**
-- Protobuf compatibility requires [integer coding keys](#coding-keys), or the encoding/decoding will fail.
-- Advanced protobuf features like message concatenation are not supported.
-- Unsupported features of Protobuf *may* cause the encoding to fail with a `BinaryEncodingError` of type `notProtobufCompatible`. Interoperability should be thoroughly checked through testing.
+### Protocol Buffer compatibility
 
-
-#### Signed integers
-
-Integers are by default [encoded as `Varint` values](BinaryFormat.md#integer-encoding), which is efficient while numbers are small and positive. For numbers which are mostly or also often negative, it is more efficient to store them using `Zig-Zag` encoding.
`BinaryCodable` offers the `SignedValue` wrapper that can be applied to `Int`, `Int32` and `Int64` properties to increase the efficiency for negative values.
-
-Whenever your integers are expected to be negative, then you should apply the wrapper:
-```swift
-struct MyStruct: Codable {
-
-    /// More efficiently encodes negative numbers
-    @SignedValue
-    var count: Int
-}
-```
+Achieving Protocol Buffer compatibility is described in [ProtobufSupport.md](ProtobufSupport.md).
 
 ## Binary format