Skip to content

Latest commit

 

History

History
304 lines (211 loc) · 18.3 KB

README.org

File metadata and controls

304 lines (211 loc) · 18.3 KB

Motivation

When using some serialization libraries I have to write code to traverse my data structures and then put each of my struct’s fields into the format the serialization library wants, and then I have to do this again in reverse. I have to write this traversal code again when converting to JSON or Lua. I have to write it again when writing a debug GUI that lets me see all the values. I wanted to reduce the boilerplate.

  1. I wanted it to be easy to turn a struct into a serializable struct, ideally just one line of code per struct.
  2. I didn’t want to introduce new build steps or external tools.
  3. I wanted to be able to serialize existing structs without modifying their definitions.
  4. I wanted to add support for containers (vector, set, variant, etc.) without modifying this library.
  5. I wanted to add support for formats (binary, JSON, Lua, etc.), without modifying this library.
  6. I did not need it to work for arbitrary data structures.

The result is a C++17 header-only serialization library.

Design

This isn’t a general purpose library. I created it primarily for my own use. I plan to use it for (copyable, assignable, default constructible) structs with public fields, numbers, enums, std::string, and std::vector. For these data structures, the library defines a generic traversal routine that recursively visits each element, calling a serialization/deserialization function for each.

Since there are many different data structures (nouns) and many different operations we want to do to them (verbs), there are two axes for extension:

Operation \ Typeenumnumberstringvectorstruct1struct2
Debug output
Binary serialize
Binary deserialize
GUI output

Two-axis extension can leave “holes” in the matrix. For example, if you add a new extension for Lua that handles enum, number, string, vector, and structs, and you also add a new extension for std::multiset that handles debug output, binary serialize, binary deserialize, and gui output, there’s no code to handle the combination of Lua and multiset. The library doesn’t have a way to automatically fill those holes, but is designed to be “open” so that the end user can add a handle for any missing combinations. See traverse-picojson-variant.h for an example of filling a hole (picojson extension + variant extension).

Installation

This is a header-only library, but has optional extensions for JSON (via picojson or rapidjson), Lua 5.2, and mapbox::variant. To install the dependencies for the extensions:

git submodule update --init

Usage

To use any of the visitors, construct one, then call the visit function. The API varies by visitor type. Binary serialization uses a std::streambuf and reports no errors; binary deserialization also uses a std::streambuf and reports errors as a string:

std::stringbuf buf;
traverse::BinarySerialize writer(buf);
visit(writer, yourobject);
// buf.str() contains the serialized string

traverse::BinaryDeserialize reader(buf);
visit(reader, yourobject)
if (!reader.Errors().empty()) throw reader.Errors();
if (buf.getc() != std::streambuf::traits_type::eof()) {
    throw "not all bytes processed";
}

To define traversal of a user-defined struct:

struct Point {
    int32_t x;
    int32_t y;
};

TRAVERSE_STRUCT(Point, FIELD(x) FIELD(y))

All serializable fields must be public.

Visitor types

As described in the Motivation section, I want to be able to extend the set of visitors or the set of data types. These are the visitor types included with the library:

Debug output

The CoutWriter writes everything to stdout. The TRAVERSE_STRUCT macro defines operator <<(ostream&) to use CoutWriter for output.

User defined <<

If operator << is already defined on your type T, the TRAVERSE_STRUCT macro fails. I need to provide an alternative.

Binary serialization

The BinarySerialize and BinaryDeserialize classes write/read to a simple binary format. There is no backwards/forwards compatibility, compression, optional fields, data structure sharing, zero-copy, support for multiple programming languages, or other nice features.

If there are structural errors during deserialization, the errors field will contain them. If the string is empty, there were no errors. The library does not perform semantic validation such as numbers being in range or an enum being one of the named items; you will have to write your own code for that.

Integers are encoded using Google’s ZigZag format (from Google Protocol Buffers). It handles endian changes and also size changes. You can binary serialize a big endian int16 and binary deserialize into a little endian int32. You can’t mix signed and unsigned ints.

The deserialization code is intended to handle malformed data. There is some fuzz testing of int, enum, struct, string, and vector deserialization using the AFL (American Fuzzy Lop); see the fuzz-tests rule in the Makefile.

Binary serialization writes to and reads from a std::streambuf, which may be a string (std::stringbuf), file, stdin/stdout (*std::cin.rdbuf(), *std::cout.rdbuf()), or a custom streambuf derived class. To read from a block of memory without allocating a std::stringbuf:

struct memorybuf : public std::streambuf {
    memorybuf(char* begin, char* end) {
        setg(begin, begin, end);
    }
};

JSON serialization using picojson

For C++ to JSON, use a writer visitor to convert a C++ data structure into picojson value, then the json library can convert this into a JSON string. Example:

picojson::value output;
traverse::JsonWriter jsonwriter{output};
visit(jsonwriter, yourobject);
std::cout << output.serialize();

Integers, enums, and floats are written as JSON numbers. Strings, vectors, and structs are written as JSON strings, arrays, and objects.

For JSON to C++, use picojson to parse a JSON string into a picojson value, then a reader visitor to convert a picojson value into the C++ data structure. Example:

picojson::value input;
auto err = picojson::parse(input, "{\"a\": 3}");
if (!err.empty()) { throw "parse error"; }
std::stringstream errors;
traverse::JsonReader jsonreader{input, errors};
visit(jsonreader, yourobject);
if (!errors.empty()) { throw "type mismatch error"; }

When deserializing, there may be type mismatches between the JSON data and the C++ data structures. The library leaves data unchanged in the object if it does not have new data to place there. If the JSON object does not contain all the fields in the user struct, or if the types don’t match, those fields will be left unchanged. Any errors and warnings during deserialization are written to the errors stream. Use a stringstream that captures them; if the string is empty, there were no problems.

It is expected that you will put a convenience wrapper around this.

JSON serialization using rapidjson

For C++ to JSON, use a writer visitor to convert a C++ data structure into a rapidjson document, then the json library can convert this into a JSON string. Example:

rapidjson::StringBuffer output;
traverse::JsonWriter jsonwriter{output};
visit(jsonwriter, yourobject);
std::cout << buffer.GetString();

Integers, doubles, enums, and floats are written as JSON numbers. Bools are written as JSON bools. Strings, vectors, and structs are written as JSON strings, arrays, and objects.

For JSON to C++, use rapidjson to parse a JSON string into a rapidjson document, then a reader visitor to convert that into the C++ data structure. Example:

rapidjson::Document input;
input.Parse("json string");
if (input.HasParseError()) { throw "parse error"; }
std::stringstream errors;
traverse::RapidJsonReader jsonreader{input, errors};
visit(jsonreader, yourobject);
if (!errors.empty()) { throw "read error"; }

When deserializing, there may be type mismatches between the JSON data and the C++ data structures. The library leaves data unchanged in the object if it does not have new data to place there. If the JSON object does not contain all the fields in the user struct, or if the types don’t match, those fields will be left unchanged. Any errors and warnings during deserialization are written to the errors stream. Use a stringstream that captures them; if the string is empty, there were no problems.

It is expected that you will put a convenience wrapper around this.

Lua serialization

The Lua extension uses the C-Lua API for Lua 5.2. The writer converts a C++ value into a Lua equivalent and pushes it onto the the Lua stack.

lua_State* L;
traverse::LuaWriter luawriter{L};
visit(luawriter, yourobject);
// this leaves the object at the top of the lua stack

Integers, enums, and floats are written as Lua numbers; the library doesn’t handle overflow. Strings are written as Lua strings. Vectors and structs are written as Lua tables.

The reader pops a value off the Lua stack and writes it to a C++ value.

// first put a lua object at the top of the stack
std::stringstream errors;
traverse::LuaReader luareader{L, errors};
visit(luareader, yourobject);
if (!errors.empty()) { throw "read error"; }
// the value will be popped off the lua stack

As Lua is dynamically typed, and tables are used both as arrays and structs, there are several type mismatches that may occur when converting Lua to C++. See the LuaReader class in traverse-lua.h to control which type mismatches will be treated as errors and which will be ignored.

It is expected that you will put a convenience wrapper around this.

I have also included a Lua-to-string function lua_repr and a string-to-Lua function lua_eval (primarily for unit tests) in lua-util.h.

Other visitors

The intent of this library is to define data structure traversal separately from the serialization format, so you can write a visitor class to interface to Protocol Buffers, Thrift, Capn Proto, Flatbuffer, MsgPack, XML, YAML, or one of many other formats. Although serialization is the primary use case, I’ve also used this library to visit the fields of data structures so that I can construct a debug GUI with the dear imgui library; I haven’t included that code here. Look at the existing visitors in traverse.h, traverse-picojson.h, traverse-rapidjson.h traverse-lua.h to see how to write a new visitor. You’ll have to define how the visitor works with each data type (numbers, strings, vectors, structs).

Data types

As described in the Motivation section, I want to be able to extend the set of visitors or the set of data types. Each of the included visitors supports signed/unsigned integers, enum, class enum, std::string, std::vector, and user-defined structs.

Use the TRAVERSE_STRUCT macro to define the visitor for a user-defined struct or class. For example: TRAVERSE_STRUCT(Point, FIELD(x) FIELD(y)) will visit the x and y fields of the Point class.

For binary serialization, structs are written by serializing each field. For JSON, structs are written as JSON objects. For Lua, structs are converted into Lua tables.

Variant data types

For passing messages over a network or through an external message queue, I’ve used the mapbox::variant library, which is similar to boost::variant and std::variant. Instead of sending many types of messages A, B, C over the network, I send one type, variant<A,B,C>. The variant keeps track of which type the message is.

This keeps the system simpler. I don’t need serialization to know about multiple types; it only knows about serializing one type. The variant class knows about multiple types but not about serialization.

The code in traverse-variant.h will serialize a variant by first serializing the integer type code and then serializing the data. It will deserialize by first deserializaing the the type code, switching to that variant, then deserializing the data.

One of the downsides of two-axis extension is that there can be “holes” in the combinations of extensions. I did not define the variant+json or variant+lua combinations.

Other data types

You’ll have to define how the data type works with each of the visitors that you want to use (binary serialize, binary deserialize, etc.). Look at traverse.h to see how string and vector work, or traverse-variant.h to see how data type extension works.

I didn’t need float/double binary serialization for my project so I didn’t implement them, but the JSON and Lua extensions do handle floats/doubles.

You can override the visitor for a specific type. For example, consider this data structure:

struct Message {
  enum {A, B, C, D} x;
  enum {P, Q, R, S} y;
};

The binary serialization will encode x and y to 1 byte each, for a total of 2 bytes. A more efficient encoding would use 2 bits for each, and could fit both into a total of 1 byte. You can define your own encoding for Message by defining template<> void traverse::visit(BinarySerialize& writer, const Message& m) and template<> void traverse::visit(BinaryDeserialize& reader, Message& m).

Libraries

The picojson extension uses the picojson library, licensed 2-clause BSD:

Copyright 2009-2010 Cybozu Labs, Inc. Copyright 2011-2014 Kazuho Oku All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The rapidjson library uses the rapidjson library, licensed MIT:

Tencent is pleased to support the open source community by making RapidJSON available.

Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

The Lua extension links with the C-Lua library (not included).

The Variant extension uses the mapbox::variant library, licensed 3-clause BSD:

Copyright (c) MapBox All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name “MapBox” nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.