What about maps ? #16

stephanenicolas · 2017-01-14T14:59:58Z

It would be nice to serialize maps, maybe as a simple 2D array?

pascaldekloe · 2017-01-14T22:01:33Z

To keep the unmarshaller performant you'd need access to the key-value pairs one by one. Generally speaking, maps cause performance degradation, and they are overused with I/O out of laziness.

I'd like to make the choice a compilation option such that one can have maps only where needed and only when possible for the respective language.

- C has no native maps
- Go has limited struct key support
- JavaScript requires string keys

How about a compiler flag to specify the key in struct lists? When the struct has two fields then the other field will be used as a value. Otherwise the entire struct is.

stephanenicolas · 2017-01-14T22:31:09Z

I am not knowledgeable about C, and even less about Go or JS. In Java, though, it would make sense to have Maps available. My idea was that Colfer could serialize a list of buckets (a bucket contains some entries in a list). Colfer could unmarshal the array of buckets efficiently, and a read-only Map could be created, backed by this array. The implementation would be super fast. And if devs want a non-read-only Map, they would create it and fill the data from the read-only one. In Java, that could be terribly efficient for unmarshalling and getting a data structure ready to use. Serializing would require going through all the entries, building buckets and serializing them. It would be a bit slower, but that's not usually where apps need speed. For Android those scenarios would make a lot of sense and provide high perf in the most common use cases.
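A minimal sketch of the bucket idea above, not Colfer code: `BucketMap`, `fromMap` and the bucket layout are hypothetical names chosen for illustration. It assumes the sender and receiver agree on the bucket count and on `hashCode` modulo that count as the bucket index, and it trusts the sender to have placed every entry in the right bucket.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical read-only Map view over a decoded list of buckets. */
final class BucketMap<K, V> extends AbstractMap<K, V> {
    private final List<List<Map.Entry<K, V>>> buckets;
    private final int count; // total number of entries over all buckets

    BucketMap(List<List<Map.Entry<K, V>>> buckets) {
        this.buckets = buckets;
        int n = 0;
        for (List<Map.Entry<K, V>> b : buckets) n += b.size();
        this.count = n;
    }

    /** Marshalling side: walk all entries once and drop each into its bucket. */
    static <K, V> BucketMap<K, V> fromMap(Map<K, V> source, int bucketCount) {
        if (bucketCount <= 0) throw new IllegalArgumentException("bucketCount must be positive");
        List<List<Map.Entry<K, V>>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
        for (Map.Entry<K, V> e : source.entrySet()) {
            int i = Math.floorMod(Objects.hashCode(e.getKey()), bucketCount);
            buckets.get(i).add(new AbstractMap.SimpleImmutableEntry<K, V>(e));
        }
        return new BucketMap<>(buckets);
    }

    /** Looks only inside the one bucket the key hashes to. */
    private Map.Entry<K, V> find(Object key) {
        if (buckets.isEmpty()) return null;
        int i = Math.floorMod(Objects.hashCode(key), buckets.size());
        for (Map.Entry<K, V> e : buckets.get(i)) {
            if (Objects.equals(e.getKey(), key)) return e;
        }
        return null;
    }

    @Override
    public V get(Object key) {
        Map.Entry<K, V> e = find(key);
        return e == null ? null : e.getValue();
    }

    @Override
    public boolean containsKey(Object key) {
        return find(key) != null;
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return new AbstractSet<Map.Entry<K, V>>() {
            @Override public int size() { return count; }
            @Override public Iterator<Map.Entry<K, V>> iterator() {
                return buckets.stream().flatMap(List::stream).iterator();
            }
        };
    }
}
```

Because `AbstractMap#put` throws `UnsupportedOperationException` unless overridden, the view is read-only without extra code. A mutable copy is one call away with `new HashMap<>(bucketMap)`.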


pascaldekloe · 2017-01-15T10:15:22Z

I like this read-only map idea. The problem is that unmarshaling would rely on the data for the uniqueness constraint, which goes against the principle of resilience against malicious input. Imagine a Map of size n with a #keySet() of size n - 1.
Plus some languages may not be able to deliver unique keys for some types without a ton of work. The development has been just me in my spare time until now.

What I can do is make this work for Java as stated before. Duplicate keys would disappear (in the HashMap) during the unmarshaling. The struct list wire format has a size prefix which should help allocation.
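A rough sketch of that Java behaviour, under assumptions: `Pair` is a hypothetical stand-in for a generated two-field struct and `toMap` is not part of Colfer. The length of the decoded list plays the role of the size prefix and is used to presize the `HashMap`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MapUnmarshal {
    /** Hypothetical stand-in for a generated struct with a key and a value field. */
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Builds a HashMap from a decoded struct list; a repeated key keeps its last value. */
    static Map<String, Integer> toMap(List<Pair> decoded) {
        // The element count is known up front, so allocate once.
        // Dividing by the default load factor (0.75) avoids a rehash while filling.
        Map<String, Integer> m = new HashMap<>((int) (decoded.size() / 0.75f) + 1);
        for (Pair p : decoded) {
            m.put(p.key, p.value); // a duplicate key silently overwrites the earlier entry
        }
        return m;
    }
}
```

A list of n pairs that contains one repeated key therefore yields a map of size n - 1, which is consistent with its own `keySet()`.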

On second thought, with forward compatibility in mind, this value extraction idea for the cases with 2 struct fields might be a bit wobbly, by the way.

mzaks · 2017-03-11T16:08:17Z

It would be possible to do the same trick as FlatBuffers does. Annotate one field of the struct as the key, and if another struct has a field which is a list of such key-annotated structs, it will be unmarshaled as a Map. When you marshal a struct with such a dictionary, you can marshal just the array of values. Performance should be fine, I guess.

pascaldekloe · 2017-03-12T14:01:33Z

Sounds good @mzaks. With the key declaration in the schema the compiler can make explicit declarations about its uniqueness on platforms without map support too.

mzaks · 2017-03-13T12:37:51Z

One more consideration from my side. While FlatBuffers is marshalling, it sorts the array by key, and it provides accessor-by-key methods which internally do a binary search on the array for fast lookup. This way FlatBuffers mimics map semantics by providing O(log n) lookup time.

This strategy totally makes sense in FlatBuffers but is a bit more "complicated" for Colfer.
In FlatBuffers there is no unmarshaling step. Buffers are built in a way which gives the user random access directly into the buffer. This is why lookup by key is a cool feature to have. It also implies that the data is read-only.

Colfer has an unmarshaling step and the resulting objects are mutable. It still fits languages which have a natural Map concept, for example Java and JavaScript. For languages which don't have such a concept, e.g. C, the generated struct would need to hide the backing arrays and provide add, remove, get and has accessor methods, which internally perform a binary search and keep the underlying array sorted. It also implies that even for languages which support Map, the marshalling step will be slower because they would need to store the array sorted by key.
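The four accessors could look roughly like the following sketch. It is written in Java only for illustration; `SortedArrayMap` is a hypothetical name, and the key and value types are fixed to `String` and `int` as an assumption to keep it short. Null keys are not supported.

```java
import java.util.Arrays;

/** Hypothetical map-like type that hides two parallel arrays kept sorted by key. */
final class SortedArrayMap {
    private String[] keys = new String[4];
    private int[] values = new int[4];
    private int size;

    /** Binary search over the used part of the key array. */
    private int indexOf(String key) {
        return Arrays.binarySearch(keys, 0, size, key);
    }

    boolean has(String key) {
        return indexOf(key) >= 0;
    }

    /** Returns the value for key, or null when the key is absent. */
    Integer get(String key) {
        int i = indexOf(key);
        return i >= 0 ? Integer.valueOf(values[i]) : null;
    }

    /** Inserts at the sorted position, or overwrites the value of an existing key. */
    void add(String key, int value) {
        int i = indexOf(key);
        if (i >= 0) {
            values[i] = value;
            return;
        }
        int at = -i - 1; // binarySearch encodes the insertion point as -(point) - 1
        if (size == keys.length) {
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail one slot to the right to open a gap.
        System.arraycopy(keys, at, keys, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        keys[at] = key;
        values[at] = value;
        size++;
    }

    /** Removes key and closes the gap; returns false when the key was absent. */
    boolean remove(String key) {
        int i = indexOf(key);
        if (i < 0) return false;
        System.arraycopy(keys, i + 1, keys, i, size - i - 1);
        System.arraycopy(values, i + 1, values, i, size - i - 1);
        keys[--size] = null; // release the reference in the now unused slot
        return true;
    }

    int size() {
        return size;
    }
}
```

Lookups cost O(log n), while add and remove cost O(n) because of the shifting. Since the keys are always in order, marshalling can write the arrays out as they are.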

All in all it is possible to implement maps in colfer, but IMHO, there should be a cost/value consideration. I see colfer as a format to describe data transfer objects. AFAIK maps are not common for data transfer objects.

pascaldekloe · 2017-03-13T14:04:45Z

Interesting @mzaks! So by offering only those 4 map operators you bypass the issue of malformed data.

Sorted keys do give better performance with unmarshalling, especially when you're using an array instead of tree nodes. It is quite easy to detect unsorted data in which case you sort after unmarshaling the entire list.
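As a sketch of that check, assuming a hypothetical `Entry` struct with a `String` key: the decoded array is returned untouched when its keys are strictly ascending, and is sorted once otherwise. Dropping duplicates with "last occurrence wins" is an extra assumption here, not something decided in this thread.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortOnDemand {
    /** Hypothetical stand-in for a generated struct with a key field. */
    static final class Entry {
        final String key;
        final int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** True when the keys are strictly ascending, which also means no duplicates. */
    static boolean isSorted(Entry[] entries) {
        for (int i = 1; i < entries.length; i++) {
            if (entries[i - 1].key.compareTo(entries[i].key) >= 0) return false;
        }
        return true;
    }

    /** Returns the input itself when already sorted; otherwise a sorted, deduplicated copy. */
    static Entry[] normalize(Entry[] decoded) {
        if (isSorted(decoded)) return decoded; // fast path: one linear scan, no allocation

        Entry[] sorted = decoded.clone();
        // Arrays.sort on objects is stable, so equal keys keep their wire order.
        Arrays.sort(sorted, Comparator.comparing((Entry e) -> e.key));

        // Keep only the last entry of each run of equal keys.
        int n = 0;
        for (int i = 0; i < sorted.length; i++) {
            boolean lastOfRun = i + 1 == sorted.length || !sorted[i].key.equals(sorted[i + 1].key);
            if (lastOfRun) sorted[n++] = sorted[i];
        }
        return Arrays.copyOf(sorted, n);
    }
}
```

Well-behaved senders that marshal in key order only pay for the linear scan; the sort is the fallback for everything else.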

Writing your own map-like functions is quite a bit more work, yet it seems like the right thing to do ™️. Go's sort.Interface and java.util.Arrays#sort should help a little.

mzaks · 2017-03-13T14:11:17Z

Well, you can even use a SortedMap implementation in languages which support it, for example Java's TreeMap. In JavaScript it would imply some footwork.

pascaldekloe · 2017-03-13T14:29:40Z

The problem with java.util.(Sorted)Map is that it offers methods such as #size() and #keySet() which require the map to be consistent all the time. If not, users could suffer from serious bugs and even security issues. Sure, java.util.TreeMap does that, yet it is notoriously slow and the unmarshaller needs another error for receiving duplicates. All Java collections (including the hash-based ones) allocate memory like it's Christmas. With an array-backed sorted set you can do operations on the array directly and sort once if needed.

pascaldekloe added the enhancement label Jan 14, 2017

pascaldekloe added the help wanted label Feb 20, 2017
