-
-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What about maps ? #16
Comments
To keep the unmarshaller performant you'd need access to the key-value pairs one by one. Generally speaking maps cause performance degradation and they are overused with I/O out of lazyness. I'd like to make the choice a compilation option such that one can have maps only where needed and only when possible for the respective language.
How about a compiler flag to specify the key in struct lists? When the struct has two fields then the other field will be used as a value. Otherwise the entire struct is. |
I am not knowledgeable about C, and even less Go or Js. Though in Java, it
would make sense to have Maps available.
My idea was that Colfer could serialize a list of buckets (a bucket
contains some entries in a list). Cofler could unmarshall the array of
buckets efficiently and a read only Map could be created, backed by this
array. The implementation would be super fast. And if devs want a non
read-only Map, they would create it and fill the data from the read-only
one.
In java, that could be terribly efficient for unmarshalling and getting a
data structure read to use.
Serializing would require going through all the entries, building buckets
and serializing them. It would be a bit slower, but that's not usually
where apps need speed.
For Android those scenarios would make a lot of sense and provide high perf
in the most common use cases.
2017-01-14 14:01 GMT-08:00 Pascal S. de Kloe <notifications@github.com>:
… To keep the unmarshaller performant you'd need access to the key-value
pairs one by one. Generally speaking maps cause performance degradation and
they are overused with I/O out of lazyness.
I'd like to make the choice a compilation option such that one can have
maps only where needed and only when possible for the respective language.
- C has no native maps
- Go has limited struct key support
- JavaScript requires string keys
How about a compiler flag to specify the key in struct lists? When the
struct has two fields then the other field will be used as a value.
Otherwise the entire struct is.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#16 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABv33X1Lcm4cZLV3vlIurGAGX-kYxAqlks5rSUW9gaJpZM4Ljpc1>
.
|
I like this read-only map idea. The problem is that unmarshaling would rely on the data for the uniqueness constraint which goes against the resilient against mallicious input principal. Imagine a Map of size n with a #keySet() of size n - 1. What I can do is make this work for Java as stated before. Duplicate keys would disappear (in the HashMap) during the unmarshaling. The struct list wire format has a size prefix which should help allocation. On second thought with the forward compatibility in mind this value extraction idea for the cases with 2 struct fields might be a bit whobly b.t.w.. |
It would be possible to do the same trick as FlatBuffers does. Annotate one field of the struct as |
Sounds good @mzaks. With the key declaration in the schema the compiler can make explicit declarations about it's uniqueness on platforms without map support too. |
One more consideration from my side. While FlatBuffers is marshalling, it sorts the array by key and it provides accessor by key methods which internally do binary search on the array to have a fast lookup. This way FlatBuffers mimics map semantics by providing This strategy totally makes sense in FlatBuffers but is a bit more "complicated" for colfer. Colfer has an unmarshmaling step and the resulting objects are mutable. It still fits for languages which have a natural All in all it is possible to implement maps in colfer, but IMHO, there should be a cost/value consideration. I see colfer as a format to describe data transfer objects. AFAIK maps are not common for data transfer objects. |
Interesting @mzaks! So by offering only those 4 map operators you bypass the issue of malformed data. Sorted keys do give better performance with unmarshalling, especially when you're using an array instead of tree nodes. It is quite easy to detect unsorted data in which case you sort after unmarshaling the entire list. Writing your own map alike functions is quite a bit more work yet it seems like the right thing to do ™️. Go's sort.Interface and java.utils.Arrays#sort should help a little. |
Well you can even use SortedMap implementation in languages which support it, like for example Javas |
The problem with java.util.(Sorted)Map is that it offers methods such as #size() and #keySet() which require the map to be consistent all the time. If not users could suffer from serious bugs and even security issues. Sure java.util.TreeMap does that yet it is notoriously slow and the unmarshaller needs another error for receiving duplicates. All Java collections (including the hash based) allocate memory like it's Christmas. With the array backing sorted set you can do operations on the array directly and sort once if needed. |
It would be nice to serialize maps, maybe as a simple 2D array ?
The text was updated successfully, but these errors were encountered: