Performance-related benchmarks for serialization and deserialization #89
-
This is more of a question.

Data classes:

```python
from dataclasses import dataclass
from time import time  # used in Head.fakeMe() below; missing from the original snippet

from faker import Faker
from pure_protobuf.dataclasses_ import field, message

Faker.seed(0)
fake = Faker()


@message
@dataclass
class Head:
    msgId: str = field(1)
    msgCode: str = field(2)
    guid: str = field(3)
    src: str = field(4)
    ts: int = field(5)

    @staticmethod
    def fakeMe():
        return Head(
            fake.md5(),
            fake.pystr(min_chars=5, max_chars=5),
            fake.ean(length=13),
            fake.pystr(min_chars=1, max_chars=1),
            int(time() * 1000),
        )


@message
@dataclass
class Message:
    head: Head = field(1)
    # data: Data = field(2)
    status: bool = field(2)

    def fakeMe(self):
        self.head = Head.fakeMe()
        # self.data = Data.fakeMe()
        self.status = fake.pybool()  # was `self.bool = ...`, which never set `status`
        return self
```

Running serialization and deserialization:

```python
import sys
import time

import orjson

import message_pb2  # compiled protobuf stubs; unused in the snippet below
from object_gen import create_dummy_obj
from dto.device_message import Message  # this is my data class


def measure_serialize_deserialize(obj, format):
    ser_fun = ser_obj.get(format)
    deser_fun = deser_obj.get(format)

    # serialize and measure time
    start_time = time.time()
    ser_data = ser_fun(obj)
    time_taken_ser = time.time() - start_time
    mem_ser = sys.getsizeof(ser_data)

    # deserialize and measure time
    start_time = time.time()
    deser_data = deser_fun(ser_data, Message)
    time_taken_deser = time.time() - start_time

    return (time_taken_ser, time_taken_deser, mem_ser)


def serialize_json(obj):
    return orjson.dumps(obj)


def deserialize_json(byteArr, klass):
    return orjson.loads(byteArr)


def serialize_proto(obj):
    return obj.dumps()


def deserialize_proto(byteArr, klass):
    return klass.loads(byteArr)


def serialize_avro(obj):
    pass


def deserialize_avro(byteArr, klass):
    pass


ser_obj = {
    "J": serialize_json,
    "P": serialize_proto,
    "A": serialize_avro,
}

deser_obj = {
    "J": deserialize_json,
    "P": deserialize_proto,
    "A": deserialize_avro,
}


def runBenchMarks(numberOfMsgs, format):
    ser_times = []
    deser_times = []
    memory_usage_plain = []
    memory_usage_ser = []
    for i in range(1, numberOfMsgs + 1):
        # create a new object based on the format
        obj = create_dummy_obj(format)
        memory_usage_plain.append(sys.getsizeof(obj))
        ser_time, deser_time, mem_ser = measure_serialize_deserialize(obj, format)
        ser_times.append(ser_time)
        deser_times.append(deser_time)
        memory_usage_ser.append(mem_ser)
    # return values
    return ser_times, deser_times, memory_usage_plain, memory_usage_ser
```

After running the program for 1000 messages:

```
Running benchmark for 1000 samples and format = P
=========== Serialization METRICS (Time in ms) ====================
Total Time taken for serialization: 15.65241813659668
Avg Time taken for serialization: 0.01565241813659668
Min Time taken for serialization: 0.014781951904296875
Max Time taken for serialization: 0.04220008850097656
=========== Deserialization METRICS (Time in ms) ====================
Total Time taken for deserialization: 21.908044815063477
Avg Time taken for deserialization: 0.021908044815063477
Min Time taken for deserialization: 0.0209808349609375
Max Time taken for deserialization: 0.051975250244140625
=========== MEMORY METRICS (Bytes) ====================
Total memory utilized by Plain objects: 103000
Avg memory utilized by Plain objects: 103.0
Min memory utilized: 103
Max memory utilized: 103
Total memory utilized by serialized objects: 103000
Avg memory utilized by serialized objects: 103.0
Min memory utilized: 103
Max memory utilized: 103
```

Then I ran the same code for JSON:

```
Running benchmark for 1000 samples and format = J
=========== Serialization METRICS (Time in ms) ====================
Total Time taken for serialization: 0.9558200836181641
Avg Time taken for serialization: 0.0009558200836181642
Min Time taken for serialization: 0.0
Max Time taken for serialization: 0.20194053649902344
=========== Deserialization METRICS (Time in ms) ====================
Total Time taken for deserialization: 1.4314651489257812
Avg Time taken for deserialization: 0.0014314651489257812
Min Time taken for deserialization: 0.0007152557373046875
Max Time taken for deserialization: 0.029087066650390625
=========== MEMORY METRICS (Bytes) ====================
Total memory utilized by Plain objects: 182518
Avg memory utilized by Plain objects: 182.518
Min memory utilized: 182
Max memory utilized: 183
Total memory utilized by serialized objects: 182518
Avg memory utilized by serialized objects: 182.518
Min memory utilized: 182
Max memory utilized: 183
```

As you can see, the total time for serialization and deserialization with protobuf is far higher than with JSON. Ideally this should not be the case, if I understand protobuf correctly. Could it be because we are compiling the proto schema every time we call …
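For what it's worth, `time.time()` can be too coarse for per-call timings this small (note the 0.0 minimum in the JSON run). Here is a sketch of a more robust re-measurement with `timeit`; it reuses `create_dummy_obj` and `ser_obj` from the code above, and the repetition counts are arbitrary:

```python
import timeit

# Time one message many times instead of once, smoothing out timer noise.
obj = create_dummy_obj("P")   # helper defined above
ser_fun = ser_obj["P"]

# 10,000 serializations per run, best of 5 runs.
runs = timeit.repeat(lambda: ser_fun(obj), number=10_000, repeat=5)
print("best per-call time: %.3f us" % (min(runs) / 10_000 * 1e6))
```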
-
I am using Python…
-
Hi @matrixbegins,

This is barely a fair comparison, to be honest. :) You're comparing pure Python code performance in `pure_protobuf` to compiled Rust code performance in `orjson` (which is, by the way, the fastest JSON package for Python, AFAIK). A 15x difference is quite reasonable here.

There's no such thing as a «schema» in `pure_protobuf`. Basically, the `@message` decorator is already doing what you want: constructing a message serializer from the class definition. This happens only once, as soon as Python executes the module. The `dumps`/`loads`…
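To illustrate the one-time setup, here is a minimal sketch in the same legacy `pure_protobuf.dataclasses_` style as the question's code (`Ping` and `seq` are made-up names): the serializer is built when the class body is executed, and `dumps()`/`loads()` just run it.

```python
from dataclasses import dataclass

from pure_protobuf.dataclasses_ import field, message


@message      # the serializer is constructed here, exactly once,
@dataclass    # when Python executes this class definition
class Ping:
    seq: int = field(1)


ping = Ping(seq=42)
wire = ping.dumps()              # reuses the already-built serializer
assert Ping.loads(wire) == ping  # round-trips through the wire format
```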