Missing speed test #20
Thanks for this. I'm a little surprised to see the msg/ms so low, especially compared to running a similar benchmark using Go and its channels. My understanding is that Go channels are not lock-free, so I would expect the results to be within an order of magnitude of each other, but they appear to be several orders apart.

    chan_t (C): 1_1000000 send/recv time in ms: 3149      (317 nr_of_msg/msec)
    chan (Go):  1_1000000 send/recv time in ms: 54.913379 (18210.498392 nr_of_msg/msec)

Obviously it's not a completely equivalent comparison (e.g. goroutines vs. pthreads), and the Go team has spent a lot of effort optimizing channels, but I was expecting chan_t to hold up better.
Sorry, but I do not know Go; I've only read about it. My understanding is that goroutines are high-speed user threads (aka fibers or coroutines). If Go channels synchronize by default, only one slot per client is needed. And if all goroutines run in a single thread, no locks are needed at all. So it is possible (if my understanding is right) that running the benchmark comes down to a simple function call: the clients (`channel <- id`) call into servers which are stored in a waiting list (lock-free because of the single thread). I've rewritten the benchmark to call a simple function instead of going through the queue. As you can see from the result, the Go benchmark does not scale up with more threads, which suggests that it runs in a single thread by default.
Yeah, I think you're right. You can tell the Go scheduler to utilize more cores with `runtime.GOMAXPROCS(runtime.NumCPU())`. Doing that on a quad-core system yields slightly higher latency, likely because it's no longer running on a single thread:

    chan: 1_1000000 send/recv time in ms: 88.191041 (11339.020253 nr_of_msg/msec)
I've written test code to see whether goroutines could be implemented the way we've speculated; see https://github.com/je-so/testcode/blob/master/gochan.c. I've implemented only the single-thread case and got more than 11000 msg/msec. The implementation uses a GCC extension: take the address of a goto label with `&&LABEL` and jump to it with `goto *addr`.
Now the test code (gochan.c) supports system threads. It scales very well:

    gochan: 1_30000 send/recv time in ms: 1 (30000 msg/msec)
There is a much better test driver in the directory. Try it with `chan` if you want.
With padding of the variables to the size of one cache line, performance is much better! See https://github.com/je-so/iqueue/blob/master/README.md for some numbers.
Impressive, the padding has a pretty remarkable impact. |
I've written a test which compares the number of transferred messages per millisecond to the number of threads. I'm using a similar test for the lock-free iqueue.
The file is located at https://github.com/je-so/testcode/blob/master/chan_speed_test.c. I'd appreciate it if you would integrate it.