Core model was changed::even more high performance #17
base: next
cmpxchg16 commented on Jun 15, 2013
- Changing the core model so that there are no locks at all in the Scheduler layer
- Adding optimizations
- Adding some examples
- Adding bug fixes
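The idea of a Scheduler layer with no locks can be illustrated with a minimal sketch, assuming each native thread owns its own scheduler and only that thread ever schedules work onto it. Plain `std::function` tasks stand in for fibers, and the class name `ThreadLocalScheduler` is hypothetical, not Mordor's API. Because the run queue is confined to one thread, it needs neither a mutex nor atomics.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch: a scheduler owned by exactly one native thread.
// Only the owning thread touches the run queue, so no mutex or atomics.
class ThreadLocalScheduler {
public:
    using Task = std::function<void()>;

    // Must only be called from the owning thread (no synchronization).
    void schedule(Task task) {
        assert(task && "cannot schedule an empty task");
        queue_.push_back(std::move(task));
    }

    // Runs tasks until the queue is empty; tasks may schedule more tasks.
    // Returns the number of tasks executed.
    std::size_t run() {
        std::size_t executed = 0;
        while (!queue_.empty()) {
            Task task = std::move(queue_.front());
            queue_.pop_front();
            task();
            ++executed;
        }
        return executed;
    }

    // One scheduler instance per native thread.
    static ThreadLocalScheduler& current() {
        thread_local ThreadLocalScheduler instance;
        return instance;
    }

private:
    std::deque<Task> queue_;
};
```

The trade-off is that nothing in this sketch lets another thread schedule work safely; cross-thread scheduling would need its own synchronized path.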
2. Add a destructor to SSLStream that calls close. 3. Bug fix in SSL handling: call flush after a successful SSL_write that is smaller than the threshold.
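Having the SSLStream destructor call close follows the usual RAII pattern. A minimal sketch, assuming a hypothetical `ClosingStream` class that buffers writes and only hands them to a sink on `flush()`; a real SSL stream would also have to shut down the TLS session, which is omitted here. Making `close()` idempotent lets callers close explicitly and still have the destructor call it safely.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a buffered stream wrapper such as SSLStream:
// written data is held in `pending_` and only reaches `sink_` on flush().
class ClosingStream {
public:
    explicit ClosingStream(std::string& sink) : sink_(sink) {}
    ClosingStream(const ClosingStream&) = delete;
    ClosingStream& operator=(const ClosingStream&) = delete;

    // The destructor calls close() so pending data is not silently dropped.
    ~ClosingStream() { close(); }

    void write(const std::string& data) {
        assert(!closed_ && "write after close");
        pending_ += data;
    }

    void flush() {
        sink_ += pending_;
        pending_.clear();
    }

    // Idempotent: safe to call explicitly and again from the destructor.
    void close() {
        if (closed_) return;
        flush();
        closed_ = true;
    }

    bool closed() const { return closed_; }

private:
    std::string& sink_;
    std::string pending_;
    bool closed_ = false;
};
```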
2. Rename simplefileserver to simplehttpfileserver. 3. Add SSL support to simplehttpfileserver. 4. Empty README.
2. Change the echoserver example to be an echo server.
The change request improves performance at the expense of multi-thread safety. Instead of changing the core scheduler model, it would be better to provide a compile-time switch that turns off multi-thread safety to gain the performance benefits.
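One way to offer such a compile-level switch is a lock policy chosen at build time. A minimal sketch, assuming a hypothetical `RunQueue` container and a hypothetical `SINGLE_THREADED_SCHEDULER` macro (neither is part of Mordor): the same code compiles against either `std::mutex` or a no-op `NullMutex`, so the single-threaded build pays nothing for locking.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// No-op lock used when multi-thread safety is compiled out.
// Satisfies BasicLockable, so std::lock_guard accepts it.
struct NullMutex {
    void lock() {}
    void unlock() {}
};

// Hypothetical compile-time switch; the macro name is illustrative only.
#ifdef SINGLE_THREADED_SCHEDULER
using SchedulerMutex = NullMutex;
#else
using SchedulerMutex = std::mutex;
#endif

// Hypothetical run queue parameterized on its lock type.
template <typename Mutex>
class RunQueue {
public:
    void push(int id) {
        std::lock_guard<Mutex> guard(mutex_);
        items_.push_back(id);
    }

    // Returns false if the queue is empty.
    bool pop(int& id) {
        std::lock_guard<Mutex> guard(mutex_);
        if (items_.empty()) return false;
        id = items_.front();
        items_.pop_front();
        return true;
    }

private:
    Mutex mutex_;
    std::deque<int> items_;
};

using DefaultRunQueue = RunQueue<SchedulerMutex>;
```

Building with `-DSINGLE_THREADED_SCHEDULER` would turn `DefaultRunQueue` into the lock-free variant, while the default build keeps the mutex.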
It doesn't lose multi-thread safety, even though the core model was changed so that it no longer needs locks, because:
These changes seem like the right way to go to me. I'm guessing even more atomic operations and multithreaded synchronization could be removed, improving performance further. I guess what we are sacrificing in such a model is the ability to distribute many fibers easily and evenly across a worker pool of threads?
We get roughly uniform distribution because the native thread that wins the accept handles that connection's network and disk tasks in its own fibers.
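The accept-affinity argument can be simulated without real sockets. In this sketch (an illustration under stated assumptions, with the hypothetical name `simulateAcceptAffinity`), an atomic counter stands in for the listening socket's accept queue: whichever worker thread wins an "accept" handles that connection itself, so nothing has to be handed to another thread afterwards. How evenly the work spreads depends on the OS thread scheduler, so the per-thread counts are only roughly uniform.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Simulation only: `pending` stands in for the kernel's accept queue.
// Returns how many connections each worker thread ended up handling.
std::vector<int> simulateAcceptAffinity(int connections, int workers) {
    assert(connections >= 0 && workers > 0);
    std::atomic<int> pending(connections);
    std::vector<int> handled(static_cast<std::size_t>(workers), 0);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&pending, &handled, w] {
            // Each worker writes only its own slot, so no lock is needed.
            int& mine = handled[static_cast<std::size_t>(w)];
            for (;;) {
                // fetch_sub returns the previous value; <= 0 means empty.
                int before = pending.fetch_sub(1);
                if (before <= 0) break;
                ++mine;  // "handle" the connection on the accepting thread
            }
        });
    }
    for (std::thread& t : threads) t.join();
    return handled;
}
```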
For the typical web-service application, I agree, this would be the case. However, if you have something a little more complex, such as a web service that spawns hundreds of fibers, each needing a certain amount of IO and a mixture of IO and CPU processing, then it may get more complicated. Knowing how or when to migrate which fibers to which threads is no longer obvious. With some higher-level scheduling, though, this could be handled. Ideally, you should pay for multithreaded concurrency only at the points where you need it, instead of everywhere.
If fibers can automatically and magically migrate between threads, however, that can create other problems. Then you are forced to deal with multithreaded concurrency issues in cases where you want neither the extra design complexity nor the (sometimes substantial) additional concurrency overhead. So controlling the movement of fibers between threads at the application level is what makes sense to me.
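Application-level migration, where only the points that hand work to another thread pay for synchronization, could look like the following minimal sketch. `Mailbox` and `runMailbox` are hypothetical names; tasks stand in for fibers, and an empty task serves as the shutdown signal.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch: work moves to another thread only when the
// application explicitly posts it there. The mailbox is the one place
// that pays for a mutex; purely thread-local work elsewhere does not.
class Mailbox {
public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Blocks until a task is available, then removes and returns it.
    std::function<void()> take() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        std::function<void()> task = std::move(tasks_.front());
        tasks_.pop_front();
        return task;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
};

// Runs posted tasks on the calling thread until it receives an empty task.
void runMailbox(Mailbox& box) {
    for (;;) {
        std::function<void()> task = box.take();
        if (!task) return;
        task();
    }
}
```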
A few comments:
I agree with your comment on StackPool.
OOPS...
2. Change the default buffer size of the buffered stream to 4K. 3. Add a simple transfer-stream implementation.
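A simple stream-to-stream transfer with a 4K buffer can be sketched with standard iostreams. This is an illustration under stated assumptions, not Mordor's own transfer implementation, whose Stream API differs; `copyStream` and `kDefaultBufferSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream for in-memory use
#include <string>
#include <vector>

// Hypothetical 4 KiB default, mirroring the buffer size mentioned above.
constexpr std::size_t kDefaultBufferSize = 4096;

// Copies everything from `src` to `dst` in fixed-size chunks and returns
// the number of bytes transferred. Stops early if `dst` fails.
std::size_t copyStream(std::istream& src, std::ostream& dst,
                       std::size_t bufferSize = kDefaultBufferSize) {
    assert(bufferSize > 0);
    std::vector<char> buffer(bufferSize);
    std::size_t total = 0;
    while (src) {
        src.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize got = src.gcount();  // may be short on the last chunk
        if (got <= 0) break;
        dst.write(buffer.data(), got);
        if (!dst) break;
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

For example, copying a 10000-byte string moves it in two full 4096-byte chunks plus a final 1808-byte chunk.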
I did some tests against your branch, cmpxchg16, and my own branch. My branch ports Mordor to C++11, but it also includes a change that uses malloc instead of mmap for stack allocation. System mallocs (or, better yet, tcmalloc/jemalloc) generally perform thread-aware caching of freed memory, which basically means we get pooled stacks for free. We also avoid some potential performance penalties of mmap, where the kernel has to modify the process's VMAs. It strikes me that much of the work to do stack pooling internally would be moot if we made that small change. There's also no need to remove the built-in multi-core model. I'm sure I could get better performance on Linux by using _setjmp-based fibers (avoiding the process signal mask change), but I didn't, for the sake of not changing things too much.

Here are my results. I used the same testing methodology as outlined in your post. This is a test on an Ubuntu 13.04 VM inside OS X.

- Mordor C++11 (my branch): Lifting the server siege... done.
- cmpxchg16 Mordor: The server is now under siege...
- Mordor without mmap stack, using malloc: Transactions: 44371 hits
- Mordor C++11 malloc + tcmalloc: Transactions: 43552 hits
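The malloc-versus-mmap stack allocation choice can be sketched as follows (POSIX only; `allocateStack`, `freeStack` and `StackAlloc` are hypothetical names). One caveat on the pooling argument: glibc malloc serves requests above its mmap threshold (128 KiB by default, adjusted dynamically) with mmap itself, so whether freed stacks are actually recycled depends on the stack size and the allocator. Another trade-off worth checking is that an mmap'd stack makes it easy to add a `PROT_NONE` guard page, which this sketch does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>

// The two stack-allocation strategies discussed above.
enum class StackAlloc { Mmap, Malloc };

// Returns nullptr on failure.
void* allocateStack(std::size_t size, StackAlloc how) {
    if (how == StackAlloc::Malloc) {
        // A caching allocator may hand back recently freed memory.
        return std::malloc(size);
    }
    // Each mmap/munmap pair asks the kernel to create/remove a mapping.
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// `size` must match the value passed to allocateStack for mmap'd stacks.
void freeStack(void* stack, std::size_t size, StackAlloc how) {
    if (stack == nullptr) return;
    if (how == StackAlloc::Malloc) {
        std::free(stack);
    } else {
        int rc = munmap(stack, size);
        assert(rc == 0);
        (void)rc;  // silence unused warning when NDEBUG is set
    }
}
```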
For comparison, here is stock Mordor (without C++11) from Cody's branch. Same as before, best run out of 3. Transactions: 33561 hits