Core model was changed::even more high performance #17

cmpxchg16 · 2013-06-15T11:10:03Z

Changing the core model so that the Scheduler layer uses no locks at all
Adding optimizations
Adding some examples
Adding bug fixes

kevincai · 2013-06-18T05:02:05Z

The change request improves performance at the cost of losing multi-thread safety. Instead of changing the core scheduler model, it would be better to provide a compile-time switch that turns off multi-thread safety to gain the performance benefits.

cmpxchg16 · 2013-06-18T19:20:58Z

It doesn't lose multi-thread safety. The core model was changed so that locks are not needed, because:

1. Each Scheduler/IOManager runs in its own native thread and does not open native threads itself, so tasks can execute without any locks: each Scheduler runs ONLY its own tasks and doesn't submit tasks to other native threads.
2. For context switching between network IO and disk IO, I am not using the WorkerPool, for the same reason (wasted locks). I just switch between them within the same Scheduler, so again the Scheduler handles ONLY its own tasks.
3. I changed accept so that any Scheduler/native thread can subscribe to accept (operating systems handle accept from multiple threads safely), and the read/write/timeout handling is submitted to a specific native thread/Scheduler.
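A minimal sketch of the per-thread model described above, for readers who want to see why no mutex is needed. `LocalScheduler` and `runPerThreadSchedulers` are hypothetical names for illustration, not Mordor's actual classes; the assumption is simply that each native thread owns its run queue and nothing else ever touches it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-threaded scheduler (NOT Mordor's Scheduler class).
// The queue is touched only by the owning thread, so it has no mutex.
class LocalScheduler {
public:
    using Task = std::function<void()>;

    // Must be called only from the owning thread.
    void schedule(Task t) { m_queue.push_back(std::move(t)); }

    // Runs tasks until the queue is empty; returns how many were executed.
    std::size_t run() {
        std::size_t executed = 0;
        while (!m_queue.empty()) {
            Task t = std::move(m_queue.front());
            m_queue.pop_front();
            t();
            ++executed;
        }
        return executed;
    }

private:
    std::deque<Task> m_queue;
};

// Starts `threads` native threads. Each one creates its own scheduler,
// queues `tasksPerThread` tasks on it and runs them to completion.
std::vector<std::size_t> runPerThreadSchedulers(std::size_t threads,
                                                std::size_t tasksPerThread) {
    std::vector<std::size_t> counts(threads, 0);
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < threads; ++i) {
        pool.emplace_back([i, tasksPerThread, &counts] {
            LocalScheduler sched;
            std::size_t local = 0;
            for (std::size_t n = 0; n < tasksPerThread; ++n)
                sched.schedule([&local] { ++local; });
            sched.run();
            counts[i] = local;  // each thread writes only its own slot
        });
    }
    for (auto& t : pool) t.join();
    return counts;
}
```

The trade-off raised elsewhere in this thread is visible here: nothing rebalances work if one thread's queue is much longer than another's.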

ianupright · 2013-07-26T03:18:53Z

These changes seem like the right way to go to me. I'm guessing even more atomic and multi threaded concurrency could be removed, improving performance even more. I guess what we are sacrificing in such a model is for many fibers to be distributed easily and evenly between a workerpool of threads?

cmpxchg16 · 2013-08-03T07:44:43Z

We get a roughly uniform distribution because the fibers of the native thread that catches the accept also handle that connection's network/disk tasks.
Work stealing between native threads can be ignored: if one native thread is idle, it means your system is not under load, so that case is not interesting.

ianupright · 2013-08-03T15:28:30Z

For the typical web-service application, I agree, this would be the case. However, if you have something a little more complex, a web-service that spawns off hundreds of fibers, each needing a certain amount of IO, and a mixture of IO and CPU processing, then it may get a little more complicated. Knowing how or when to migrate which fibers to which threads is now not obvious. However, with some higher-level scheduling, this could be handled. Ideally, you should only pay for the multithreaded concurrency at the points where you need it, instead of being everywhere.

ianupright · 2013-08-03T15:48:07Z

If fibers can automatically and magically migrate between threads, however, that can create other problems. Then you are forced to deal with multithreaded concurrency issues in cases where you may not want the extra design complexity nor do you want the (sometimes substantial) additional concurrency overhead. So I think moving fibers between threads controlled more at an application level is what makes sense to me.

kevincai · 2013-08-04T13:46:53Z

A few comments:

1. Scheduler is changed to a single-thread model so that the mutex can be removed safely. This is an extreme case of the original design, in which the Scheduler runs with only one native thread. I would rather have a compile-time flag that switches Scheduler to this model for applications that only need the single-thread model and are performance-critical.
2. This changes the original design purpose of the Scheduler, which treats native threads as an execution pool. A single-thread scheduler pushes the multi-thread scheduling problem up to the application level. Each application needs a scheduler of Schedulers in order to leverage multiple threads and balance tasks. In other words, it is not very application-friendly in a multi-thread environment.
3. StackPool should not be implemented in Scheduler. It should be implemented independently and be easily replaceable by a user-provided pool, just like the allocator in the STL.
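To make the last point concrete, here is a rough sketch of a stack pool that takes its allocation policy as a template parameter, in the spirit of STL allocators. The names `StackPool<Allocator>` and `MallocStackAllocator` and their interfaces are assumptions for illustration only; they are not the StackPool from this pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical default policy: plain malloc/free.
struct MallocStackAllocator {
    void* allocate(std::size_t size) { return std::malloc(size); }
    void deallocate(void* p, std::size_t /*size*/) { std::free(p); }
};

// Hypothetical pool of fixed-size fiber stacks. The allocation policy is a
// template parameter, so a user can plug in an mmap-based or custom one.
// Not thread-safe by design: intended to be owned by a single scheduler.
template <class Allocator = MallocStackAllocator>
class StackPool {
public:
    explicit StackPool(std::size_t stackSize, Allocator alloc = Allocator())
        : m_stackSize(stackSize), m_alloc(alloc) {}

    ~StackPool() {
        for (void* p : m_free) m_alloc.deallocate(p, m_stackSize);
    }

    StackPool(const StackPool&) = delete;
    StackPool& operator=(const StackPool&) = delete;

    // Reuses a cached stack if one exists, otherwise allocates a new one.
    void* acquire() {
        if (!m_free.empty()) {
            void* p = m_free.back();
            m_free.pop_back();
            return p;
        }
        return m_alloc.allocate(m_stackSize);
    }

    // Returns a stack to the cache instead of freeing it.
    void release(void* p) { m_free.push_back(p); }

    std::size_t cached() const { return m_free.size(); }

private:
    std::size_t m_stackSize;
    Allocator m_alloc;
    std::vector<void*> m_free;
};
```

With this shape the Scheduler would only need `acquire()` and `release()`, and swapping the backing allocator would not touch scheduler code.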

cmpxchg16 · 2013-08-04T21:02:11Z

I agree with your comment on StackPool.
I also want to develop a dedicated stack manager to gain more performance on Linux, because the implementation of mmap/munmap involves a lot of VMAs, which is a performance killer for a long-running process. A very lightweight implementation in the kernel could boost performance.

cmpxchg16 · 2013-08-04T21:02:26Z

OOPS...

mtanski · 2014-01-02T04:05:20Z

I ran some tests against your branch, cmpxchg16, and against my own branch. My own branch contains changes to port Mordor to C++11 but also includes a change that uses malloc instead of mmap for stack allocation. Generally, system mallocs (or better yet tcmalloc / jemalloc) perform thread-aware caching of freed blocks, which basically means we get pooled stacks for free. We also avoid some potential performance penalties of mmap, where the kernel has to change the VMAs of the process.

It strikes me that much of the work to do stack pooling internally is moot if we make that small change. There's also no need to remove the built-in multi-core model. I'm sure I could get better performance on Linux by using _setjmp fibers (avoiding the signal mask change), but I didn't, for the sake of not changing things too much.
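For reference, the two stack allocation strategies being compared look roughly like this. It is a POSIX-only sketch with hypothetical helper names, not the code from either branch. Whether `malloc` really avoids a kernel mapping depends on the allocator and the requested size, so treat it as an illustration of the call pattern rather than a performance claim.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>

// Hypothetical helper: one anonymous private mapping per stack.
// Every call goes to the kernel and creates (or later removes) a mapping.
void* allocStackMmap(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

void freeStackMmap(void* p, std::size_t size) {
    if (p) munmap(p, size);
}

// Hypothetical helper: let the allocator (system malloc, tcmalloc,
// jemalloc, ...) decide how to obtain and cache the memory.
void* allocStackMalloc(std::size_t size) { return std::malloc(size); }

void freeStackMalloc(void* p) { std::free(p); }
```

Note that a production fiber stack usually also needs alignment guarantees and often a guard page, which this sketch leaves out.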

Here are my results. I used the same testing methodology as outlined in your post.

This is a test on an Ubuntu 13.04 VM inside OS X.
The hardware is a 13" Early 2013 MacBook Pro: SSD, 8 GB RAM, two hyper-threaded i7 cores. The VM gets 4 virtual cores.

Mordor C++11 (my branch)

Lifting the server siege... done.
Transactions: 34628 hits
Availability: 100.00 %
Elapsed time: 9.95 secs
Data transferred: 43.33 MB
Response time: 0.12 secs
Transaction rate: 3480.20 trans/sec
Throughput: 4.35 MB/sec
Concurrency: 426.04
Successful transactions: 34628
Failed transactions: 0
Longest transaction: 7.18
Shortest transaction: 0.00

cmpxchg16 Mordor

The server is now under siege...
Lifting the server siege... done.
Transactions: 40459 hits
Availability: 100.00 %
Elapsed time: 9.26 secs
Data transferred: 50.62 MB
Response time: 0.11 secs
Transaction rate: 4369.22 trans/sec
Throughput: 5.47 MB/sec
Concurrency: 485.08
Successful transactions: 40459
Failed transactions: 0
Longest transaction: 1.72
Shortest transaction: 0.00

Mordor without mmap stack / but malloc

Transactions: 44371 hits
Availability: 100.00 %
Elapsed time: 9.95 secs
Data transferred: 55.52 MB
Response time: 0.11 secs
Transaction rate: 4459.40 trans/sec
Throughput: 5.58 MB/sec
Concurrency: 486.28
Successful transactions: 44371
Failed transactions: 0
Longest transaction: 1.24
Shortest transaction: 0.00

Mordor C++11 malloc + tcmalloc

Transactions: 43552 hits
Availability: 100.00 %
Elapsed time: 9.57 secs
Data transferred: 54.49 MB
Response time: 0.11 secs
Transaction rate: 4550.89 trans/sec
Throughput: 5.69 MB/sec
Concurrency: 485.36
Successful transactions: 43552
Failed transactions: 0
Longest transaction: 1.23
Shortest transaction: 0.00

mtanski · 2014-01-02T04:42:47Z

For comparison, here is stock Mordor (without C++11) from Cody's branch. Same as before, best run out of 3.

Transactions: 33561 hits
Availability: 100.00 %
Elapsed time: 9.76 secs
Data transferred: 41.99 MB
Response time: 0.13 secs
Transaction rate: 3438.63 trans/sec
Throughput: 4.30 MB/sec
Concurrency: 438.10
Successful transactions: 33561
Failed transactions: 0
Longest transaction: 7.43
Shortest transaction: 0.00

cmpxchg16 and others added 27 commits April 28, 2013 09:42

renamed: README -> README.md

0dd30a8

Updated README.md

d5b8847

Updated README.md

d433740

change the scheduler core + add stacks pool + add simplefileserver ex…

a48bc5a

…ample

Updated README.md

152e1af

Updated README.md

150ee16

Updated README.md

23746b9

small fix

48c0f9e

Updated README.md

84ea6b7

Update README.md

e425d5a

Update README.md

6ea4f59

Update README.md

74e373a

Update README.md

1ee1aa8

remove locks from timer while the core model was changed

830afa9

Merge branch 'next' of https://github.com/cmpxchg16/mordor into next

0d7d1e9

Update README.md

8a5711b

remove wasted commented locks from timer

3d68e9c

remove locks from iomanager_epoll while the core model was changed

b1be6c3

1. add static function to create SSL context::add SSL session cache

50653b9

2. add destructor to SSLStream - call close 3. bug fix in SSL handling::call flush in case of successfull SSL_write that small from the threshold

add include for tcp socket options

6ce784c

bug fix::ensure unregister before register to READ event (SSL connect…

94d71af

…ion was asserr)

remove locks from iomanager_iocp while the core model was changed

f735f70

copy README.md to README::compilation need it

e8748d2

add tcmalloc support

aeeeccb

1. add SSL terminator example

71df5b5

2. rename simplefileserver to simplehttpfileserver 3. add to simplehttpfileserver SSL 4. empty README

1. add simplehttpserver example

a473eae

2. change echoserver example to be echoserver

dummy fix

99afd58

cmpxchg16 closed this Aug 4, 2013

cmpxchg16 reopened this Aug 4, 2013

1. add simplehttpfileuploadserver example

58e62ab

2. change the default size of buffered stream to 4K 3. add simple implementation to transfer stream
