OpenMP benchmark #3
Comments
I suggest using nonlinear oscillator lattices; I already have some tuned code for those that we can check against.
1725e45 has a trivial benchmark to check whether there is any kind of speedup, but the values vary wildly with gcc's OpenMP library. With Intel's they are more stable...
Maybe 1024 Lorenz systems is still not enough to benefit from parallelization? For a proper benchmark I suggest looking at the scaling with the number of cores. A Lorenz example like this one, which is completely uncoupled, should scale almost perfectly with the number of cores (or with memory bandwidth, for that matter).
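To make the scaling suggestion concrete, here is a minimal sketch of how speedup versus thread count could be measured for an ensemble of uncoupled Lorenz systems. It is not the benchmark from 1725e45: it swaps in a hand-written explicit Euler step instead of an odeint stepper so that it stays self-contained, and the names `integrate`, `time_run` and `report_speedup`, the parameter values, and the initial conditions are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

using state = std::array<double, 3>;

// Lorenz right-hand side with the classic parameter values (an assumption here).
inline state lorenz(const state &x) {
    const double sigma = 10.0, R = 28.0, b = 8.0 / 3.0;
    return state{{ sigma * (x[1] - x[0]),
                   R * x[0] - x[1] - x[0] * x[2],
                   -b * x[2] + x[0] * x[1] }};
}

// Advance every system by `steps` explicit Euler steps (stand-in for an odeint stepper).
// The systems are independent, so the outer loop parallelizes without synchronization.
// Without -fopenmp the pragma is ignored and the loop runs serially.
void integrate(std::vector<state> &xs, int steps, double dt, int threads) {
    assert(threads >= 1);
    const long n = static_cast<long>(xs.size());
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long i = 0; i < n; ++i) {
        state x = xs[i];
        for (int s = 0; s < steps; ++s) {
            const state d = lorenz(x);
            for (int k = 0; k < 3; ++k) x[k] += dt * d[k];
        }
        xs[i] = x;
    }
}

// Wall-clock time in seconds for one run with the given thread count.
double time_run(std::size_t n, int steps, int threads) {
    std::vector<state> xs(n, state{{10.0, 10.0, 10.0}});
    for (std::size_t i = 0; i < n; ++i) xs[i][0] += 1e-3 * static_cast<double>(i);
    const auto t0 = std::chrono::steady_clock::now();
    integrate(xs, steps, 1e-4, threads);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Print speedup T(1)/T(p) for p = 1, 2, 4, ... up to max_threads.
void report_speedup(std::size_t n, int steps, int max_threads) {
    const double t_serial = time_run(n, steps, 1);
    for (int p = 1; p <= max_threads; p *= 2) {
        const double tp = time_run(n, steps, p);
        std::printf("threads=%d time=%.4fs speedup=%.2f\n", p, tp, t_serial / tp);
    }
}
```

Calling something like `report_speedup(1024, 1000, 8)` from `main` (compiled with `-fopenmp`) and then repeating with a much larger `n` would show whether 1024 systems is simply too little work per thread. Single timings are noisy, so taking the minimum over several repetitions is probably worthwhile.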
Based on <https://github.com/mariomulansky/hpx_odeint/tree/9792ca4f330bf0cffde4f000e900fb4c1c254891/osc_chain_1d/openmp2> Use osc_chain_speedup.{sh,gnu} to compute and plot speedup. "split" uses openmp_state/openmp_algebra; "simple" uses vector/openmp_range_algebra
Benchmark results for GCC 4.7.3 and ICC 13.1.1 on an i7-3770 with (short) n=4096, 1024 steps; (long) n=4194304, 1 step; using (split)
Times for split/simple are not comparable because the simple case doesn't store values between cycles of the loop (see here); all with
From <https://github.com/mariomulansky/hpx_odeint/blob/9792ca4f330bf0cffde4f000e900fb4c1c254891/osc_chain_1d/openmp/system.hpp>. Makes simple/split times comparable (change is minimal though).
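One possible reading of "storing values between cycles of the loop", sketched serially and only as an illustration: in a nearest-neighbour oscillator chain, the coupling between sites i and i+1 enters the force on both sites, so a loop can either recompute it for each site or carry it over from the previous iteration. The code below is not copied from the linked system.hpp; the function names, the power-law form of the forces, and the exponents `kappa` and `lambda` are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// sign(x) * |x|^k
inline double signed_pow(double x, double k) {
    return (x > 0.0) ? std::pow(x, k) : -std::pow(-x, k);
}

// Variant A: every iteration recomputes both neighbour couplings,
// so each coupling term is evaluated twice over the whole loop.
void chain_recompute(const std::vector<double> &q, std::vector<double> &dpdt,
                     double kappa, double lambda) {
    const std::size_t n = q.size();
    assert(dpdt.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left  = (i > 0)     ? signed_pow(q[i - 1] - q[i], lambda - 1) : 0.0;
        const double right = (i + 1 < n) ? signed_pow(q[i] - q[i + 1], lambda - 1) : 0.0;
        dpdt[i] = -signed_pow(q[i], kappa - 1) + left - right;
    }
}

// Variant B: the coupling to the right neighbour is kept in a local variable
// and reused as the left coupling of the next iteration.
void chain_carry(const std::vector<double> &q, std::vector<double> &dpdt,
                 double kappa, double lambda) {
    const std::size_t n = q.size();
    assert(dpdt.size() == n);
    double coupling = 0.0;  // coupling between site i-1 and site i; zero at the left edge
    for (std::size_t i = 0; i < n; ++i) {
        dpdt[i] = -signed_pow(q[i], kappa - 1) + coupling;
        coupling = (i + 1 < n) ? signed_pow(q[i] - q[i + 1], lambda - 1) : 0.0;
        dpdt[i] -= coupling;
    }
}
```

Both variants give the same forces; B roughly halves the number of `pow` calls, which would be enough to skew a timing comparison. Note that the carried value is a loop-carried dependency, so parallelizing variant B needs extra care at chunk boundaries.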
Measure speedup