Gas Cooling on GPU #185
Open: spencerw wants to merge 63 commits into N-BodyShop:master from spencerw:gpu_cool
Free all device data at end of simulation
Note that 'grackle' and 'planet' cooling still need to be tested and updated. Although they don't make use of StiffStep, changes to updateuDot and the way memory is managed have created some incompatibilities. It looks like a few tweaks need to be made to get this to work without the CUDA or cooling flags as well.
A few of the cooling modules (boley, cosmo, metal and h2) use a stiff ODE solver (StiffStep), which creates a significant bottleneck during updateuDot. This PR introduces CudaStiffStep, which is a GPU version of the solver. With the CUDA flag enabled, the ODE integration now happens for all particles on a given TreePiece in parallel. The parameter 'nGpuGasMinParts' can be used to direct TreePieces with small particle counts to do the integration on the CPU.
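The CPU/GPU dispatch rule described above can be sketched as follows. This is a host-only illustration, not code from the PR: `IntegratorPath` and `chooseIntegrator` are hypothetical names, and the inclusive comparison is an assumption, since the description does not say which path a TreePiece with exactly 'nGpuGasMinParts' gas particles takes.

```cpp
#include <cassert>

// Hypothetical names for illustration; the PR only names the
// parameter nGpuGasMinParts.
enum class IntegratorPath { Cpu, Gpu };

// Decide where a TreePiece runs its stiff ODE integration.
// Assumption: a piece with at least nGpuGasMinParts gas particles
// goes to the GPU; smaller pieces stay on the CPU.
inline IntegratorPath chooseIntegrator(int nGasParts, int nGpuGasMinParts) {
    assert(nGasParts >= 0 && nGpuGasMinParts >= 0);
    return (nGasParts >= nGpuGasMinParts) ? IntegratorPath::Gpu
                                          : IntegratorPath::Cpu;
}
```

With a threshold of 64, for example, `chooseIntegrator(8, 64)` selects the CPU path and `chooseIntegrator(5000, 64)` selects the GPU path. The point of such a cutoff is that a kernel launch for a handful of particles can cost more than integrating them on the host.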
This required making a few significant structural changes to the code:
- To minimize code duplication, `__device__ __host__` specifiers have been added to many of the cooling subroutines. An empty .cu file has been added to each of the cooling modules, which allows the old C code to be used on the GPU. When the CUDA flag is enabled, these new .cu files are compiled separately for the host and device using the '-dc' flag and then linked together in a separate step at the end of the cuda.mk file. Also note that the clDerivs function for cosmo cooling makes use of RootFind, which required making some of the routines from stiff.c accessible from the device as well.
- The parallel nature of CudaStiffStep requires a separate clDerivsData and Stiff struct (along with space for the associated deep pointers) for each of the gas particles on both the host and device side. Originally, allocation for this data was handled by the TreePieces from within the cooling subroutines. This is now handled by the DataManager (allocCoolParticleBlock), which then assigns blocks of pre-allocated host and device memory to the TreePieces (setCoolPtrs). In the event that the gas particle count increases significantly, a larger block of memory is then re-allocated.
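The pooled allocation scheme can be sketched roughly as below. This is a host-only stand-in under stated assumptions, not the PR's implementation: `CoolWorkspace`, `CoolBlockPool`, `ensureCapacity`, and `slice` are hypothetical names, the 20% padding is an arbitrary choice, and the real code would also manage a matching device-side block and the deep pointers inside each struct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-particle workspace. The real clDerivsData and Stiff
// structs carry more state plus deep pointers.
struct CoolWorkspace {
    double y[4];  // placeholder for per-particle solver state
};

// Stand-in for the DataManager's role: own one large block and hand
// contiguous slices of it to TreePieces.
class CoolBlockPool {
public:
    // Grow the block only when nGas exceeds the current capacity.
    // Returns true if a reallocation happened, in which case every
    // previously issued slice pointer is invalid and must be reassigned.
    bool ensureCapacity(std::size_t nGas) {
        if (nGas <= block_.size()) return false;
        block_.resize(nGas + nGas / 5);  // assumption: 20% headroom
        return true;
    }

    // Return the start of a slice of `count` workspaces at `offset`.
    CoolWorkspace* slice(std::size_t offset, std::size_t count) {
        assert(offset + count <= block_.size());
        return block_.data() + offset;
    }

    std::size_t capacity() const { return block_.size(); }

private:
    std::vector<CoolWorkspace> block_;
};
```

Leaving headroom means small fluctuations in the gas particle count do not trigger a reallocation on every step; only a significant increase does, after which the slices have to be handed out again.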
A test suite 'test_cooling' for the different cooling modules is also included.