Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/gaugefield unity #1384

Merged
merged 67 commits into from
Oct 18, 2023
Merged

Feature/gaugefield unity #1384

merged 67 commits into from
Oct 18, 2023

Conversation

maddyscientist
Copy link
Member

@maddyscientist maddyscientist commented May 26, 2023

This PR's primary purpose is to unify the cpuGaugeField and cudaGaugeField classes, along with an assortment of other cleanups.

  • Removed cpuGaugeField and cudaGaugeField classes, we have now only GaugeField
    • Similar to ColorSpinorField the communiciation routines for host and device are still separate within the GaugeField class, unification of these is deferred until later
    • Added move and copy constructors and assignment operators for GaugeField to facilitate significant cleanup
    • Raw field pointer is now obtained using the data() method. This is to align with STL design. Though here we also have an optional template parameter that can be used to perform a cast to the desired type. (Also applied to ColorSpinorField and CloverField)
  • Introduced a smart pointer class quda_ptr, which provides heterogenous memory allocation
    • Deployed in GaugeField, ColorSpinorField and CloverField
  • Introduced qudaMemcpy and qudaMemset functions that accept a quda_ptr
    • This allows for unified calls to memcpy and memset without needing to worry about the location of the field
  • Introduced the concept of a profile stack, with pushProfile and popProfile functions, together with getProfile for getting a reference to the current top of stack profile.
    • This facilitates a significant cleanup of the profiling, we can now profile in arbitrary functions without needing to pass the profile to it explicitly
  • Profiles now no longer complain about being double started: we now have a ref counter to allow things to just work (as long as the number of starts and stops match)
  • Significant cleanup of the interface for the non-solver functions to utilize the technology introduced as part of this PR
  • Jitify
    • Updated the version of Jitify included with QUDA
    • Some fixes for issues that creeped in (e.g., moved some function definitions out of headers to cpp files to avoid the run-time compiler from parsing them)

Outstanding items

  • Add quda_ptr for ROCm
  • Apply clang-format

…method with a better replacement named data()
…fication. Introduced new memory allocation wrapper quda_ptr, which is deployed for gauge field allocations. Still a WIP
…d copy assigment) to clean up interface_quda.cpp. Added new profile stack to allow for autoprofiling while also dramatically reducing LOC in the interface. Work in progress
…ious QUDA interfaces. Add ref counting support to the profiling, to allow for multiple starts without throwing an error: if a timer has already been started we simply increment the ref counter and return. Profiling now performs a device sync if the type is H2D, D2H or COMPUTE: this negates the need to use explicit synchronization and ensures accurate profiling
Copy link
Member

@hummingtree hummingtree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have verified (again) that CPS works good with this branch.

@weinbe2
Copy link
Contributor

weinbe2 commented Oct 6, 2023

Tentative approve, I noticed a few cosmetic things, and this needs a fresh merge of develop + clang-format. After that it's good to go!

@maddyscientist
Copy link
Member Author

maddyscientist commented Oct 6, 2023

@SaltyChiang I have merged in your changes into this branch, that will shortly be merged into develop.

One line I have not included is this one:

gauge[d] = (param.gauge) ? ((void **)param.gauge)[d] : nullptr;

as this line doesn't make sense to me. If it's a reference to a pre-existing field, why should the value be a nullptr? Can you explain why this change is needed?

Thanks

@SaltyChiang
Copy link
Contributor

Hi @maddyscientist, there are some if statements about the input gauge link, like

if (longlink) {
and
if (ulink) {
. I believe the logic here is originally used for QUDA_MILC_GAUGE_ORDER, but this will fail if QUDA_QDP_GAUGE_ORDER is used here. For example, if the ulink is nullptr here, the unitarized fat7 link will not be calculated, which is right; but cpuUnitarizedLink defined in
cpuGaugeField cpuUnitarizedLink(gParam);
fails now as you cannot get ((void **)nullptr)[d]. A null QUDA_QDP_GAUGE_ORDER field should be something like void *ulink[4] = {nullptr, nullptr, nullptr, nullptr} but now the if (ulink) will return true which we do not expect.

So I made all four reference fields nullptr when the input value itself is nullptr and I think this will not affect the existing logic.

@maddyscientist
Copy link
Member Author

@SaltyChiang thanks for the explanation. I have fixed my branch accordingly (4c308f6), so it should work for your usecase. If you could test this branch, that would be appreciated, thanks.

…unning, and if so push it to the stack, and restore after the newly started timer is stopped. Fixes timing issues as noted by Jiqun
… which is incremented whenever tuneLaunch is called; for solver gflops and timing, we now compute the time and gflop between pushing the present interface profile, this now ensures we include all operations and includes upload/download time
Copy link
Contributor

@weinbe2 weinbe2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @maddyscientist !

@maddyscientist maddyscientist merged commit 0c016af into develop Oct 18, 2023
4 checks passed
@maddyscientist maddyscientist deleted the feature/gaugefield_unity branch October 18, 2023 18:46
@SaltyChiang
Copy link
Contributor

SaltyChiang commented Oct 19, 2023

Hi @maddyscientist, very sorry for the late feedback. I think it should be !unitarizedLink.StaggeredPhaseApplied() && param->staggered_phase_applied here.

quda/lib/interface_quda.cpp

Lines 3748 to 3749 in 0c016af

if (unitarizedLink.StaggeredPhaseApplied() && param->staggered_phase_applied)
unitarizedLink.applyStaggeredPhase();

Edit: I believe gauge_param.location should be changed to QUDA_CUDA_FIELD_LOCATION here.

quda/lib/interface_quda.cpp

Lines 765 to 774 in 0c016af

case QUDA_SMEARED_LINKS:
gauge_param.create = QUDA_NULL_FIELD_CREATE;
gauge_param.reconstruct = param->reconstruct;
gauge_param.setPrecision(param->cuda_prec, true);
gauge_param.ghostExchange = QUDA_GHOST_EXCHANGE_PAD;
gauge_param.pad = param->ga_pad;
cudaGauge = new GaugeField(gauge_param);
copyExtendedGauge(*cudaGauge, *gaugeSmeared, QUDA_CUDA_FIELD_LOCATION);
break;
default: errorQuda("Invalid gauge type");

maddyscientist added a commit that referenced this pull request Oct 20, 2023
@maddyscientist
Copy link
Member Author

@SaltyChiang thanks for pointing put my bug. I've pushed a fix to my next PR that will be merged (#1393). This is a fairly simple PR, so will likely be merged soon.

@SaltyChiang
Copy link
Contributor

@maddyscientist thanks for the fix. You might miss another issue that will happen while saving the smeared gauge through saveGaugeQuda(). I updated the comment above and I repeat it here.

Edit: I believe gauge_param.location should be changed to QUDA_CUDA_FIELD_LOCATION here.

quda/lib/interface_quda.cpp

Lines 765 to 774 in 0c016af

case QUDA_SMEARED_LINKS:
gauge_param.create = QUDA_NULL_FIELD_CREATE;
gauge_param.reconstruct = param->reconstruct;
gauge_param.setPrecision(param->cuda_prec, true);
gauge_param.ghostExchange = QUDA_GHOST_EXCHANGE_PAD;
gauge_param.pad = param->ga_pad;
cudaGauge = new GaugeField(gauge_param);
copyExtendedGauge(*cudaGauge, *gaugeSmeared, QUDA_CUDA_FIELD_LOCATION);
break;
default: errorQuda("Invalid gauge type");

@maddyscientist
Copy link
Member Author

Thanks again @SaltyChiang. Fix is here 44c4100 (in #1393).

maddyscientist added a commit that referenced this pull request Oct 26, 2023
…uge field regardless if is a pointer or array of pointers. Fixes callMultiSrcQuda (broke with #1384).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants