DualView approach for runtime dispatch decision #1

Open
crtrott opened this issue Sep 3, 2024 · 2 comments

crtrott commented Sep 3, 2024

Since Peter brought that up in the slack, here is a sketch of what I came up with, using DualView as the fundamental data management approach:

// Generic helpers:
template<class Lambda>
void dynamic_parallel_for(std::string label, bool run_on_device, size_t N, Lambda lambda) {
  if(run_on_device) {
     parallel_for(label, RangePolicy<DefaultExecutionSpace>(0,N), lambda);
  } else {
     parallel_for(label, RangePolicy<DefaultHostExecutionSpace>(0,N), lambda);
  }
}

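// Sync the requested side and return a read-only view of it in AnonymousSpace,
// so the returned type does not depend on which side was chosen.
template<class DV>
auto choose_side_read(bool device_side, DV a) {
  View<typename DV::const_data_type, typename DV::array_layout, AnonymousSpace, typename DV::memory_traits> tmp;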
  if(device_side) {
    a.sync_device();
    tmp = a.d_view;
  } else {
    a.sync_host();
    tmp = a.h_view;
  }
  return tmp;
}

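// Sync the requested side, mark it as modified, and return a writable view of it.
template<class DV>
auto choose_side_modify(bool device_side, DV a) {
  View<typename DV::data_type, typename DV::array_layout, AnonymousSpace, typename DV::memory_traits> tmp;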
  if(device_side) {
    a.sync_device();
    a.modify_device();
    tmp = a.d_view;
  } else {
    a.sync_host();
    a.modify_host();
    tmp = a.h_view;
  }
  return tmp;
}

// User code
void foo(DualView<const double*> a_in, DualView<double*> b_in) {
  // Run on the device unless the input's device copy is stale.
  bool run_on_device = !a_in.need_sync_device();
  auto a = choose_side_read(run_on_device, a_in);
  auto b = choose_side_modify(run_on_device, b_in);
  dynamic_parallel_for("KernelName", run_on_device, a.extent(0), KOKKOS_LAMBDA(int i) {
    b(i) += a(i);
  });
}
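
To make the control flow of the sketch above easier to follow without a Kokkos build, here is a minimal Kokkos-free mock. `MockDualView` and `add_into` are hypothetical stand-ins, not Kokkos API: only the sync/modify bookkeeping imitates what `DualView` does, the "device" is just a second host buffer, and the helpers take references because the mock has value semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DualView: two host buffers play the roles of
// h_view and d_view, and two flags record which copy is newer.
struct MockDualView {
  std::vector<double> h_view, d_view;
  bool modified_host = false, modified_device = false;

  explicit MockDualView(std::size_t n, double v = 0.0) : h_view(n, v), d_view(n, v) {}

  bool need_sync_device() const { return modified_host; }
  bool need_sync_host() const { return modified_device; }
  void modify_host() { modified_host = true; }
  void modify_device() { modified_device = true; }
  // Copy only when the other side is newer, then clear its flag.
  void sync_device() { if (modified_host) { d_view = h_view; modified_host = false; } }
  void sync_host() { if (modified_device) { h_view = d_view; modified_device = false; } }
};

// Sync the chosen side and hand it out read-only.
inline const std::vector<double>& choose_side_read(bool device_side, MockDualView& a) {
  if (device_side) { a.sync_device(); return a.d_view; }
  a.sync_host();
  return a.h_view;
}

// Sync the chosen side, mark it modified, and hand it out writable.
inline std::vector<double>& choose_side_modify(bool device_side, MockDualView& a) {
  if (device_side) { a.sync_device(); a.modify_device(); return a.d_view; }
  a.sync_host();
  a.modify_host();
  return a.h_view;
}

// Mirrors foo(): run where the input is already up to date.
// Returns true if the "device" side was chosen.
inline bool add_into(MockDualView& a_in, MockDualView& b_in) {
  const bool run_on_device = !a_in.need_sync_device();
  const std::vector<double>& a = choose_side_read(run_on_device, a_in);
  std::vector<double>& b = choose_side_modify(run_on_device, b_in);
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) b[i] += a[i];
  return run_on_device;
}
```

In this mock, the output is pulled to whichever side the input already lives on, so the only copy that happens is the one needed to bring `b_in` along.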

jbigot commented Sep 4, 2024

We played with that, but as a result the lambda becomes a template on the type of the view it captures and, IIRC, @pzehner had an issue with that and CUDA.
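
One plausible reading of "the lambda is a template on the type of the view" can be sketched as follows, assuming the dispatcher hands either a host or a device view to a generic lambda. This is a Kokkos-free illustration: `HostView`, `DeviceView`, `dispatch` and `scale_on` are hypothetical names, and the sketch does not reproduce the CUDA issue itself.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, deliberately distinct types standing in for host and device views.
struct HostView {
  std::vector<double>* data;
  double& operator()(std::size_t i) const { return (*data)[i]; }
};
struct DeviceView {
  std::vector<double>* data;
  double& operator()(std::size_t i) const { return (*data)[i]; }
};

// Both branches are compiled, so a generic kernel is instantiated for
// HostView and for DeviceView regardless of the runtime flag.
template <class Kernel>
void dispatch(bool on_device, HostView h, DeviceView d, std::size_t n, Kernel kernel) {
  if (on_device) {
    for (std::size_t i = 0; i < n; ++i) kernel(d, i);
  } else {
    for (std::size_t i = 0; i < n; ++i) kernel(h, i);
  }
}

// Doubles every element on the chosen side and returns how many calls
// went through the DeviceView instantiation of the lambda.
inline int scale_on(bool on_device, std::vector<double>& host, std::vector<double>& dev) {
  int device_calls = 0;
  dispatch(on_device, HostView{&host}, DeviceView{&dev}, host.size(),
           [&](auto view, std::size_t i) {  // templated on the view type
             view(i) *= 2.0;
             if constexpr (std::is_same_v<decltype(view), DeviceView>) ++device_calls;
           });
  return device_calls;
}
```

By contrast, returning a single view type from both branches, as the `AnonymousSpace` views in the first comment do, keeps the lambda non-generic.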

pzehner commented Sep 4, 2024

> Since Peter brought that up in the slack

I guess you meant Paul...

The approach you came up with is very similar to the "layer" approach we tried (@jbigot, not the one using the templated lambdas). It works well, but it requires implementing an extra layer for parallel_* (which means maintaining this interface) and recreating the various execution policies (which means more stuff to maintain).
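
The maintenance cost described above can be illustrated with a small Kokkos-free mock. Everything in the `mock` namespace is hypothetical and only imitates the shape of the real interfaces: the point is that each parallel pattern needs its own runtime wrapper, and each compile-time policy needs a runtime counterpart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace mock {

// Tag types and a compile-time policy, loosely shaped like the real ones.
struct HostExec {};
struct DeviceExec {};
template <class Exec>
struct RangePolicy { std::size_t begin, end; };

// Serial stand-ins for the two parallel patterns.
template <class Exec, class F>
void parallel_for(const std::string&, RangePolicy<Exec> p, F f) {
  for (std::size_t i = p.begin; i < p.end; ++i) f(i);
}
template <class Exec, class F>
void parallel_reduce(const std::string&, RangePolicy<Exec> p, F f, double& result) {
  double sum = 0.0;
  for (std::size_t i = p.begin; i < p.end; ++i) f(i, sum);
  result = sum;
}

// The extra layer: a runtime policy that has to be recreated ...
struct DynamicRangePolicy { bool on_device; std::size_t begin, end; };

// ... plus one wrapper per pattern, each repeating the same branch.
template <class F>
void dynamic_parallel_for(const std::string& label, DynamicRangePolicy p, F f) {
  if (p.on_device) parallel_for(label, RangePolicy<DeviceExec>{p.begin, p.end}, f);
  else             parallel_for(label, RangePolicy<HostExec>{p.begin, p.end}, f);
}
template <class F>
void dynamic_parallel_reduce(const std::string& label, DynamicRangePolicy p, F f, double& result) {
  if (p.on_device) parallel_reduce(label, RangePolicy<DeviceExec>{p.begin, p.end}, f, result);
  else             parallel_reduce(label, RangePolicy<HostExec>{p.begin, p.end}, f, result);
}

}  // namespace mock

// Usage: sum of i*i for i in [0, n), with the side chosen at run time.
inline double sum_of_squares(bool on_device, std::size_t n) {
  double result = 0.0;
  mock::dynamic_parallel_reduce(
      "sum_of_squares", mock::DynamicRangePolicy{on_device, 0, n},
      [](std::size_t i, double& acc) { acc += static_cast<double>(i * i); }, result);
  return result;
}
```

Every further pattern or policy kind would add another wrapper or runtime policy struct of the same shape, which is the interface that then has to be maintained.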

The choose_side_read/choose_side_modify strategy you proposed is interesting.
