Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate header-only mpi c++ wrapper #4833

Open
RudolfWeeber opened this issue Dec 1, 2023 · 4 comments
Open

Investigate header-only mpi c++ wrapper #4833

RudolfWeeber opened this issue Dec 1, 2023 · 4 comments

Comments

@RudolfWeeber
Copy link
Contributor

https://github.com/acdemiralp/mpi

This would allow us to get rid of the boost::mpi dependency, which is responsible for a lot of the build hesdaches on hpc systems, conda, pip and friends.

We would still need boost::serialization to send around stuff that is not store contiguously in memory etc.

@RiccardoFrenner
Copy link
Contributor

RiccardoFrenner commented Aug 13, 2024

acdemiralp/MPI

Does not compile with OpenMPI 5.0 due to the following error:

error: non-type template argument is not a constant expression
template <> struct data_type_traits<MPI_CHAR                   > { using type = char                     ; };
                                    ^~~~~~~~
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:1242:18: note: expanded from macro 'MPI_CHAR'
#define MPI_CHAR OMPI_PREDEFINED_GLOBAL(MPI_Datatype, ompi_mpi_char)
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:423:46: note: expanded from macro 'OMPI_PREDEFINED_GLOBAL'
#define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))

I have not investigated whether it compiles with earlier versions of OMPI.
The project was last updated 10 months ago and there is no reaction to this issue since 5 months.

Dependencies

  • pfr (header-only boost reflection library)

Other Promising Options

  1. kaMPIng
  2. kokkos-comm
  3. mpl

kaMPIng

Dependencies

  • cereal (header-only c++11 lib for serialization)
  • pfr (header-only boost reflection library)
  • kassert (optional)

Simple example to serialize and send non-trivial type:

#include <cereal/types/vector.hpp>
#include "kamping/communicator.hpp"
#include "kamping/p2p/recv.hpp"
#include "kamping/p2p/send.hpp"
#include "kamping/serialization.hpp"

using namespace kamping;

struct Foo {
  double x;
  std::vector<int> v;
  template <class Archive> void serialize(Archive &ar) { ar(x, v); }
};

int main() {
  kamping::Environment env;
  auto const &comm = kamping::comm_world();

  Foo data = {3.14, {1, 2, 3}};
  if (comm.is_root()) {
    for (size_t dst = 0; dst < comm.size(); dst++) {
      if (comm.is_root(dst)) {
        continue;
      }
      comm.send(send_buf(as_serialized(data)), destination(dst));
    }
  } else {
    auto const recv_data =
        comm.recv(recv_buf(as_deserializable<Foo>()));
    std::cout << "Node " << comm.rank() << " received " << recv_data.x << " "
              << recv_data.v.size() << std::endl;
  }
}

Cereal supports the boost serialization syntax and therefore there should be no changes needed in the serialization functions. Therefore this attempt to send espresso particles around should just work when exclusions are disabled, however, it does not.

Attempt to send espresso Particle:

#include "kamping/collectives/allgather.hpp"
#include "kamping/communicator.hpp"
#include "kamping/environment.hpp"
#include "kamping/named_parameters.hpp"
#include "kamping/p2p/send.hpp"
#include <kamping/p2p/recv.hpp>
#include <mpi.h>

#include <array>
#include <iostream>

#include "Particle.hpp"

std::string particle_to_string(Particle const &p) {
  auto s = "Particle(pos=()" + std::to_string(p.pos()[0]) + "," +
           std::to_string(p.pos()[1]) + "," + std::to_string(p.pos()[2]) +
           "), force=(" + std::to_string(p.force()[0]) + "," +
           std::to_string(p.force()[1]) + "," + std::to_string(p.force()[2]) +
           "), exclusions=(";
#ifdef EXCLUSIONS
  for (int i = 0; i < p.exclusions().size(); i++) {
    s += std::to_string(p.exclusions().at(i)) + ",";
  }
#endif
  s += ")";
  return s;
}

int main() {
  using namespace kamping;
  kamping::Environment env;
  auto const &comm = kamping::comm_world();

  auto p_send = Particle();
  auto p_recv = Particle();
  p_send.pos()[0] = comm.rank() * 10 - 0;
  p_send.pos()[1] = comm.rank() * 10 - 1;
  p_send.pos()[2] = comm.rank() * 10 - 2;
  p_send.force()[0] = comm.rank() * 100 - 0;
  p_send.force()[1] = comm.rank() * 100 - 1;
  p_send.force()[2] = comm.rank() * 100 - 2;
#ifdef EXCLUSIONS
  p_send.exclusions().push_back(comm.ranke() * 100 + 9);
  p_send.exclusions().push_back(comm.ranke() * 100 + 8);
  p_send.exclusions().push_back(comm.ranke() * 100 + 7);
  p_send.exclusions().push_back(comm.ranke() * 100 + 6);
#endif

  auto const own_id = comm.rank();
  auto const send_id = (own_id + 1) % 4;
  auto const recv_id = (own_id - 1) % 4;
  comm.send(destination(send_id), send_buf(kamping::as_serialized(p_send)));
  comm.recv(recv_buf(kamping::as_deserializable<Particle>(p_recv)),
            source(recv_id));
  std::cout << "Rank " << own_id << " sends " << particle_to_string(p_send)
            << " to " << send_id << " and receives "
            << particle_to_string(p_recv) << " from " << recv_id << "\n";
}
This is the error message:
  /tmp/header_mpi/espressotests/build/_deps/cereal-src/include/cereal/cereal.hpp:570:9: error: static assertion failed due to requirement 'traits::detail::count_output_serializers<Particle, cereal::BinaryOutputArchive>::value != 0': cereal could not find any output serialization functions for the provided type and archive combination. 
  
   Types must either have a serialize function, load/save pair, or load_minimal/save_minimal pair (you may not mix these). 
   Serialize functions generally have the following signature: 
  
   template<class Archive> 
     void serialize(Archive & ar) 
     { 
       ar( member1, member2, member3 ); 
     } 
  
   
          static_assert(traits::detail::count_output_serializers<T, ArchiveType>::value != 0,
          ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /tmp/header_mpi/espressotests/build/_deps/cereal-src/include/cereal/cereal.hpp:444:15: note: in instantiation of function template specialization 'cereal::OutputArchive<cereal::BinaryOutputArchive, 1>::processImpl<Particle, (cereal::traits::detail::sfinae)0>' requested here
          self->processImpl( head );
                ^
  /tmp/header_mpi/espressotests/build/_deps/cereal-src/include/cereal/cereal.hpp:333:15: note: in instantiation of function template specialization 'cereal::OutputArchive<cereal::BinaryOutputArchive, 1>::process<Particle &>' requested here
          self->process( std::forward<Types>( args )... );
                ^
  /tmp/header_mpi/espressotests/libs/kamping/include/kamping/serialization.hpp:56:13: note: in instantiation of function template specialization 'cereal::OutputArchive<cereal::BinaryOutputArchive, 1>::operator()<Particle &>' requested here
              archive(_object.underlying());
              ^
  /tmp/header_mpi/espressotests/libs/kamping/include/kamping/p2p/send.hpp:82:31: note: in instantiation of member function 'kamping::internal::SerializationBuffer<cereal::BinaryOutputArchive, cereal::BinaryInputArchive, std::allocator<char>, kamping::internal::GenericDataBuffer<Particle, kamping::internal::ParameterType, kamping::internal::ParameterType::send_recv_buf, kamping::internal::BufferModifiability::modifiable, kamping::internal::BufferOwnership::referencing, kamping::internal::BufferType::in_out_buffer>>::serialize' requested here
          send_buf.underlying().serialize();
                                ^
  /tmp/header_mpi/espressotests/main.cpp:52:8: note: in instantiation of function template specialization 'kamping::Communicator<std::vector>::send<kamping::internal::RankDataBuffer<kamping::internal::RankType::value, kamping::internal::ParameterType::destination>, kamping::internal::DataBufferBuilder<kamping::internal::SerializationBuffer<cereal::BinaryOutputArchive, cereal::BinaryInputArchive, std::allocator<char>, kamping::internal::GenericDataBuffer<Particle, kamping::internal::ParameterType, kamping::internal::ParameterType::send_recv_buf, kamping::internal::BufferModifiability::modifiable, kamping::internal::BufferOwnership::referencing, kamping::internal::BufferType::in_out_buffer>>, kamping::internal::ParameterType::send_buf, kamping::internal::BufferModifiability::modifiable, kamping::internal::BufferType::in_buffer, kamping::BufferResizePolicy::no_resize>>' requested here
    comm.send(destination(send_id), send_buf(kamping::as_serialized(p_send)));
         ^
  /tmp/header_mpi/espressotests/build/_deps/cereal-src/include/cereal/cereal.hpp:570:87: note: expression evaluates to '0 != 0'
          static_assert(traits::detail::count_output_serializers<T, ArchiveType>::value != 0,
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
  In file included from /tmp/header_mpi/espressotests/main.cpp:1:
  In file included from /tmp/header_mpi/espressotests/libs/kamping/include/kamping/collectives/allgather.hpp:23:
  In file included from /tmp/header_mpi/espressotests/libs/kamping/include/kamping/collectives/collectives_helpers.hpp:17:
  In file included from /tmp/header_mpi/espressotests/libs/kamping/include/kamping/named_parameter_check.hpp:24:
  /tmp/header_mpi/espressotests/libs/kamping/include/kamping/serialization.hpp:70:23: error: no matching constructor for initialization of 'Particle'
              InArchive archive(buffer);
                        ^       ~~~~~~
  /tmp/header_mpi/espressotests/libs/kamping/include/kamping/serialization.hpp:124:28: note: in instantiation of member function 'kamping::internal::SerializationBuffer<void, Particle, std::allocator<char>, kamping::internal::GenericDataBuffer<Particle, kamping::internal::ParameterType, kamping::internal::ParameterType::recv_buf, kamping::internal::BufferModifiability::modifiable, kamping::internal::BufferOwnership::referencing, kamping::internal::BufferType::out_buffer>>::deserialize' requested here
          serialization_data.deserialize();
                             ^
  /tmp/header_mpi/espressotests/Particle.hpp:443:8: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'std::istringstream' (aka 'basic_istringstream<char>') to 'const Particle' for 1st argument
  struct Particle { // NOLINT(bugprone-exception-escape)
         ^
  /tmp/header_mpi/espressotests/Particle.hpp:443:8: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'std::istringstream' (aka 'basic_istringstream<char>') to 'Particle' for 1st argument
  /tmp/header_mpi/espressotests/Particle.hpp:443:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 1 was provided
  In file included from /tmp/header_mpi/espressotests/main.cpp:1:
  In file included from /tmp/header_mpi/espressotests/libs/kamping/include/kamping/collectives/allgather.hpp:23:
  In file included from /tmp/header_mpi/espressotests/libs/kamping/include/kamping/collectives/collectives_helpers.hpp:17:
  In file included from /tmp/header_mpi/espressotests/libs/kamping/include/kamping/named_parameter_check.hpp:24:
  /tmp/header_mpi/espressotests/libs/kamping/include/kamping/serialization.hpp:71:13: error: type 'Particle' does not provide a call operator
              archive(_object.underlying());
              ^~~~~~~
  3 errors generated.

TODOs

  • Before trying to figure out how to include the more complicated data types (e.g. exclusions inside Particle (uses compact_vector type)), I have to fix this problem first.
  • Figure out what else has to be changed in the code when this library will be used.
  • @jngrad mentioned something about memcpy being a problem (currently?)

@jngrad
Copy link
Member

jngrad commented Aug 14, 2024

Does not compile with OpenMPI 5.0 due to the following error:

error: non-type template argument is not a constant expression
template <> struct data_type_traits<MPI_CHAR                   > { using type = char                     ; };
                                    ^~~~~~~~
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:1242:18: note: expanded from macro 'MPI_CHAR'
#define MPI_CHAR OMPI_PREDEFINED_GLOBAL(MPI_Datatype, ompi_mpi_char)
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/homebrew/Cellar/open-mpi/5.0.3_1/include/mpi.h:423:46: note: expanded from macro 'OMPI_PREDEFINED_GLOBAL'
#define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))

Don't know if this is going to help, but I experienced the exact same error message in OpenMPI 4.1.2 recently. This was due to OpenMPI re-defining all MPI_Datatype values like MPI_CHAR, MPI_INT and friends to constants of the same name but with a different variable type. As a result, all template specialization that expected a MPI_Datatype complained that my MPI_INT was not of the correct type. You may very well be experiencing the same issue I had, and the solution was to extract the "real" MPI_CHAR from the type map via boost::mpi::get_mpi_datatype(), which annoyingly requires the dereferencing of a pointer of the relevant type, so it cannot be a nullptr:

template <typename T>
static void fft_sendrecv(T const *const sendbuf, int scount, int dest,
T *const recvbuf, int rcount, int source,
boost::mpi::communicator const &comm, int tag) {
auto const type = boost::mpi::get_mpi_datatype<T>(*sendbuf);
MPI_Sendrecv(reinterpret_cast<void const *>(sendbuf), scount, type, dest, tag,
reinterpret_cast<void *>(recvbuf), rcount, type, source, tag,
comm, MPI_STATUS_IGNORE);
}

Regarding the particle struct, honestly I don't see how an error message like expression evaluates to '0 != 0' is going to help developers diagnose the issue 😅 Maybe it's the compiler fault, and temporarily switching to another compiler would help. Anyway, we currently use Boost::mpi macros to mark Particle substructs as directly serializable, i.e. we serialize data members without metadata (e.g. class name, serialization version). This makes the communication non-portable, which is fine since we don't communicate the packed structs to external libraries, where portability would matter. We also mark them as bitwise-serializable, which allows Boost::mpi to completely skip the serialize() method and directly std::memcpy the object into the communication buffer. Interestingly, the compiler error also suggest that the serialize() methods is expecting a different syntax from the one used by Boost::serialization. Do you know if it makes a difference?

@RiccardoFrenner
Copy link
Contributor

Regarding the acdemiralp/MPI library:

  • I should have been more specific. This is an error on their side, so I would have to change their code, I think. They use the types (MPI_CHAR, ...) for template specialisations that expect MPI_Datatype. But MPI_CHAR is not a constant expression because it is defined by a macro that casts it to a void* and then to MPI_Datatype if I remember correctly. The compiler explicitly said that this is not a constant expression because of the void* cast.
template <MPI_Datatype data_type>
struct data_type_traits {};

template <> struct data_type_traits<MPI_CHAR                   > { using type = char                     ; };
  • This is the libraries template specialisation code. Maybe it would be enough to just change template <MPI_Datatype data_type> to template <ompi_predefined_datatype_t* data_type> which seems to be the real type of MPI_CHAR and co but I am not sure about it.
  • I thus decided to look for another library in agreement with Rudolf.

we serialize data members without metadata

As far as I can tell from the documentation, Cereal does this by default as well. However, bitwise serialization might still be better than calling the serialize method. I'll check if this is also easily possible with Cereal, but I'll do that later because the main idea was to see if it's possible to easily send non-trivial data, which is why we chose the Particle class.

Interestingly, the compiler error also suggest that the serialize() methods is expecting a different syntax from the one used by Boost::serialization.

The documentation says boost syntax is also supported, which I confirmed with a simple struct.

Regarding the particle struct, honestly I don't see how an error message like expression evaluates to '0 != 0' is going to help developers diagnose the issue

Yes, this is what I thought too. My only idea right now is to simplify the Particle class iteratively and compare with the working example to see what the difference and problem is.

@jngrad
Copy link
Member

jngrad commented Aug 15, 2024

Thank you for the work you did and for the detailed explanations!

They use the types (MPI_CHAR, ...) for template specialisations that expect MPI_Datatype. But MPI_CHAR is not a constant expression because it is defined by a macro that casts it to a void* and then to MPI_Datatype if I remember correctly. The compiler explicitly said that this is not a constant expression because of the void* cast.

I can confirm it's exactly the same error I encountered. I don't remember if I ever explored ompi_predefined_datatype_t*. I went directly for boost::mpi::get_mpi_datatype(), because I was more familiar with it and knew it "generates" the MPI_CHAR and other data types, i.e. the first time you call get_mpi_datatype() for a type it will take time to build stuff, and the next times it will be almost free thanks to the cache.

Interestingly, the compiler error also suggest that the serialize() methods is expecting a different syntax from the one used by Boost::serialization.

The documentation says boost syntax is also supported, which I confirmed with a simple struct.

Perfect.

Regarding the particle struct, honestly I don't see how an error message like expression evaluates to '0 != 0' is going to help developers diagnose the issue

Yes, this is what I thought too. My only idea right now is to simplify the Particle class iteratively and compare with the working example to see what the difference and problem is.

I guess that's the right move. Deciphering cryptic compiler diagnostics is never fun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants