crash if using cuda executor for reorder #1514

Open
uboats opened this issue Dec 18, 2023 · 2 comments

uboats commented Dec 18, 2023

I get a crash when choosing CUDA as the executor. The matrix (CSR format) and the rhs/solution vectors are built from std::vector data.

It crashed at "reordering = gko::experimental::reorder::Amd::build().on(g_exec)->generate(matrix);"
It also crashed with "reordering = gko::experimental::reorder::Amd::build().on(g_exec)->generate(matrix->transpose());"

============================================
Here's example code:
// values, col_idxs, and row_ptrs are std::vectors

// executor used by the application
const auto g_exec = exec_map.at("cuda")();  // throws if not valid
// initialize matrix and vectors
auto matrix_host = mtx::create(g_exec->get_master(), gko::dim<2>(m_size),
                               val_array::view(g_exec->get_master(), m_nnz, values.data()),
                               idx_array::view(g_exec->get_master(), m_nnz, col_idxs.data()),
                               idx_array::view(g_exec->get_master(), m_size + 1, row_ptrs.data()));
auto matrix = gko::share(gko::clone(g_exec, matrix_host));

auto b_host = vec::create(g_exec->get_master(), gko::dim<2>(m_size, 1));

auto x_host = vec::create(g_exec->get_master(), gko::dim<2>(m_size, 1));

for (IndexType ii = 0; ii < m_size; ii++) {
    b_host->at(ii, 0) = res[ii];
    x_host->at(ii, 0) = u[ii];
}

auto b = gko::clone(g_exec, b_host);
auto x = gko::clone(g_exec, x_host);

std::shared_ptr<gko::matrix::Permutation<IndexType>> reordering;
reordering = gko::experimental::reorder::Amd<IndexType>::build().on(g_exec)->generate(matrix);

std::shared_ptr<gko::matrix::Csr<ValueType, IndexType>> mat_reorder;
std::unique_ptr<gko::matrix::Dense<>, std::default_delete<gko::matrix::Dense<> > > b_reorder, x_reorder;

{
    mat_reorder = gko::share(matrix->permute(reordering));
    b_reorder   = b->permute(reordering, gko::matrix::permute_mode::rows);
    x_reorder   = x->permute(reordering, gko::matrix::permute_mode::rows);
}

std::shared_ptr<gko::LinOpFactory> par_ilu_fact;
par_ilu_fact = gko::factorization::ParIlut<ValueType, IndexType>::build()
        .with_fill_in_limit(limit)
        .on(g_exec);

auto par_ilu = gko::share(par_ilu_fact->generate(clone(g_exec, mat_reorder)));

auto ilu_pre_factory =
    gko::preconditioner::Ilu<gko::solver::LowerTrs<ValueType, IndexType>,
                             gko::solver::UpperTrs<ValueType, IndexType>,
                             false>::build()
        .on(g_exec);

// Use incomplete factors to generate ILU preconditioner
auto ilu_preconditioner = gko::share(ilu_pre_factory->generate(par_ilu));
........

============================================
Below is the stack:

[0] from 0x000000000de0e177 in gko::experimental::reorder::suitesparse_wrapper::amd_2(int, int*, int*, int*, int, int, int*, int*, int*, int*, int*, int*, int*, double*, double*)
[1] from 0x000000000dc9a390 in gko::experimental::reorder::suitesparse_wrapper::amd_reorder(int, int*, int*, int*, int, int*, int*, int*, int*, int*, int*, int*)
[2] from 0x000000000dca17a9 in gko::detail::RegisteredOperation<gko::experimental::reorder::suitesparse_wrapper::make_amd_reorder<int, int*, int*, int*, int, int* const&, int* const&, int* const&, int* const&, int* const&, int* const&, int* const&>(int&&, int*&&, int*&&, int*&&, int&&, int* const&, int* const&, int* const&, int* const&, int* const&, int* const&, int* const&)::{lambda(auto:1)#1}>::run(std::shared_ptr<gko::CudaExecutor const>) const
[3] from 0x000000000e8f29f0 in gko::detail::ExecutorBase<gko::CudaExecutor>::run(gko::Operation const&) const
[4] from 0x000000000dca41dd in gko::experimental::reorder::Amd::generate_impl(std::shared_ptr<gko::LinOp const>) const
[5] from 0x0000000002b0ac1c in gko::AbstractFactory<gko::LinOp, std::shared_ptr<gko::LinOp const> >::generate<std::shared_ptr<gko::LinOp const>&>(std::shared_ptr<gko::LinOp const>&) const+102 at /tools/ginkgo/ginkgo//include/ginkgo/core/base/abstract_factory.hpp:69
[6] from 0x0000000002b06551 in gko::LinOpFactory::generate(std::shared_ptr<gko::LinOp const>) const+203 at /tools/ginkgo/ginkgo//include/ginkgo/core/base/lin_op.hpp:397
[7] from 0x000000000dca3498 in gko::experimental::reorder::Amd::generate(std::shared_ptr<gko::LinOp const>) const
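
Since the top frames are inside the AMD routine itself, one thing that could be ruled out on the host before calling generate() is malformed CSR input. The following is only a minimal sketch under that assumption; is_valid_csr is a hypothetical helper, not part of Ginkgo, and it assumes a signed index type and 0-based indices, as in the example code above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Ginkgo): checks that the CSR arrays
// describe a well-formed num_rows x num_cols matrix with 0-based indices.
// Assumes IndexType is a signed integer type.
template <typename IndexType>
bool is_valid_csr(std::size_t num_rows, std::size_t num_cols,
                  const std::vector<IndexType>& row_ptrs,
                  const std::vector<IndexType>& col_idxs,
                  std::size_t num_values)
{
    // Array sizes must match the matrix dimensions.
    if (row_ptrs.size() != num_rows + 1) return false;
    if (col_idxs.size() != num_values) return false;
    // Row pointers start at 0 and never decrease.
    if (row_ptrs.front() != 0) return false;
    for (std::size_t row = 0; row < num_rows; ++row) {
        if (row_ptrs[row] > row_ptrs[row + 1]) return false;
    }
    // The last row pointer equals the number of stored entries.
    if (static_cast<std::size_t>(row_ptrs.back()) != num_values) return false;
    // Every column index lies inside the matrix.
    for (const auto col : col_idxs) {
        if (col < 0 || static_cast<std::size_t>(col) >= num_cols) return false;
    }
    return true;
}
```

With the names from the example, this could be called as is_valid_csr<IndexType>(m_size, m_size, row_ptrs, col_idxs, values.size()) before matrix_host is created. A passing check does not rule out other causes of the crash.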

@MarcelKoch
Member

As part of our testing, we also test AMD with the CUDA executor. You can check these tests yourself by calling

cmake --build . -t test_reorder_amd_cuda
ctest -R "amd_cuda"

in your build directory, configured with the CMake option GINKGO_BUILD_TESTS=ON. Please check whether these tests already fail for you.

@yhmtsai
Member

yhmtsai commented Sep 17, 2024

Hi @uboats
Could you check whether the tests mentioned by @MarcelKoch pass?
If you can provide the matrix, it will help with debugging.
You can use the following code to write the matrix in Matrix Market format:

{
std::ofstream output_file(filename);
gko::write(output_file, matrix_host);
}

The additional scope {} ensures that the file is closed and fully written to disk before the crash.
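
If it is easier to dump the raw std::vector data without going through Ginkgo, the same scoping idea can be sketched in plain C++. This is an illustration only: write_csr_as_matrix_market is a hypothetical helper, not a Ginkgo function, and it assumes a square matrix with double values and int indices (0-based in memory, written 1-based as the Matrix Market coordinate format expects).

```cpp
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not part of Ginkgo): writes CSR arrays as a
// Matrix Market "coordinate real general" file. Assumes a square matrix.
bool write_csr_as_matrix_market(const std::string& filename,
                                const std::vector<double>& values,
                                const std::vector<int>& col_idxs,
                                const std::vector<int>& row_ptrs)
{
    if (row_ptrs.empty()) return false;
    const std::size_t num_rows = row_ptrs.size() - 1;
    bool ok = false;
    {
        // The stream is flushed and closed when this scope ends,
        // so the file is complete before any later code can crash.
        std::ofstream out(filename);
        if (!out) return false;
        out << "%%MatrixMarket matrix coordinate real general\n";
        out << num_rows << ' ' << num_rows << ' ' << values.size() << '\n';
        // Print enough digits to round-trip double values.
        out << std::setprecision(std::numeric_limits<double>::max_digits10);
        for (std::size_t row = 0; row < num_rows; ++row) {
            for (int k = row_ptrs[row]; k < row_ptrs[row + 1]; ++k) {
                // Matrix Market indices are 1-based.
                out << row + 1 << ' ' << col_idxs[k] + 1 << ' '
                    << values[k] << '\n';
            }
        }
        out.flush();
        ok = static_cast<bool>(out);
    }
    return ok;
}
```

Calling it right before the reordering step, for example write_csr_as_matrix_market("matrix.mtx", values, col_idxs, row_ptrs), leaves a complete file behind even if the next call crashes. The file name is a placeholder.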
