Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] lp.add_prefetch adds a prefetch instruction but does not replace accesses with prefetched variable #670

Open
kaushikcfd opened this issue Aug 18, 2022 · 0 comments

Comments

@kaushikcfd
Copy link
Collaborator

kaushikcfd commented Aug 18, 2022

Here's a reproducer

import loopy as lp


t_unit = lp.make_kernel(
    [
    "[N_e] -> { [r, x, i_tile, i_inner_outer, i_inner_inner, i_prcmpt, r_prcmpt, e_outer, e_inner, J_dim_0, J_dim_1, j_tile, j_inner] : (i_prcmpt) mod 4 = 0 and 0 <= r <= 2 and 0 <= x <= 2 and i_tile >= 0 and i_inner_inner >= 0 and -4i_inner_outer <= i_inner_inner <= 34 - 9i_tile - 4i_inner_outer and i_inner_inner <= 8 - 4i_inner_outer and i_inner_inner <= 3 and -i_inner_inner <= i_prcmpt <= 34 - 9i_tile - i_inner_inner and i_prcmpt <= 8 - i_inner_inner and 0 <= r_prcmpt <= 2 and e_inner >= 0 and -14e_outer <= e_inner <= 13 and e_inner < N_e - 14e_outer and 0 <= J_dim_0 <= 2 and 0 <= J_dim_1 <= 2 and j_inner >= 0 and -35j_tile <= j_inner <= 34 - 35j_tile and j_inner <= 34 }",
    ],
    '''
    <> J_fetch[J_dim_0, J_dim_1] = J[J_dim_0, J_dim_1, e_inner + 14*e_outer] {id=J_fetch_rule}
    <> subst_0[i_prcmpt, r_prcmpt] = reduce(sum, [j_tile, j_inner], D[r_prcmpt, i_prcmpt + 9*i_tile + i_inner_inner, j_inner + j_tile*35]*u[e_inner + e_outer*14, j_inner + j_tile*35]) {id=prcmpt_j_redn}
    _fe_out[x, e_inner + e_outer*14, i_tile*9 + i_inner_inner + i_inner_outer*4] = reduce(sum, [r], subst_0[4*i_inner_outer, r]*J_fetch[x, r]) {id=insn, dep=J_fetch_rule:prcmpt_j_redn}
    ''', lang_version=(2018, 2))

t_unit = lp.add_prefetch(t_unit, "D", ["i_inner_inner", "r_prcmpt",
                                       "i_prcmpt", "j_inner"],
                         fetch_outer_inames=frozenset(["e_outer",
                                                       "i_tile",
                                                       "j_tile"]),
                         temporary_address_space=lp.AddressSpace.LOCAL,
                         default_tag=None)
print(t_unit)

That generates the kernel:

---------------------------------------------------------------------------
INSTRUCTIONS:
   for e_outer, i_tile, D_dim_1, j_tile, D_dim_0, D_dim_2
               D_fetch[D_dim_0, D_dim_1, D_dim_2] = D[D_dim_0, D_dim_1 + 9*i_tile, D_dim_2]  {id=D_fetch_rule}
     end i_tile, D_dim_1, j_tile, D_dim_0, D_dim_2
     for e_inner, J_dim_1, J_dim_0J_fetch[J_dim_0, J_dim_1] = J[J_dim_0, J_dim_1, e_inner + 14*e_outer]  {id=J_fetch_rule}
│      end J_dim_1, J_dim_0for i_prcmpt, r_prcmpt, i_tile, i_inner_inner
│↱             subst_0[i_prcmpt, r_prcmpt] = reduce(sum, [j_tile, j_inner], D[r_prcmpt, i_prcmpt + 9*i_tile + i_inner_inner, j_inner + j_tile*35]*u[e_inner + e_outer*14, j_inner + j_tile*35])  {id=prcmpt_j_redn}
││         end i_prcmpt, r_prcmpt
││         for x, i_inner_outer
└└             _fe_out[x, e_inner + e_outer*14, i_tile*9 + i_inner_inner + i_inner_outer*4] = reduce(sum, [r], subst_0[4*i_inner_outer, r]*J_fetch[x, r])  {id=insn}
   end e_inner, e_outer, i_tile, i_inner_inner, x, i_inner_outer
---------------------------------------------------------------------------

Notice how the transformed kernel assigns D_fetch without using it anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant