Simulating cores > 64 will hang at hart 63 of 64+ harts #148

Open
trueif opened this issue Jul 5, 2024 · 4 comments
trueif commented Jul 5, 2024

[image]

Hi experts,

I am setting X tiles = 13 and Y tiles = 5 (or any combination with X * Y > 64). The simulation hangs at hart 63 (counting from hart 0), and hart 64 never runs. Could you please let me know whether this is a limitation or bug in the current RTL code, or whether there is a setting that I did not configure?

Thanks!

Collaborator

Jbalkind commented Jul 8, 2024

After 64 tiles there are a couple of blockers:

  1. By default, the L2's share vector (which sets the maximum number of tiles) is sized at 64. You will need some changes to work around this. If you look at the second-to-last Metro-MPI commit (https://github.com/metro-mpi/metro-mpi/commits/metro-mpi/ commit 264b365), you will see changes to define.h.pyv, l2.h.pyv, and the L2 files that support a share vector size of 1024. Note that this makes the L2 metadata quite large and therefore increases area.
  2. The core ID handling in the bare-metal boot code for simulation passes the tile count via a char, capping the number of tiles at 255. To get past this, there is a fix in syscalls.c and hello_world_token.c in the same commit as above.
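
To make the two limits above concrete, here is a minimal C sketch. It is not the OpenPiton RTL or boot code: `SHARE_VECTOR_BITS`, `add_sharer`, and `count_via_u8` are hypothetical names, and the sketch assumes a one-bit-per-tile vector and an 8-bit unsigned char. With 13 x 5 = 65 tiles, tile IDs run from 0 to 64, and ID 64 has no slot in a 64-entry vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-entry sharer vector: one bit per tile.
   Illustrative only; this is not taken from the L2 source. */
#define SHARE_VECTOR_BITS 64u

/* Records tile_id as a sharer. Returns false when the ID has no bit
   in the vector, which is the case for tile 64 and above. */
bool add_sharer(uint64_t *vec, unsigned tile_id) {
    assert(vec != NULL);
    if (tile_id >= SHARE_VECTOR_BITS) {
        return false;
    }
    *vec |= (UINT64_C(1) << tile_id);
    return true;
}

/* Passing a count through an 8-bit type keeps only its low 8 bits,
   so any value above 255 is silently truncated. */
unsigned count_via_u8(unsigned tile_count) {
    uint8_t narrow = (uint8_t)tile_count;
    return narrow;
}
```

Calling `add_sharer` with tile ID 63 succeeds, while tile ID 64 is rejected; `count_via_u8(256)` comes back as 0. Widening the vector and the count type, as the referenced commit is described as doing, removes both limits at the cost of more metadata.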


zhb9103 commented Sep 22, 2024

Hi @Jbalkind, I think there is an issue in commit 264b365. The code in question is shown below:
[image]

Could you help to confirm it for me?

Thanks!

@Jbalkind
Collaborator

I don't think this is important. My recollection is that SDID is unused when NO_RTL_CSM is set, which is the case for all cores except the OST1. LSID is used to track tile IDs regardless of whether CDR/CSM is enabled, and so it needed to be extended to 10 bits. @guillemlp may recall better.
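
As a rough check of why a tile ID field of width 10 goes with a 1024-entry share vector, the sketch below computes the smallest number of bits that can encode IDs 0 through N-1. The helper name `id_bits_for` is hypothetical, and treating the LSID width as exactly this value is an assumption, not something confirmed by the commit.

```c
#include <assert.h>

/* Smallest number of bits that can encode IDs 0 .. num_tiles-1.
   The input is bounded so that the shift below cannot overflow. */
unsigned id_bits_for(unsigned num_tiles) {
    assert(num_tiles >= 1u && num_tiles <= 65536u);
    unsigned bits = 0;
    while ((1ul << bits) < num_tiles) {
        bits++;
    }
    return bits;
}
```

Under this calculation, 64 tiles fit in 6 bits, 65 tiles (the 13 x 5 case) already need 7, and 1024 tiles need 10.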


zhb9103 commented Sep 24, 2024

Ok, thank you very much!
