-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chan z prefix sum #2781
Chan z prefix sum #2781
Conversation
|
…rilog-to-routing into chan_z_prefix_sum
…le_num_inter_die_conn_
QoR results are at https://www.eecg.utoronto.ca/~vaughn/vaughnwiki/doku.php?id=amin_weekly_update and look good. On both 3D SW and 2D Titan architectures, pack, place and route time and QoR are essentially unchanged, and on a 3D arch total runtime drops by 2%, which makes sense as the loading of the cost structures should be faster (and may be counted outside place time). @amin1377 : if you can paste the QoR data in for posterity that would be good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some commenting updates, and one functional change to divide by layer-1.
vpr/src/place/net_cost_handler.cpp
Outdated
const auto& device_ctx = g_vpr_ctx.device(); | ||
const auto& rr_graph = device_ctx.rr_graph; | ||
|
||
const size_t grid_height = device_ctx.grid.height(); | ||
const size_t grid_width = device_ctx.grid.width(); | ||
|
||
|
||
chanz_place_cost_fac_ = vtr::NdMatrix<float, 4>({grid_width, grid_height, grid_width, grid_height}, 0.); | ||
acc_tile_num_inter_die_conn_ = vtr::NdMatrix<int, 2>({grid_width, grid_height}, 0.); | ||
|
||
vtr::NdMatrix<float, 2> tile_num_inter_die_conn({grid_width, grid_height}, 0.); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does tile_num_inter_die_conn
need to store float values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a count, so int is probably better. Float could have some round-off issues with big chips, as this matrix is filled in by adding small numbers to a running count, which can get problematic if the big number ever becomes more than 16 million times or so bigger than the small number (the small number then gets thrown away by round off).
acc_tile_num_inter_die_conn_[x_low-1][y_high] - \ | ||
acc_tile_num_inter_die_conn_[x_high][y_low-1] + \ | ||
acc_tile_num_inter_die_conn_[x_low-1][y_low-1]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using vtr::NdOffsetMatrix
will be consistent with thest of file and will get rid of this if statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soheilshahrouz, if you remember, my initial attempt was to use this data structure, but I encountered some memory issues with it. I’ll create an issue for this and, in another PR, attempt to replace this data structure with vtr::NdOffsetMatrix
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I figured out what the problem was.
Check if 68ecb55 sovles the memory issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll try it once the PR is merged.
This is separate from this PR, but it's possible that using the same technique in the x- and y- directions that you're using for the z-direction to calculate the average channel width by adding and subtracting the proper values would be faster for x- and y- too. It saves memory (would take storage for x and y average channel capacities down from O(device size) to O(sqrt(device size)), which would help the cache and memory bandwidth. If would mean two matrix accesses instead of 1, and an extra reciprocal and pow() call, but it might still be overall just as fast or faster. After this PR is landed, if you have th energy for it you could do a quick experiment to check by recoding the x- and y- dimensions to match the z-dimension approach. |
@amin1377 If you're busy with other tasks, I am interested in working on this idea for chan_x and chan_y. Let me know if I can take it over. |
How could I possibly say no to such a generous offer, Soheil? Thanks a lot! :) Let me know if I can be of any help. |
…rilog-to-routing into chan_z_prefix_sum
QoR looks good. |
@vaughnbetz, I’ve addressed both your comments and Soheil’s. After making the changes, I reran the benchmarks to ensure everything is still working correctly, and the tests passed without issues. I think the PR is ready to be merged. |
Description
This PR optimizes the calculation of the
ChanZ
cost factor by precomputing and storing the cumulative number of inter-die connections prior to block placement. During placement, the number of inter-die connections within a bounding box is calculated using the following formula: