Chan z prefix sum #2781

amin1377 · 2024-10-18T13:40:21Z

Description

This PR optimizes the calculation of the ChanZ cost factor by precomputing and storing the cumulative number of inter-die connections prior to block placement. During placement, the number of inter-die connections within a bounding box is calculated using the following formula:

if (x_low == 0 && y_low == 0) {
    num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high];
} else if (x_low == 0) {
    num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high] - 
                         acc_tile_num_inter_die_conn_[x_high][y_low - 1];
} else if (y_low == 0) {
    num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high] - 
                         acc_tile_num_inter_die_conn_[x_low - 1][y_high];
} else {
    num_inter_dir_conn = acc_tile_num_inter_die_conn_[x_high][y_high] - 
                         acc_tile_num_inter_die_conn_[x_low - 1][y_high] - 
                         acc_tile_num_inter_die_conn_[x_high][y_low - 1] + 
                         acc_tile_num_inter_die_conn_[x_low - 1][y_low - 1];
}
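For readers new to the technique, the table this formula reads from can be filled in a single pass over the grid. The following is a minimal sketch under stated assumptions, not the code in this PR: it uses plain std::vector instead of vtr::NdMatrix, and the names Grid, build_prefix_sum and count_in_box are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// acc[x][y] = sum of tile[i][j] for all i <= x and j <= y (inclusive prefix sum).
Grid build_prefix_sum(const Grid& tile) {
    const std::size_t width = tile.size();
    const std::size_t height = width ? tile[0].size() : 0;
    Grid acc(width, std::vector<int>(height, 0));
    for (std::size_t x = 0; x < width; ++x) {
        for (std::size_t y = 0; y < height; ++y) {
            int sum = tile[x][y];
            if (x > 0) sum += acc[x - 1][y];
            if (y > 0) sum += acc[x][y - 1];
            if (x > 0 && y > 0) sum -= acc[x - 1][y - 1]; // counted twice above
            acc[x][y] = sum;
        }
    }
    return acc;
}

// Sum of tile[x][y] over the inclusive box [x_low, x_high] x [y_low, y_high].
// The guards mirror the four cases of the if/else chain in the description.
int count_in_box(const Grid& acc, std::size_t x_low, std::size_t y_low,
                 std::size_t x_high, std::size_t y_high) {
    assert(x_low <= x_high && y_low <= y_high);
    int total = acc[x_high][y_high];
    if (x_low > 0) total -= acc[x_low - 1][y_high];
    if (y_low > 0) total -= acc[x_high][y_low - 1];
    if (x_low > 0 && y_low > 0) total += acc[x_low - 1][y_low - 1];
    return total;
}
```

Building the table costs one visit per tile, and each bounding-box query afterwards touches at most four entries regardless of the box size.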

vaughnbetz · 2024-10-18T15:01:40Z

You should comment how the prefix sum computation works.
We can't afford O(N^2) memory and O(N^2) compute to load this data structure, as N is the size of the circuit. We shouldn't compute the final matrix, but instead just store the O(N) (2D in x,y) prefix sum and compute the average z-capacity in a helper function when called.

vaughnbetz · 2024-11-01T14:34:25Z

QoR results are at https://www.eecg.utoronto.ca/~vaughn/vaughnwiki/doku.php?id=amin_weekly_update and look good. On both 3D SW and 2D Titan architectures, pack, place and route time and QoR are essentially unchanged, and on a 3D arch total runtime drops by 2%, which makes sense as the loading of the cost structures should be faster (and may be counted outside place time). @amin1377: if you can paste the QoR data in for posterity, that would be good.

vaughnbetz

Some commenting updates, and one functional change to divide by layer-1.

soheilshahrouz · 2024-11-03T19:58:31Z

vpr/src/place/net_cost_handler.cpp

    const auto& device_ctx = g_vpr_ctx.device();
    const auto& rr_graph = device_ctx.rr_graph;

    const size_t grid_height = device_ctx.grid.height();
    const size_t grid_width = device_ctx.grid.width();


-    chanz_place_cost_fac_ = vtr::NdMatrix<float, 4>({grid_width, grid_height, grid_width, grid_height}, 0.);
+    acc_tile_num_inter_die_conn_ = vtr::NdMatrix<int, 2>({grid_width, grid_height}, 0.); 

    vtr::NdMatrix<float, 2> tile_num_inter_die_conn({grid_width, grid_height}, 0.);                           


Does tile_num_inter_die_conn need to store float values?

vaughnbetz · 2024-11-03

It's a count, so int is probably better. Float could have round-off issues on big chips: this matrix is filled in by adding small numbers to a running count, which becomes a problem once the running count is more than about 16 million times larger than the number being added (the small number is then lost to round-off).
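The 16 million figure corresponds to the 24-bit significand of single-precision floats (2^24 = 16,777,216). A minimal sketch of the effect, assuming IEEE-754 single precision and no fast-math compiler flags; the function names are hypothetical and not part of VPR:

```cpp
#include <cassert>
#include <cstdint>

// Adds 1 to a running float count `steps` times.
// Once the count reaches 2^24, each +1 is rounded away.
float accumulate_as_float(float start, int steps) {
    assert(steps >= 0);
    float count = start;
    for (int i = 0; i < steps; ++i) {
        count += 1.0f;
    }
    return count;
}

// The same accumulation with an integer count, which stays exact.
std::int64_t accumulate_as_int(std::int64_t start, int steps) {
    assert(steps >= 0);
    std::int64_t count = start;
    for (int i = 0; i < steps; ++i) {
        count += 1;
    }
    return count;
}
```

Starting both counters at 16,777,216 and adding 1 ten times leaves the float unchanged while the integer advances by ten, which is the failure mode described above.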

soheilshahrouz · 2024-11-03T20:24:38Z

vpr/src/place/net_cost_handler.cpp

+                             acc_tile_num_inter_die_conn_[x_low-1][y_high] - \
+                             acc_tile_num_inter_die_conn_[x_high][y_low-1] + \
+                             acc_tile_num_inter_die_conn_[x_low-1][y_low-1];
+    }


Using vtr::NdOffsetMatrix would be consistent with the rest of the file and would get rid of this if statement.

vaughnbetz · 2024-11-03

Yes, that would be cleaner.

amin1377 · 2024-11-05

@soheilshahrouz, if you remember, my initial attempt was to use this data structure, but I encountered some memory issues with it. I’ll create an issue for this and, in another PR, attempt to replace this data structure with vtr::NdOffsetMatrix.

soheilshahrouz · 2024-11-05

I think I figured out what the problem was.
Check if 68ecb55 solves the memory issue.

amin1377 · 2024-11-05

Sounds good. I'll try it once the PR is merged.
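One way to see why an offset-indexed table removes the branches: if the prefix sum has a zero row and a zero column in front of the data, the "index -1" lookups always hit a valid zero entry. The sketch below is an illustration only. It emulates the offset with a +1 index shift on plain std::vector and does not use vtr::NdOffsetMatrix, whose interface is not shown in this thread; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// padded[x + 1][y + 1] = sum of tile[i][j] for all i <= x and j <= y.
// Row 0 and column 0 stay zero and stand in for the "index -1" entries.
Grid build_padded_prefix_sum(const Grid& tile) {
    const std::size_t width = tile.size();
    const std::size_t height = width ? tile[0].size() : 0;
    Grid padded(width + 1, std::vector<int>(height + 1, 0));
    for (std::size_t x = 0; x < width; ++x) {
        for (std::size_t y = 0; y < height; ++y) {
            padded[x + 1][y + 1] = tile[x][y] + padded[x][y + 1]
                                 + padded[x + 1][y] - padded[x][y];
        }
    }
    return padded;
}

// Inclusive box sum with a single expression and no corner-case branches.
int count_in_box_padded(const Grid& padded, std::size_t x_low, std::size_t y_low,
                        std::size_t x_high, std::size_t y_high) {
    assert(x_low <= x_high && y_low <= y_high);
    return padded[x_high + 1][y_high + 1] - padded[x_low][y_high + 1]
         - padded[x_high + 1][y_low] + padded[x_low][y_low];
}
```

The cost is one extra row and one extra column of storage in exchange for a branch-free query.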

vaughnbetz · 2024-11-03T21:11:22Z

This is separate from this PR, but it's possible that using the same technique in the x- and y- directions that you're using for the z-direction to calculate the average channel width by adding and subtracting the proper values would be faster for x- and y- too.

It saves memory (it would take storage for the x and y average channel capacities down from O(device size) to O(sqrt(device size))), which would help the cache and memory bandwidth. It would mean two matrix accesses instead of one, plus an extra reciprocal and pow() call, but it might still be just as fast or faster overall. After this PR has landed, if you have the energy for it, you could run a quick experiment by recoding the x- and y-dimensions to match the z-dimension approach.
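A rough sketch of what the one-dimensional variant could look like, assuming per-channel widths are available as a simple array. The function names and the exact form of the cost factor (reciprocal of the average width raised to an exponent) are assumptions made for illustration, not VPR's implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// acc[i] = widths[0] + ... + widths[i]; storage is linear in the number of channels.
std::vector<int> build_channel_prefix_sum(const std::vector<int>& widths) {
    std::vector<int> acc(widths.size(), 0);
    int running = 0;
    for (std::size_t i = 0; i < widths.size(); ++i) {
        running += widths[i];
        acc[i] = running;
    }
    return acc;
}

// Hypothetical cost factor: (1 / average width of channels low..high) ^ exponent.
// Two table reads, one reciprocal and one pow() per query.
double chan_cost_factor(const std::vector<int>& acc, std::size_t low,
                        std::size_t high, double exponent) {
    assert(low <= high && high < acc.size());
    const int total = acc[high] - (low > 0 ? acc[low - 1] : 0);
    const double average = static_cast<double>(total) / static_cast<double>(high - low + 1);
    assert(average > 0.0);
    return std::pow(1.0 / average, exponent);
}
```

Compared with a table indexed by every (low, high) pair, this trades a little arithmetic per query for a much smaller structure, which is the trade-off the comment above proposes measuring.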

soheilshahrouz · 2024-11-03T23:45:10Z

@amin1377 If you're busy with other tasks, I am interested in working on this idea for chan_x and chan_y. Let me know if I can take it over.

amin1377 · 2024-11-05T22:01:46Z

How could I possibly say no to such a generous offer, Soheil? Thanks a lot! :) Let me know if I can be of any help.

amin1377 · 2024-11-05T23:43:14Z

QoR:
[Attachment: titan other 3d CB cube bb]
[Attachment: titan_quick_qor]

vaughnbetz · 2024-11-06T00:22:11Z

QoR looks good.

amin1377 · 2024-11-06T22:27:45Z

@vaughnbetz, I’ve addressed both your comments and Soheil’s. After making the changes, I reran the benchmarks to ensure everything is still working correctly, and the tests passed without issues. I think the PR is ready to be merged.

amin1377 added 2 commits October 18, 2024 09:17

[vpr][place] add acc_tile_num_inter_die_conn

400b5a1

[vpr][place] use prefix sum to populate chanz_place_cost_fac_

4018188

github-actions bot added the VPR (VPR FPGA Placement & Routing Tool) and lang-cpp (C/C++ code) labels Oct 18, 2024

amin1377 added 9 commits October 30, 2024 09:18

Merge branch 'master' of https://github.com/verilog-to-routing/vtr-verilog-to-routing into chan_z_prefix_sum

0432740

[vpr][place] initialize other entried of acc_tile_num_inter_die_conn

11ee10f

[vpr][place][net_cost] add get_chanz_cost_factor signiture and acc_tile_num_inter_die_conn_

8bead96

[vpr][place][net_cost] get_chanz_cost_factor impl

bea400e

[vpr][place][net_cost] remove place_cost_exp from functions arguments

01b9cd6

[vpr][place][net_cost] fix num_inter_dir_conn corner cases

af5659d

[vpr][place][net_cost] call get_chanz_cost_factor instead of chanz_place_cost_fac_

736d826

[vpr][place][net_cost] remove chanz_place_cost_fac_ calculation

1a3e56d

[vpr][place][net_cost] recomment on how to calculate acc_tile_num_inter_die_conn_

54fe8d7

vaughnbetz requested changes Nov 1, 2024

soheilshahrouz reviewed Nov 3, 2024

amin1377 added 2 commits November 5, 2024 17:29

[vpr][place] fix typos + unify edge loops

df4159b

[vpr][place]make get_chanz_cost_factor a private function

21c62d8

soheilshahrouz mentioned this pull request Nov 5, 2024

Chan x/y placement cost factors using prefix sum #2799

Merged

amin1377 added 4 commits November 5, 2024 17:58

[vpr][place] add is_multi_layer_ to net cost handler fields

da92578

[vpr][place] factor out crossing multiplication

052d6b9

[vpr][place][net_cost] use bb

e15a796

[vpr][place][net_cost] fix typos

f2939b1

Merge branch 'master' of https://github.com/verilog-to-routing/vtr-verilog-to-routing into chan_z_prefix_sum

e213fa1

amin1377 added 3 commits November 6, 2024 16:03

[ci] update golden

a0b2c06

[ci] update odin golden

6ea8921

Merge branch 'master' of https://github.com/verilog-to-routing/vtr-verilog-to-routing into chan_z_prefix_sum

596ddaf

vaughnbetz approved these changes Nov 6, 2024

vaughnbetz merged commit e179d88 into master Nov 6, 2024
37 checks passed

vaughnbetz deleted the chan_z_prefix_sum branch November 6, 2024 23:28
