Wire lookahead runtime scales poorly with number of switch/segment types #2811
The size of the map router lookahead grows linearly with the number of segment types (vtr-verilog-to-routing/vpr/src/route/router_lookahead_map.h, lines 43 to 53 at 66f35d9).
Router lookahead computation involves iterating over all segment types, sampling nodes for each type, and building cost maps. This likely explains why computation time increases as the number of segment types grows (vtr-verilog-to-routing/vpr/src/route/router_lookahead_map.cpp, lines 522 to 554 at 66f35d9).
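To make the scaling concrete, here is a minimal sketch, not the actual VPR data structure or routine (names like `WireCostMap` and `build_cost_map` are illustrative), of a cost map keyed by (channel type, segment type) and the per-segment build loop. Both the storage and the number of sampling passes grow linearly with the number of segment types:

```cpp
// Minimal sketch, assuming hypothetical names: illustrates why the map
// lookahead's memory and build time scale linearly with segment count.
#include <cstddef>
#include <vector>

struct CostEntry {
    float delay = 0.0f;
    float congestion = 0.0f;
};

// One (width x height) cost grid per (channel type, segment type) pair:
// total storage = 2 * num_seg_types * width * height entries.
struct WireCostMap {
    size_t num_seg_types, width, height;
    std::vector<CostEntry> entries;

    WireCostMap(size_t s, size_t w, size_t h)
        : num_seg_types(s), width(w), height(h), entries(2 * s * w * h) {}

    CostEntry& at(size_t chan_type, size_t seg, size_t dx, size_t dy) {
        return entries[((chan_type * num_seg_types + seg) * width + dx) * height + dy];
    }
};

// Hypothetical build loop: one sampling-and-expansion pass per segment
// type, so build time is also roughly linear in num_seg_types.
void build_cost_map(WireCostMap& map) {
    for (size_t seg = 0; seg < map.num_seg_types; ++seg) {
        for (size_t chan_type = 0; chan_type < 2; ++chan_type) {
            // Sample representative wires of this segment type, then run a
            // Dijkstra-like expansion from each sample to fill map.at(...).
        }
    }
}
```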
Two potential solutions are:
With the first solution, the memory usage of the router lookahead still increases linearly with the number of segment types. However, if each segment type is confined to a limited area of the device, this approach reduces reliance on the translation-invariance assumption in the router lookahead.
If the key reason for having many different segment types (hundreds to thousands) is to give them different delays, the most direct solution would be to allow the RR graph to store a different delay (or delay offset) per wire (not per wire type). We would hide the details behind an rr_node.delay() or some such method. This would allow you to use a small number of truly distinct segment types, each of which gives the average delay of a wire of that type. This also makes things backward-compatible: if you don't need per-wire delays, just don't specify them, since the average == the delay of node(i).

Then we'd need some method(s) to load those delay offsets per rr_node. They could be stored in a vector indexed from [0..rr_node.size()-1], so only those who want them pay the memory for them. One method could load a delay for every rr_node. Another might load them from a smaller file that gives a list of segment type / track number and delay offsets.

Do you need R and C to be different for each of these wire types, or just the delay? If we need more per-wire data, we might want rr_node.phys_info_id instead of delay_offset, but then use a similar approach to load all the per-wire unique data, with the segment-level data again defined to be the average.
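A minimal sketch of how the optional per-node offsets described above could be stored (names like `RRGraphDelays` and `load_delay_offsets` are hypothetical, not existing VPR API): an empty vector means every node falls back to its segment type's average delay, so users who don't load per-wire data pay no memory cost.

```cpp
// Hedged sketch with hypothetical names; not the actual VPR API.
#include <cstddef>
#include <utility>
#include <vector>

class RRGraphDelays {
  public:
    // Load one delay offset per RR node, e.g. parsed from a back-annotation
    // file listing segment type / track number / offset triples.
    void load_delay_offsets(std::vector<float> offsets) {
        delay_offset_ = std::move(offsets);
    }

    // Effective delay of node `inode`: the segment type's average delay plus
    // an optional per-node offset. With no offsets loaded, this degenerates
    // to the backward-compatible per-segment-type behavior.
    float node_delay(size_t inode, float seg_avg_delay) const {
        float offset = delay_offset_.empty() ? 0.0f : delay_offset_[inode];
        return seg_avg_delay + offset;
    }

  private:
    std::vector<float> delay_offset_; // empty unless per-wire delays are loaded
};
```

With this layout, a handful of truly distinct segment types could cover thousands of per-wire delays without multiplying the lookahead's segment count.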
@vaughnbetz a couple of follow-on notes:
Upvote for the feature where each MUX has individual RC values. I would prefer to have Tdel, which offers an easy way to annotate; RC modeling would require extra effort and make timing extraction from layouts more difficult.
During testing of the delay modeling approach I am developing (related to #2664, although I am not using that feature just yet, to help stage the debugging process), my testing flow has matured to the point where I can comment more definitively on the impact of adding a large number of switch and segment types on router lookahead runtime. This was flagged as a possible concern by @rs-dhow in previous discussions.
It appears that when the router lookahead algorithm has a large number of switch/segment types to manage (I generate segments and switches in matched pairs, so my architecture models contain equal numbers of segment and switch types), the lookahead time does indeed increase. I'm putting together slides and additional experiments to describe the results in more detail.
Expected Behaviour
For a small architecture (~1500 LUTs), I would expect little to no increase in router lookahead runtime if I increase the segment count by a significant amount.
Current Behaviour
For a small architecture (~1500 LUTs), I observe that when the switch and segment list counts increase from 2 to ~200, the router lookahead runtime increases from ~6 seconds to ~50 seconds and dominates total router runtime. I expect to be able to show further degradation with larger architectures; experiments to quantify that trend are in flight.
Possible Solution
I have not looked into possible solutions yet, but this is central enough to my IC development needs that I could invest time into developing a PR to resolve the issue. A collaboration with more seasoned VPR developers is likely the most effective approach, even if I do most of the heavy lifting.
Steps to Reproduce
Context
The goal of the delay model I am attempting to generate is to harvest ASIC EDA software output for delay annotation of the components of an eFPGA that will be productized. The delay information is unique on a per-tile-type, per-routing-channel basis, so the number of unique delay parameters is O(1K) to O(10K). For accurate delay modeling, VPR's algorithms thus need to scale well as the number of segment and switch types becomes large.
Your Environment