Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Prepacker] Pack Molecule Data Structure Clarity #2791

Open
1 of 9 tasks
AlexandreSinger opened this issue Oct 30, 2024 · 0 comments
Open
1 of 9 tasks

[Prepacker] Pack Molecule Data Structure Clarity #2791

AlexandreSinger opened this issue Oct 30, 2024 · 0 comments

Comments

@AlexandreSinger
Copy link
Contributor

AlexandreSinger commented Oct 30, 2024

The molecule data structure is defined here:

/**
* @brief Represents a grouping of atom blocks that match a pack_pattern,
* these groups are intended to be placed as a single unit during packing
*
* Store in linked list
*
* A chain is a special type of pack pattern. A chain can extend across multiple logic blocks.
* Must segment the chain to fit in a logic block by identifying the actual atom that forms the root of the new chain.
* Assumes that the root of a chain is the primitive that starts the chain or is driven from outside the logic block
*
* Data members:
*
* type : either a single atom or more atoms representing a packing pattern
* pack_pattern : if not a single atom, this is the pack pattern representing this molecule
* atom_block_ids : [0..num_blocks-1] IDs of atom blocks that implements this molecule, indexed by
* t_pack_pattern_block->block_id
* chain_info : if this is a molecule representing a chained pack pattern, this data structure will
* hold the data shared between all molecules forming a chain together.
* valid : whether the molecule is still valid for packing or not.
* num_blocks : maximum number of atom blocks that can fit in this molecule
* root : index of the pack_pattern->root_block in the atom_blocks_ids. root_block_id = atom_block_ids[root]
* base_gain : intrinsic "goodness" score for molecule independent of rest of netlist
* next : next molecule in the linked list
*/
class t_pack_molecule {
public:
/* general molecule info */
bool valid;
float base_gain;
enum e_pack_pattern_molecule_type type;
/* large molecules info */
t_pack_patterns* pack_pattern;
int root;
int num_blocks;
std::vector<AtomBlockId> atom_block_ids;
std::shared_ptr<t_chain_info> chain_info;
t_pack_molecule* next;
// a molecule is chain is it is a forced pack and its pack pattern is chain
bool is_chain() const { return type == MOLECULE_FORCED_PACK && pack_pattern->is_chain; }
};

I have noticed that there are some parts of the data structure that are not clear and a bit wasteful. I think this struct should be reworked in the context of the Prepacker to make it easier to work with.

  • The atom_blk_ids member is confusing. This is what caused me to look into this. Some of the elements of this vector may be invalid. The reason is that this is not a list of atoms in the molecule; it is a mapping between the IDs in the pack pattern and the molecule atoms. The invalid entries are where the pack pattern is not completely full. This is very confusing but also not very good for performance. Instead we should have an ordered list of all atom blocks in the molecule, and then another list which stores the pattern ID for each of those atoms. That or the documentation should be made more clear.
  • The valid member is hardly used anymore and should be removed. This variable just says if this molecule has been packed already or not, which is now being stored in the Cluster Legalizer class. This should be removed / phased out.
  • The num_blocks member is poorly named. It is not the number of blocks currently in the molecule, it is the maximum number of blocks that can be stored in the molecule. I bet you this was used to allocate a C-style array, but in recent years we moved to a vector. This variable can probably be removed.
  • The root member is very strange. Why store the ID in the array instead of the AtomBlockId itself (they are both basically integers). This should be investigated and maybe cleaned up.
  • The base_gain member is specific to the clustering algorithm being used. This should not be in the molecule struct in my opinion.
  • The pack pattern part of this struct is a bit confusing. It may be a good idea to encapsulate that into another struct to make it clear.
  • The molecules should not be a linked list. Instead we should have MoleculeIDs which the Prepacker manages. Having unique IDs for each molecule is a good idea and can make traversing the molecules much more efficient. Since we do not care about the order of the molecules list (and we do not expect to be adding or removing molecules often within the list), storing in a standard vector would actually be more performant!
  • The declaration of this type should be moved to the Prepacker. It only makes sense in the context of the Prepacker.
  • We should add an independent verifier class which verifies that the Prepacker created all the molecules correctly and everything works as expected. This also creates a clear interface for what user of the Prepacker should expect to see.
AlexandreSinger added a commit to AlexandreSinger/vtr-verilog-to-routing that referenced this issue Nov 2, 2024
Pack molecules originally contained a flag called "valid" which
signified that the molecule has not been clustered yet. This flag is not
necessary since the Cluster Legalizer maintains this information
internally. Removed the valid flag from the pack molecule struct.

See issue verilog-to-routing#2791
AlexandreSinger added a commit to AlexandreSinger/vtr-verilog-to-routing that referenced this issue Nov 2, 2024
Pack molecules originally contained a flag called "valid" which
signified that the molecule has not been clustered yet. This flag is not
necessary since the Cluster Legalizer maintains this information
internally. Removed the valid flag from the pack molecule struct.

See issue verilog-to-routing#2791
AlexandreSinger added a commit to AlexandreSinger/vtr-verilog-to-routing that referenced this issue Nov 5, 2024
Pack molecules originally contained a flag called "valid" which
signified that the molecule has not been clustered yet. This flag is not
necessary since the Cluster Legalizer maintains this information
internally. Removed the valid flag from the pack molecule struct.

See issue verilog-to-routing#2791
AlexandreSinger added a commit to AlexandreSinger/vtr-verilog-to-routing that referenced this issue Nov 5, 2024
Pack molecules originally contained a flag called "valid" which
signified that the molecule has not been clustered yet. This flag is not
necessary since the Cluster Legalizer maintains this information
internally. Removed the valid flag from the pack molecule struct.

See issue verilog-to-routing#2791
Junius00 added a commit to abdelfattah-lab/vtr-updated that referenced this issue Nov 22, 2024
* more end-user friendly text in server_mode/index.rst

* IPA Client doc update

* minor fix in server_mode/index.rst

* add embedded url pointing to IPAClient

* fix embedded link syntax

* update NoC options in command_line_usage.rst

* mention all available options for --noc_routing_algorithm

* updated manual section

* [CI] Fix Golden Results

Golden results are incorrect for the strong tests after recent merge
conflicts.

* avoid a copy in initial placement

* use std::vector for arch.Switches

* typos and comments

* non-static member variable in CentroidMoveGenerator

* [Router] Added Comments for RCV in the Post-Pop Prune Function

Added Vaughn's detailed comments explaining the RCV technique and how
the post-pop prune function in the connection router works for RCV.

* [Router] Fixed a Bug in the Connection Router After Changing Some APIs

After changing the API of `evaluate_timing_driven_node_costs`, there was
a typo (or bug) left inside an ifdef block.

This commit fixed the bug and changed the typo to the correct variable
name.

* Fixed wirelength and minW anomalies. Fixed spelling and resolved requests.

* [Router] Fixed `vtr_reg_strong` QoR Failures After Rebasing to Master

`strong_place_delay_calc_method` and `strong_place_delay_model` tests
had QoR failures after rebasing to master at commit 5925e69.

This commit updated the corresponding golden results to make the
regression tests happy.

* add annealer.cpp/.h

* [STA] Fixed Incremental Slack Timing Analysis Verify Code

In debug mode, the STA checks if the incremental STA matches the STA
computed from scratch. In this check it checked for if the critical
nodes matched; however, the nodes returned are not stably deterministic
(it does not break ties in a deterministic way). After discussing with
Vaughn, it is safe to just remove the nodes check since it would not
change the result of the analysis.

closes verilog-to-routing#2754

* commit before I go home.

* updated quickstart pics and added ODIN II

* updated automatically running vtr flow

* fixed name typo

* [vpr][CLI] change has_choking_spot to router_opt_choke_points

* [vpr][route] rename has_choking_spot to has_choke_point

* [vtr_flow] fix CLI for choke point usage

* [vpr][route] renaming remaining has_choke_points

* [vpr][place] add chanz_place_cost_fac_

* [vpr][place] use chanz_place_cost_fac_ when calculating cost

* [vpr][place] add alloc_and_load_for_fast_vertical_cost_update to calculate chanz_place_cost_fac_

* [vpr][place] fix typos

* [vpr][place] add comments

* [vpr][place] populate chanz_place_cost_fac_

* [vpr][place] fix grid width/height type

* [vpr][cli] fix choke point msg

* added tests

* Clarify corner case behavior of --device option

* removed ABC from parmys section

* [vpr][route] return from set_nets_choking_spots if flat router is not selected

* [Task] Added Ability to Constrain Circuits in Config

The VTR task interface uses configuration files to specify what circuits
to run for a given task. The current implementation of these config
files only allowed the user to run all of their circuits on the same
device and using the same flat constraint file (for example fixing the
IOs). This was only able to be done through specifying them in the
script_params option, but this was limited.

This causes problems for the AP flow since each circuit has different IO
constraints and may be constrained to different devices or
architectures. The only way to use the VTR tasks is to create one
configuration file per circuit, which is very wasteful and challenging
to work with. It also makes parsing the QoR more difficult.

Added a new configuration option that can be added to the configuration
file which can constrain a circuit to a given option. The syntax is as
follows:
  circuit_constraint_list_add=(<circuit>, <constr_key, constr_val>)
This line will tell VTR run task to constrain the circuit on the
constr_key option.

The currently supported constraint keys are:
        - arch: Constrain the circuit to only run on the given arch
        - device: Constrain the circuit to only use the given device
        - constraints: Constrain the circuit's atom placement

If a constraint is not specified, the bahaviour is unchanged. So, for
example, if an arch constraint is not specified, the circuit will run on
all the architectures in the arch_list (as usual).

Future work is to support constraining the route_chan_width so different
circuits can be run with different channel widths in the same config
file. This can easily be added using this interface.

* [Router] Fixed the Bug Causing Tons of Nightly Test Failures

The bug was introduced by me in commit 2966d66; I merged two adjacent
if-bodies with conditions that appeared the same, but were in fact not.

A simplified example is:
`if (A == 0) { ... A might be modified ... } if (A == 0) { return ; }`

Merging the if-bodies would be problematic if A is modified in the first
if-body and is no longer 0.

This mistake caused tons of nightly test failures, as mentioned in the
comments in PR#2720. So glad that the nightly tests CAUGHT this bug,
which was not detected by the VTR strong regression tests.

* Incorporate PR feedback

* [vpr][place] add acc_tile_num_inter_die_conn

* [vpr][place] use prefix sum to populate chanz_place_cost_fac_

* solve compilation errors

* show annealing schedule in ShowSetup.cpp

* added a constructor for PlacerTimingContext

* add a constructor for PlacerMoveContext

* added invalidate_affected_connections() to NetPinTimingInvalidator

* added  PlacerTimingContext::revert_td_cost()

* add select_move_generator() function

* enum class e_place_algorithm and e_place_bounding_box_mode

* enum class e_agent_algorithm

* added golden

* remove unused types from vpr_types.h

* Added utility_scripts under the titan_blif benchmarks directory. It contains the q2_flow.tcl script and documentation (MS word format) on how to use the script to pass designs with novel hard blocks in them through the Titan flow. The script and documentation were created by Srivatsan, and enable new hard blocks that Quartus does not understand to pass through Quartus synthesis and on to VPR for evaluation.

* Updated the titan README to point at the new location of the titan repo, which is on Vaughn's filesystem instead of Kevin Murray's now.

* [RegTests] Fixed VTR Nightly Test Failures

- Commented out the tiny circuit (3-bit adder, `vtr_reg_nightly_test1/
  arithmetic_tasks/figure_8/adder_003bits.v`) which randomly failed to
  route at the relaxed channel width (after routing at min chan width).
- Updated the golden results for `power_extended_arch_list` and
  `power_extended_circuit_list`.
- Relaxed the tolerance of QoR pass for small designs. This involved
  only two files under `vtr_flow/parse/pass_requirements` directory:
  1. common/pass_requirements.vpr_route_min_chan_width_small.txt
  2. timing/pass_requirements.vpr_route_relaxed_chan_width_small.txt
- Relaxed the `max_vpr_mem` tolerance for titan designs, especially for
  `vtr_reg_nightly_test7/titan_other_run_flat`.

* converted logging macros in annealer.cpp to methods

* apply PR comments

* arch.Directs to arch.directs

* reformat comments of t_arch_switch_inf

* return by value in parse_direct_pin_name()

* add more comments

* move PortPinToBlockPinConverter to vpr_utils

* fix bug in parsing pb_type_name.port_name[end_pin_index:start_pin_index]

* [RegTests] Commented Out 3-Bit Adders in the VTR Nightly Tests

Commented out the tiny circuit (3-bit adder in `vtr_reg_nightly_test1/
arithmetic_tasks/figure_8/adder_003bits.v` and the same test for ODIN)
in both config and golden_results files.

It was a hotfix for PR verilog-to-routing#2785, which forgot to make these changes.

All failures, except for `vtr_reg_nightly_test7/titan_other_run_flat` as
mentioned in Issue verilog-to-routing#2786, have been fixed now.

* doxygen comments for static function in place_macro.h

* replace char* with std::string and use local loop vars

* fix compilation warning in compressed_grid.cpp

* fix compilation errors in arch_check.cpp

* call c_str() in physical_type_util.cpp

* call c_str() on name member variable when necessary

* avoid calling vtr::strdup() for t_logical_block_type.name and call c_str() when needed

* remove redundant parentheses + typo

* fix memory leak by removing call to strdup

* set --route_chan_width to 40 for post_routing_sync

* chagne the implementation of vtr::vector::pairs()

* update strong qor results

* remove print statements

* fix no member named 'tie' in namespace 'std'

* [vpr][place] initialize other entried of  acc_tile_num_inter_die_conn

* [vpr][place][net_cost] add get_chanz_cost_factor signiture and acc_tile_num_inter_die_conn_

* [vpr][place][net_cost] get_chanz_cost_factor impl

* [vpr][place][net_cost] remove place_cost_exp from functions arguments

* [CI] Turned Off Nightly Tests on PR and Merge

The Nightly Tests have not worked for weeks. There is no reason to keep
trying to run them everytime we raise a PR.

I still left the cron job on so we can see if it ever comes back.

This just makes merging PRs easier.

* [vpr][place][net_cost] fix num_inter_dir_conn corner cases

* [vpr][place][net_cost] call get_chanz_cost_factor instead of chanz_place_cost_fac_

* [vpr][place][net_cost] remove chanz_place_cost_fac_ calculation

* add an example in the comment for pairs() method of vtr::vector

* [vpr][place][net_cost] recomment on how to calculate acc_tile_num_inter_die_conn_

* update qor results in vtr_reg_multiclock_odin/func_multiclock

* add classes for random number generation

* add RngContainer class

* virtual destructor for RandomNumberGeneratorInterface

* swap M and N values in SpecRandomNumberGenerator

* resolve compilation warnings in specrand.cpp

* use vtr::RngContainer for switchpoints

* add vtr::RngContainer arguments to initial_noc_placement

* pass vtr::RngContainer down the function call hierarchy in initial placement

* add a reference to vtr::RngContainer in KArmedBanditAgent

* update move_utils and RL_agent_util with vtr::RngContainer args

* update move generators with argument of vtr::RngContainer

* update noc_place_utils with RngContainer args

* [libs][rr_graph] fix a typo

* [vpr][route] fix a typo in overuse report

* [vpr][route][lookahead] iterate over layers to get the cost from opin to chan

* [vpr][route][lookahead] add comment on router lookahead

* pass RngContainer argument down the call hierarchy in place.cpp

* [CI] renambe 3d_titan_opin* to 3d_cb_titan*

* [ci] add 3d sb test

* [ci] add custom_3d_sb_fanin_fanout 60

* use RngContainer in cb_metrics.cpp

* [ci] add 3d sb test to task list

* fix compilation error in test_random.cpp

* [CI] add golden results for 3d_sb

* Bump libs/EXTERNAL/libcatch2 from `fa43b77` to `119a7bb`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `fa43b77` to `119a7bb`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@fa43b77...119a7bb)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* remove calls to floor in median move generators

* use lower case letters for argument names

* update qor for vtr_nightly1

* call c_str() for name member variable

* initial implementation of prefix sum for chanx and chany

* use effective inted in operator[] of NdOffsetMatrix<T, 1>

* change DimRange from {-1, grid_height} to {-2, grid_height}

* update golden results of vtr_nightly6

* [vpr][place] fix typos + unify edge loops

* [vpr][place]make get_chanz_cost_factor a private function

* [vpr][place] add is_multi_layer_ to net cost handler fields

* [vpr][place] factor out crossing multiplication

* [vpr][place][net_cost] use bb

* [Prepacker] Removed Valid Flag From Molecules

Pack molecules originally contained a flag called "valid" which
signified that the molecule has not been clustered yet. This flag is not
necessary since the Cluster Legalizer maintains this information
internally. Removed the valid flag from the pack molecule struct.

See issue verilog-to-routing#2791

* [Prepacker] Removed Revalid Molecules Method

Found that the revalid molecules method was more complicated than it
needed to be in the context of the Cluster Legalizer. Simplified its
code so that it makes more sense.

* [vpr][place][net_cost] fix typos

* enum class e_move_result

* exclude left and bottom channels in average channel width factor computation

* multiply `crossing` at the end

* [ci] update golden

* [ci] update odin golden

* [vpr][route][lookahead] fix comment

* don't use default values for member variables initialized in constructor

* add comments

* [Place] Independent Placement Verification

Created a method that would independently verify the placement in the
VPR flow. If a placement passes this verification, it is assumed that it
can be used in routing without issue.

By design, this method does not use any global variables (everything
needs to be passed in) and recomputes everything that it does not
assume. This allows it to be independent of the placement flow being
used, so this method can be used in the AP flow as well.

* [Clustering] Independent Clustering Verification

Created a method that would independently verify the clustering in the
VPR flow. If a clustering passes this verification, it is assumed that
it can be used in placement and routing without issue.

By design, this method does not use any global variables (everything
needs to be passed in) and it recomputes everything that it does not
assume. This allows it to be independent of the packing flow, so this
method can also be used in the AP flow.

* [Clustering] Moved Clustered Netlist Verification Into Load Packing

The verification and printing of the clustered netlsit made more sense
inside of the load_packing method. Moved.

* [Placement] Resolved Inconsistency with Placing at Root Tile Locs

Found an inconsistency with how the placement of cluster blocks into
large tiles was handled in VPR. During placement, clusters are only
placed at the rool tile location of the grid; however, specific parts of
the later stages require that clusters are placed at all tile locations
of large tiles. This is inconsistent and makes it confusing to know
where clusters should be placed.

Enforced that clusters must be placed at the root tile location and
never placed in other tile locations in the large tiles. This required
fixing an oversite in the flat router which was relying on the fact that
clusters will not be placed at the root tile location.

* remove place_cost_exp and calls to pow()

* pass blk_loc_registry from placer_state instead of place_ctx

* skip updating the RL agent table and moves stats in case of manual move

* Adding layer widget to the manual move window in GUI

* update basic golden results

* update golden results for strong tests

* add some comments and update golder results

* update golden results

* fix the remaining failure in strong tests

* types and pass by reference in TimingGraph.cpp

* [AP] Partial Legalizer

This adds the Patial Legalizer block into VPR. A Partial Legalizer takes
a partial placement (which likely has many overlapping blocks and is
very illegal with blocks in the wrong tiles) and creates a more legal
partial placement. These are often called spreaders since their main
task is to spread blocks away from each other (the solver tends to put
blocks on top of each other).

Added a multi-commodity flow-based spreading partial legalizer based on
the work by Darav et al. This algorithm was modified to work within the
VPR infrastructure on any hypothetical architecture which can be
expressed in VPR.

* avoid name conflicts in sync_netlists_to_routing_flat

* fixed a few typos

* get rid of DUSTY_SCHED

* change the order of if statement so that e_place_algorithm::CRITICALITY_TIMING_PLACE is checked first

* don't resize PlacerMoveContext.??_coord in the constructor

* add incr_blk_type_moves() and incr_accept_reject() methods

* don't include time.h as we no longer call clock()

* move the ownership of move generators to annealer

* construct manual move generator in annealer's constructor

* add some comments

* fix compilation error by remove definition for t_annealing_state::update_move_lim()

* [AP] Global Placer

Created the Global Placer base class which will create a partial
placement based on the netlist and the architecture. This attempts to
find a "globally" good placement, without considering all of the complex
constraints of the FPGA architecture (without clustering).

Implemented a SimPL-based Global Placer which maintains an upper and
lower bound solution which slowly approach each other over several
iterations. The lower-bound is the analytically solved solution which
tries to optimize the placement (hinted by the upper bound solution).
The upper-bound solution is the lower-bound solution which has been
partially legalized.

* final

* reorder layout

* [CMake] Upgrade Minimum Version to 3.10

The CI currently always runs on the latest version of CMake. I am not
sure why this is, its not the best practice, but CMake is a fairly
stable application so it should be fine.

CMake is deprecating version 3.9, and a few CMake files require the
minimum version to be 3.9 or larger. Changed these files to be 3.10 or
later.

* Running VPR manually

* add comment for PlacementAnnealer class make a few methods private

* apply Alex's comments

* add separators to annelaer.cpp

* add get_move_abortion_logger() to PlacementAnnealer

* [AP] Testing Infrastructure

Added changes required to test the AP Flow on the MCNC and VTR
benchmarks.

This creates fixed device sizes for each benchmark which can be used to
fairly compare the AP Flow to the default VTR flow.

Added the currently passing MCNC and VTR benchmarks to the strong tests
to lock in the AP flow. It is a bit unstable right now until the full
legalizer is put in. May need to disable a few of the tests.

* [Sys] Increased Read Buffers to Reduce Total Syscalls

This change was originally proposed by @heshpdx

See PR verilog-to-routing#2772

By increasing the read buffer, the number of syscalls can be reduced
since more of the large files are read per call.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Oleksandr <apivovarov@quicklogic.com>
Co-authored-by: w0lek <141943514+w0lek@users.noreply.github.com>
Co-authored-by: soheilshahrouz <soheilqs@gmail.com>
Co-authored-by: vaughnbetz <vaughnbetz@gmail.com>
Co-authored-by: ZohairZaidi <zohair.zaidi@mail.utoronto.ca>
Co-authored-by: AlexandreSinger <alex.singer@mail.utoronto.ca>
Co-authored-by: Hang Yan <ueqri@outlook.com>
Co-authored-by: Joshua Fife <jpfife17@gmail.com>
Co-authored-by: Joshua Fife <83309027+WhiteNinjaZ@users.noreply.github.com>
Co-authored-by: amin1377 <amin1377.mohaghegh@gmail.com>
Co-authored-by: Zohair Zaidi <75513427+ZohairZaidi@users.noreply.github.com>
Co-authored-by: petergrossmann21 <peter@zeroasic.com>
Co-authored-by: AlexandreSinger <49374526+AlexandreSinger@users.noreply.github.com>
Co-authored-by: Vaughn Betz <vaughn@eecg.utoronto.ca>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Soheil Shahrouz <80951211+soheilshahrouz@users.noreply.github.com>
Co-authored-by: MohamedElgammal <mohamed.elgammal@mail.utoronto.ca>
Co-authored-by: Duck Deux <duck2@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant