-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mirror): Pass epoch config overrides on forknet init #12421
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #12421 +/- ##
==========================================
+ Coverage 71.16% 71.68% +0.51%
==========================================
Files 843 843
Lines 170788 170788
Branches 170788 170788
==========================================
+ Hits 121547 122432 +885
+ Misses 43913 42993 -920
- Partials 5328 5363 +35
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
remove the epoch config dir on new_test
4d2d660
to
4c53148
Compare
@@ -141,6 +143,77 @@ class TestState(Enum): | |||
}""" | |||
|
|||
|
|||
class EpochConfigOverrides: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: feel free to keep it if it's more your style and you want it that way, but why a class? The only place it's used is in the single statement EpochConfigOverrides(...).override(...)
, which to me is an indication you could just keep it simpler and write this as a normal function
|
||
existing_epoch_config_files = os.listdir(epoch_config_dir) | ||
existing_protocol_versions = set([ | ||
int(re.match(r'(\d+)\.json', f).group(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit fragile in that if someone ever adds a random file to this directory maybe while debugging some problem with the forknet setup or something, we will crash here instead of just ignoring it. Maybe not super likely but I think it's prob best to be more robust and skip filenames that dont match
int(re.match(r'(\d+)\.json', f).group(1)) | ||
for f in existing_epoch_config_files | ||
]) | ||
previous_protocol_version_file = min(existing_protocol_versions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This min()
will throw an exception if it's empty. Even if it's guaranteed not to be after neard init --dump-epoch-config
, it's prob still not really needed for this program to crash here rather than just reporting an error, or maybe even just continuing without epoch config modifications but printing an error or something
also nit: previous_protocol_version_file
seems to refer to the number, not the filename, so maybe just previous_protocol_version
?
self.neard_init(rpc_port, protocol_port, validator_id, | ||
genesis_protocol_version, | ||
epoch_config_overrides) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try to catch a more specific set of exceptions, or to express the logic here in some other way if you want. Catching all exceptions is not great because some of them should really just cause us to exit if they indicate a bug in the program rather than some expected thing
see "Never use catch-all except: statements" here
@@ -591,6 +594,7 @@ def __call__(self, parser, namespace, values, option_string=None): | |||
new_test_parser.add_argument('--genesis-protocol-version', type=int) | |||
new_test_parser.add_argument('--stateless-setup', action='store_true') | |||
new_test_parser.add_argument('--gcs-state-sync', action='store_true') | |||
new_test_parser.add_argument('--epoch-config-overrides', type=json.loads) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add some help text with an example like in the PR description? Also might be good to warn the user in this help text that this won't work unless the binary has the --dump-epoch-config
option in neard init
]) | ||
previous_protocol_version_file = min(existing_protocol_versions) | ||
for protocol_version in sorted(self.new_protocol_versions): | ||
if protocol_version in existing_protocol_versions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will not work because self.new_protocol_versions
contains strings. Also note that self.epoch_config_overrides
has string keys. So this needs to all be handled uniformly somehow
I know you've already made some progress on the different things that go into making this epoch config stuff work, but if I may, I have a few thoughts about how this might be made simpler, since some of it seems a bit strange to me. What I'm saying here isn't blocking this PR (I would have approved if not for the bug and So when we run a forknet with this It seems much simpler to me to just directly upload the epoch config files. You could easily include them in the list of files we upload to the neard runner's directory here. And in light of that, maybe consider just reverting the change that added this So why not just do this? If the user gives |
json_data = json.load(f) | ||
|
||
# Apply the overrides | ||
updated_json = jq.compile(overrides).input(json_data).first() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of jq, did you consider just having the --epoch-config-overrides
flag take a path to a json file where the wanted changes are? It might be a bit easier if someone wants to make lots of edits (imagine editing the fields you included in the PR description, along with editing shard_layout
and num_block_producer_seats_per_shard
or something). In that case a command line arg with a bunch of big jq programs might be kind of unwieldy
I have a system design concern that we are building separate logic to handle epoch configs in Python. This feels more fragile than handling most of it in Rust nearcore code, which already knows EpochConfig structure. If we change EpochConfig, forknet will silently break and we'd need to find a fix an issue. My another perspective is that nearcore development heavily relies on forknet, as it shows whether new features are ready for mainnet. Because of that, we should aim to minimise number of dependencies and assumptions made. Then, practically, using Alternatively, it could be better to refactor Then, we could make it a command like Let me know if you have questions. I think it's quite crucial to discuss to make forknet more reliable. |
…2420) When setting up a forknet, the `neard fork-network set-validators` command is run after `neard init`. If `init` was executed with `--dump-epoch-config`, the `epoch_configs` directory is populated with the `mainnet` epoch configs. Additionally, the mirror tool now includes an option to provide overrides to the epoch configurations. ### For further details, check this PR (#12421). If epoch config files are present, they will be utilized to create the genesis configuration for the fork network.
Thanks for taking the time to look over this! The logic introduced here only depends on the fact that EpochConfig is represented as a JSON. All the overrides are provided by the user that has the knowledge of the EpochConfig structure. So changing the EpochConfig struct will not break forknet. |
I think it's also reasonable to have a goal of running forknet in a single command. |
Context
Forknet is initialized by provisioning a host with a setup directory containing the source data folder. The following commands are executed to prepare the network:
neard init
– generates the configs and keys.neard fork-network set-validators
– amends the database, inserts the specified set of validators, and creates the genesis config for the new network.neard fork-network finalize
– completes the forking process.In this PR, we focus on the first two steps. If we need to override the epoch configs, they will be generated by neard init and then modified by the neard-runner process with the provided overrides. We first apply the general overrides under the "all" key, followed by protocol version-specific ones.
When overriding an epoch config that does not exist, the contents are copied from the previously available epoch config and then overwritten. i.e. let's assume that
mainnet
only has epoch configs for protocol 72 and 74. If you want to add overrides for p.v. 75, they will be applied on top of the original epoch config for p.v. 74.When creating the genesis in set-validators, we read the epoch config corresponding to the genesis protocol version.
How to use it
When calling
mirror new-test
add the--epoch-config-overrides
.This will accept a JSON formatted string that has the protocol version as keys and a list of
jq
styled overrides sepparated by|
.By protocol version
For example, if you need to update the
num_block_producer_seats
,num_chunk_validator_seats
andnum_chunk_producer_seats
for protocol version73
, you would use the following argument:{ "74": ".num_block_producer_seats = 3 | .validator_selection_config.num_chunk_validator_seats = 22 | .validator_selection_config.num_chunk_producer_seats = 3"}
With
all
keyA protocol version override only affects a specific epoch config. If you want to apply a change to all epoch configs, add that change under the
all
key instead.Priority
Use the
all
key to apply overrides across all configs simultaneously. Protocol version-specific overrides will then be applied afterward.Testing
For testing, I used a 6 node forknet with different scenarios:
For backwards compatibility: old binary(
2.3.0
), newneard-runner
Result: network progessed ok with all 6 ndoes as validators.
New binary, new
neard-runner
, with overridesmirror new-test --epoch-length 20 --genesis-protocol-version 73 --num-validators 6 --num-seats 6 --stateless-setup --new-chain-id modknet --epoch-config-overrides '{"all": ".validator_selection_config.num_chunk_validator_seats = 22 | .validator_selection_config.num_chunk_producer_seats = 3 | .num_block_producer_seats = 3 | .validator_selection_config.shuffle_shard_assignment_for_chunk_producers = true"}'
Result:
This was for testing purposes. The same result could have been achieved by using the
--num-seats 3
parameter and overriding only theshuffle_shard_assignment_for_chunk_producers parameter
.New binary, new
neard-runner
, with overrides onall
and on version73
mirror new-test --epoch-length 20 --genesis-protocol-version 71 --num-validators 6 --num-seats 6 --stateless-setup --new-chain-id modknet --epoch-config-overrides '{"all": ".validator_selection_config.num_chunk_validator_seats = 22 | .validator_selection_config.num_chunk_producer_seats = 3 | .num_block_producer_seats = 3", "73": ".validator_selection_config.num_chunk_producer_seats = 6"}'
Result:
The network starts at protocol version 71. In the image, it is in the first epoch after the genesis epoch. In the next epoch, protocol version 73 will take effect, along with the
num_chunk_producer_seats = 6
override.