-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change voting for write operations + associated replication fallout (main) #7490
Conversation
The test_delay_queue failure occurs when I build the commit used for 4.3.1 as well... in debug mode. When I run 4.3.1 in release mode, everything works as expected. I've created an issue here to look at it later: #7491. Meanwhile, I'm adding a commit to skip that test. |
Also noticing some inconsistencies in the commit messages. I'll fix that up on the next force push. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the test changes yet.
Will look at those soonish. Other than that, things look correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're very close. Other than the review comments I just left, what remains for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the review comments are resolved. Has this passed the test suite?
All the tests passed last time I ran them, but I'm going to squash, build, and run it again to be sure. |
Sounds good. |
Pushed up the squashed commits. All tests passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The body of commit [ 7476] Update tests for new replication rules
appears to have a grammar error around a non-deterministic tests
?
Other than that, looks good.
Ah, I think I meant "a few non-deterministic tests". I'll update it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pound it.
This should make it a bit easier to find out which tests fail when changes to replication occur.
Sometimes, the server dies and leaves behind shmem files which prevents new shmem files from being created. This is a problem when running tests in succession because when one test fails due to this unusual issue, the rest of the tests in the set fail (more or less). This adds a function which deletes all the irods_* files out of all of the possible shmem locations. scripts/run_tests.py has been modified to stop calling IrodsController.restart() before running tests and instead stops the server, clears the shmem files if they are there, and starts the server again. Again, this function is only being used in run_tests.py as of this commit.
Also, skip a bunch of redundant tests in the replication-to/from-hierarchy cases. This also allows a few non-deterministic tests in the same suite to be run because we can determine the behavior and write a test for it now.
This commit changes the "preferred" replica status for votes on write operations from stale to good. One consequence of this decision is allowing replication operations to to allow targeting good replicas for update. Instead of actually overwriting the data in the good replica, fileModified is triggered directly to invoke any policy defined by coordinating resources. Clients can now request good replicas to be overwritten provided that the source replica is good. Clients can now request that a replica overwrite itself provided the source and destination replicas are good. In both cases, no data movement occurs. When the source and destination replicas are the same replica, the replication operation is a no-op, although fileModified will be triggered so that any configured policy will be in effect. If the source replica is stale, it cannot be used to update any other replicas regardless.
This test fails for debug builds for some reason. We need to get this fixed so that we can run the test again.
Addresses #7476
Workaround for #6954
Sibling PR: irods/irods_docs#241
Unit tests and core tests pass except for
test_delay_queue.Test_Delay_Queue
. Will investigate that, but it does not seem related to these changes.Important changes occurred in
rsDataObjRepl.cpp
and I tried to leave explanatory comments/log messages where I could. The rest is mostly changes to the tests.I kept the self-targeting commits separated in case we change our minds about that.