Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change voting for write operations + associated replication fallout (main) #7490

Merged
merged 5 commits into from
Feb 13, 2024

Conversation

alanking
Copy link
Contributor

@alanking alanking commented Feb 8, 2024

Addresses #7476
Workaround for #6954

Sibling PR: irods/irods_docs#241


Unit tests and core tests pass except for test_delay_queue.Test_Delay_Queue. Will investigate that, but it does not seem related to these changes.

Important changes occurred in rsDataObjRepl.cpp and I tried to leave explanatory comments/log messages where I could. The rest is mostly changes to the tests.

I kept the self-targeting commits separated in case we change our minds about that.

@alanking
Copy link
Contributor Author

alanking commented Feb 8, 2024

The test_delay_queue failure occurs when I build the commit used for 4.3.1 as well... in debug mode. When I run 4.3.1 in release mode, everything works as expected. I've created an issue here to look at it later: #7491. Meanwhile, I'm adding a commit to skip that test.

@alanking
Copy link
Contributor Author

alanking commented Feb 8, 2024

Also noticing some inconsistencies in the commit messages. I'll fix that up on the next force push.

Copy link
Collaborator

@korydraughn korydraughn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed the test changes yet.

Will look at those soonish. Other than that, things look correct.

scripts/run_tests.py Outdated Show resolved Hide resolved
server/api/src/rsDataObjRepl.cpp Show resolved Hide resolved
server/api/src/rsDataObjRepl.cpp Show resolved Hide resolved
server/api/src/rsDataObjRepl.cpp Outdated Show resolved Hide resolved
server/api/src/rsDataObjRepl.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@korydraughn korydraughn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're very close. Other than the review comments I just left, what remains for this PR?

server/api/src/rsDataObjRepl.cpp Outdated Show resolved Hide resolved
scripts/run_tests.py Outdated Show resolved Hide resolved
scripts/irods/test/test_irepl.py Outdated Show resolved Hide resolved
scripts/irods/test/test_irepl.py Show resolved Hide resolved
Copy link
Collaborator

@korydraughn korydraughn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the review comments are resolved. Has this passed the test suite?

@alanking
Copy link
Contributor Author

All the tests passed last time I ran them, but I'm going to squash, build, and run it again to be sure.

@korydraughn
Copy link
Collaborator

Sounds good.

@alanking
Copy link
Contributor Author

Pushed up the squashed commits. All tests passed.

Copy link
Collaborator

@korydraughn korydraughn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of commit [ 7476] Update tests for new replication rules appears to have a grammar error around a non-deterministic tests?

Other than that, looks good.

@alanking
Copy link
Contributor Author

Ah, I think I meant "a few non-deterministic tests". I'll update it.

Copy link
Collaborator

@korydraughn korydraughn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pound it.

This should make it a bit easier to find out which tests fail when
changes to replication occur.
Sometimes, the server dies and leaves behind shmem files which prevents
new shmem files from being created. This is a problem when running tests
in succession because when one test fails due to this unusual issue, the
rest of the tests in the set fail (more or less).

This adds a function which deletes all the irods_* files out of all of the
possible shmem locations. scripts/run_tests.py has been modified to stop
calling IrodsController.restart() before running tests and instead stops
the server, clears the shmem files if they are there, and starts the server
again.

Again, this function is only being used in run_tests.py as of this commit.
Also, skip a bunch of redundant tests in the replication-to/from-hierarchy
cases. This also allows a few non-deterministic tests in the same suite to
be run because we can determine the behavior and write a test for it now.
This commit changes the "preferred" replica status for votes on write
operations from stale to good.

One consequence of this decision is allowing replication operations to
to allow targeting good replicas for update. Instead of actually overwriting
the data in the good replica, fileModified is triggered directly to invoke
any policy defined by coordinating resources.

Clients can now request good replicas to be overwritten provided that the
source replica is good. Clients can now request that a replica overwrite
itself provided the source and destination replicas are good. In both cases,
no data movement occurs.

When the source and destination replicas are the same replica, the
replication operation is a no-op, although fileModified will be
triggered so that any configured policy will be in effect. If the
source replica is stale, it cannot be used to update any other
replicas regardless.
This test fails for debug builds for some reason. We need to get this
fixed so that we can run the test again.
@alanking alanking merged commit e6df49f into irods:main Feb 13, 2024
12 of 13 checks passed
@alanking alanking deleted the 7476.m branch February 13, 2024 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants