Change voting for write operations + associated replication fallout (main) #7490

alanking · 2024-02-08T15:42:18Z

Addresses #7476
Workaround for #6954

Unit tests and core tests pass except for test_delay_queue.Test_Delay_Queue. Will investigate that, but it does not seem related to these changes.

Important changes occurred in rsDataObjRepl.cpp and I tried to leave explanatory comments/log messages where I could. The rest is mostly changes to the tests.

I kept the self-targeting commits separated in case we change our minds about that.

alanking · 2024-02-08T19:50:09Z

The test_delay_queue failure occurs when I build the commit used for 4.3.1 as well... in debug mode. When I run 4.3.1 in release mode, everything works as expected. I've created an issue here to look at it later: #7491. Meanwhile, I'm adding a commit to skip that test.

alanking · 2024-02-08T20:13:04Z

Also noticing some inconsistencies in the commit messages. I'll fix that up on the next force push.

korydraughn

I haven't reviewed the test changes yet.

Will look at those soonish. Other than that, things look correct.

plugins/resources/replication/src/irods_create_write_replicator.cpp

scripts/run_tests.py

server/api/src/rsDataObjRepl.cpp

korydraughn

Looks like we're very close. Other than the review comments I just left, what remains for this PR?

server/api/src/rsDataObjRepl.cpp

scripts/run_tests.py

scripts/irods/test/test_irepl.py

korydraughn

All of the review comments are resolved. Has this passed the test suite?

alanking · 2024-02-12T16:34:31Z

All the tests passed last time I ran them, but I'm going to squash, build, and run it again to be sure.

korydraughn · 2024-02-12T16:41:32Z

Sounds good.

alanking · 2024-02-13T15:58:45Z

Pushed up the squashed commits. All tests passed.

korydraughn

The body of commit "[irods#7476] Update tests for new replication rules" appears to have a grammar error around "a non-deterministic tests"?

Other than that, looks good.

alanking · 2024-02-13T16:49:04Z

Ah, I think I meant "a few non-deterministic tests". I'll update it.

korydraughn

Pound it.

This should make it a bit easier to find out which tests fail when changes to replication occur.

Sometimes the server dies and leaves behind shmem files, which prevent new shmem files from being created. This is a problem when running tests in succession: when one test fails because of this unusual issue, the rest of the tests in the set fail as well (more or less). This adds a function which deletes all the irods_* files from all of the possible shmem locations. scripts/run_tests.py no longer calls IrodsController.restart() before running tests; instead, it stops the server, clears the shmem files if any are present, and starts the server again. As of this commit, the function is used only in run_tests.py.
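For illustration only, a cleanup function along these lines could be sketched as follows. The function name `clear_irods_shmem_files`, the `prefix` parameter, and the list of candidate directories are assumptions made for this sketch; the helper actually added by the commit and the locations it checks are not shown in this conversation.

```python
import glob
import os

# Assumed candidate locations for POSIX shared memory files.
# The real list used by the commit may differ.
DEFAULT_SHM_DIRS = ("/dev/shm", "/var/run/shm", "/run/shm")


def clear_irods_shmem_files(shm_dirs=DEFAULT_SHM_DIRS, prefix="irods_"):
    """Delete regular files named <prefix>* in each directory.

    Returns the list of paths that were removed. Directories that do not
    exist are skipped, and files that cannot be removed are left alone.
    """
    removed = []
    for directory in shm_dirs:
        if not os.path.isdir(directory):
            continue
        pattern = os.path.join(glob.escape(directory), prefix + "*")
        for path in glob.glob(pattern):
            if not os.path.isfile(path):
                continue
            try:
                os.remove(path)
            except OSError:
                # For example, insufficient permissions; skip and carry on.
                continue
            removed.append(path)
    return removed
```

In a test runner, a function like this would be called between stopping the server and starting it again, so that a stale file left by a crashed server cannot affect the next test.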

Also, skip a bunch of redundant tests in the replication-to/from-hierarchy cases. This also allows a few non-deterministic tests in the same suite to be run because we can determine the behavior and write a test for it now.

This commit changes the "preferred" replica status for votes on write operations from stale to good. One consequence of this decision is that replication operations may now target good replicas for update. Instead of actually overwriting the data in the good replica, fileModified is triggered directly to invoke any policy defined by coordinating resources. Clients can now request that good replicas be overwritten, provided that the source replica is good. Clients can also request that a replica overwrite itself, provided that it is good. In both cases, no data movement occurs. When the source and destination replicas are the same replica, the replication operation is a no-op, although fileModified is still triggered so that any configured policy takes effect. A stale source replica cannot be used to update any other replica under any circumstances.
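The rules in this commit message can be summarized as a small decision function. This is a sketch for readers, not the logic in rsDataObjRepl.cpp: the names `Status`, `Outcome`, and `plan_replication` are hypothetical, only good and stale replicas are modeled, and the "copy data" outcome for a good source and a stale destination is assumed to be the ordinary update path.

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    STALE = "stale"


class Outcome(Enum):
    REJECT = "reject"                    # a stale source cannot update anything
    COPY_DATA = "copy data"              # assumed: ordinary update of a stale replica
    FILE_MODIFIED_ONLY = "fileModified"  # no data movement, policy still triggered


def plan_replication(source: Status, destination: Status) -> Outcome:
    """Hypothetical summary of the replication rules described above.

    Self-targeting is the case where source and destination are the same
    replica, so both statuses are necessarily equal.
    """
    if source is Status.STALE:
        return Outcome.REJECT
    if destination is Status.GOOD:
        # Good -> good (including a replica targeting itself): nothing is
        # overwritten, but fileModified is triggered.
        return Outcome.FILE_MODIFIED_ONLY
    return Outcome.COPY_DATA
```

For example, `plan_replication(Status.GOOD, Status.GOOD)` yields `Outcome.FILE_MODIFIED_ONLY`, which covers both overwriting another good replica and a good replica targeting itself.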

This test fails for debug builds for some reason. We need to get this fixed so that we can run the test again.

korydraughn reviewed Feb 8, 2024

korydraughn reviewed Feb 9, 2024

korydraughn reviewed Feb 12, 2024

alanking force-pushed the 7476.m branch from a91a513 to 544f2c5 Compare February 13, 2024 15:58

korydraughn reviewed Feb 13, 2024

alanking force-pushed the 7476.m branch from 544f2c5 to 30fffa1 Compare February 13, 2024 16:53

korydraughn approved these changes Feb 13, 2024

alanking added 5 commits February 13, 2024 12:33

[irods#858] Add unittest.subTests for repl table tests

7ed7a6e

This should make it a bit easier to find out which tests fail when changes to replication occur.
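As a rough illustration of why subtests help here, the sketch below drives a table of cases through `unittest.TestCase.subTest`, so a failure report names the exact row that failed instead of stopping at the first failing row. The table contents and the `replication_allowed` stand-in are hypothetical and are not taken from the iRODS test suite.

```python
import unittest


def replication_allowed(source_status, destination_status):
    """Hypothetical stand-in for the behaviour under test."""
    return source_status == "good"


# Each row: (source status, destination status, expected result).
REPL_TABLE = [
    ("good", "good", True),
    ("good", "stale", True),
    ("stale", "good", False),
    ("stale", "stale", False),
]


class TestReplTable(unittest.TestCase):
    def test_table(self):
        for source, destination, expected in REPL_TABLE:
            # Each row is reported separately, labelled with its parameters,
            # and the loop continues even if one row fails.
            with self.subTest(source=source, destination=destination):
                self.assertEqual(
                    replication_allowed(source, destination), expected
                )
```

Without `subTest`, the first failing row would abort the loop and hide the results for the remaining rows.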

[irods#7476] Update tests for new replication rules

7b4b0e6

Also, skip a bunch of redundant tests in the replication-to/from-hierarchy cases. This also allows a few non-deterministic tests in the same suite to be run because we can determine the behavior and write a test for it now.

[irods#7491] Skip test_exception_in_delay_server

98747c3

This test fails for debug builds for some reason. We need to get this fixed so that we can run the test again.

alanking force-pushed the 7476.m branch from 30fffa1 to 98747c3 Compare February 13, 2024 17:33

alanking merged commit e6df49f into irods:main Feb 13, 2024
12 of 13 checks passed

alanking deleted the 7476.m branch February 13, 2024 17:33

alanking mentioned this pull request Feb 13, 2024

Change voting for write operations + associated replication fallout (4-3-stable) #7499

Merged
