[server/kv] Fix out of order exception after delete a not exist row #238

luoyuxia · 2024-12-20T11:17:32Z

Purpose

Linked issue: close #230

Although the change log is empty, we should still append the empty log batch so that WriterStateManager becomes aware of the sequence id.
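A minimal sketch of the sequence check involved, assuming a simplified tracker that accepts a batch only when its sequence is exactly the last recorded sequence plus one (with -1 meaning "no state yet"). `SequenceTracker` and its methods are hypothetical names for illustration and are not the actual WriterStateManager API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for per-writer batch sequence tracking. */
final class SequenceTracker {
    private static final int NO_SEQUENCE = -1;

    // Last accepted batch sequence per writer id.
    private final Map<Long, Integer> lastSeqByWriter = new HashMap<>();

    /** Records the sequence and returns true if the batch is in order, else returns false. */
    boolean tryAppend(long writerId, int batchSeq) {
        int current = lastSeqByWriter.getOrDefault(writerId, NO_SEQUENCE);
        if (batchSeq != current + 1) {
            // Assumed rule: the incoming sequence must directly follow the current one.
            return false;
        }
        lastSeqByWriter.put(writerId, batchSeq);
        return true;
    }

    /** Returns the last accepted sequence for the writer, or -1 if none was recorded. */
    int lastSequence(long writerId) {
        return lastSeqByWriter.getOrDefault(writerId, NO_SEQUENCE);
    }
}
```

Under this assumption, if the empty batch with sequence 0 (the delete of a non-existent row) is never passed to the tracker, the next batch with sequence 1 is rejected. If the empty batch is appended, sequence 1 is accepted.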

Tests

FlussTableITCase#testDeleteNotExistRow

API and Format

Documentation

wuchong

This fix doesn't work. When I enable logging locally, there are many ERROR messages:

7540 [fluss-netty-server-worker-thread-1] WARN  com.alibaba.fluss.server.log.LocalLog [] - Trying to roll a new log segment with start offset 0 =max(provided offset = Optional[0], LEO = 0) while it already exists and is active with size 0., size of offset index: 0.
...
8090 [ReplicaFetcherThread-0-2] ERROR com.alibaba.fluss.server.replica.fetcher.ReplicaFetcherThread [] - Unexpected error occurred while processing data for bucket TableBucket{tableId=0, bucket=0} at offset 0
com.alibaba.fluss.exception.OutOfOrderSequenceException: Out of order batch sequence for writer 0 at offset 0 in table-bucket TableBucket{tableId=0, bucket=0} : 1 (incoming batch seq.), -1 (current batch seq.)
8093 [ReplicaFetcherThread-0-2] ERROR com.alibaba.fluss.server.replica.fetcher.ReplicaFetcherThread [] - Unexpected error occurred while processing data for bucket TableBucket{tableId=0, bucket=0} at offset 0
com.alibaba.fluss.exception.OutOfOrderSequenceException: Out of order batch sequence for writer 0 at offset 0 in table-bucket TableBucket{tableId=0, bucket=0} : 1 (incoming batch seq.), -1 (current batch seq.)
18368 [fluss-scheduler-0-thread-1] INFO  com.alibaba.fluss.server.replica.Replica [] - Shrink ISR From [2, 0, 1] to [2]. Leader: (high watermark: 0, end offset: 1, out of sync replicas: [0, 1])
18636 [coordinator-event-thread] INFO  com.alibaba.fluss.server.zk.ZooKeeperClient [] - Updated LeaderAndIsr{leader=2, leaderEpoch=0, isr=[2], coordinatorEpoch=0, bucketEpoch=1} for bucket TableBucket{tableId=0, bucket=0} in Zookeeper.
18702 [fluss-netty-client(NIO)-12-1] INFO  com.alibaba.fluss.server.replica.Replica [] - ISR updated to [2] and bucket epoch updated to 1 for bucket TableBucket{tableId=0, bucket=0}
...

This fix breaks the replica leader and causes a switch to a new leader (with empty writer state), which is why the subsequent upserts work. However, this is not a proper fix.

wuchong · 2024-12-21T14:46:25Z

fluss-server/src/main/java/com/alibaba/fluss/server/log/LogTablet.java

-                        appendInfo.startOffsetOfMaxTimestamp(),
-                        validRecords);
+                // if there are records to append
+                if (appendInfo.lastOffset() >= appendInfo.firstOffset()) {


Skipping the log write is a problem, because the writer state of the replicas then gets out of sync with the leader.

Besides, this if condition never takes effect, because the method already returns at the beginning when appendInfo.shallowCount() == 0.
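The out-of-sync concern can be sketched as follows, assuming followers rebuild their writer state only from batches they fetch from the leader's log. All names here (`ReplicaSyncSketch`, `Replica`, `accept`, `replicate`) are hypothetical and only model the idea; they are not Fluss classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a leader and a follower tracking batch sequences. */
final class ReplicaSyncSketch {

    /** One replica: a log of batch sequences plus the last sequence it validated. */
    static final class Replica {
        final List<Integer> log = new ArrayList<>();
        int lastSeq = -1; // -1 means no writer state yet

        /** Validates the sequence; the batch reaches the log only if writeToLog is true. */
        boolean accept(int batchSeq, boolean writeToLog) {
            if (batchSeq != lastSeq + 1) {
                return false; // out of order
            }
            lastSeq = batchSeq;
            if (writeToLog) {
                log.add(batchSeq);
            }
            return true;
        }
    }

    /** Replays the leader's log on the follower; returns the first rejected sequence, or -1 if none. */
    static int replicate(Replica leader, Replica follower) {
        for (int seq : leader.log) {
            if (!follower.accept(seq, true)) {
                return seq;
            }
        }
        return -1;
    }
}
```

In this model, a leader that validates sequence 0 but skips writing it, then writes sequence 1, hands the follower a log that starts at sequence 1. The follower, still at -1, rejects it, which resembles the "1 (incoming batch seq.), -1 (current batch seq.)" pattern in the fetcher errors above.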

wuchong · 2024-12-21T14:54:04Z

fluss-server/src/main/java/com/alibaba/fluss/server/kv/KvTablet.java

-                                    0L,
-                                    0,
-                                    0,
-                                    false);


cc @swuferhong, do you remember which cases this was meant to fix when we introduced it?

Besides, could you help review this PR? How do the Kafka client and server handle the sequence id when the produced messages are empty?

luoyuxia added 2 commits December 19, 2024 15:54

debug

3fc0cc6

[server/kv] Fix out of order exception after delete a not exist row

5efa1b9

wuchong requested changes Dec 21, 2024
