-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riak_core_handoff_sender {badrecord,ho_acc} #1010
Comments
Hard to know - but the code is running a fold over the vnode and sending to the remote vnode at the time ... but in the midst of the logs is a reference to |
CentOS Linux release 7.9.2009 (Core) 3.10.0-1160.81.1.el7.x86_64 #1 SMP Fri Dec 16 17:29:43 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux HW and OS problems not detected. It's a recurring problem only for one node and only for one partition. Temporary resolve - I was delete this partition from backends on this node to use replication for restore data. Joining new node to cluster was finished. No active transfers. |
Perhaps there was a corrupted block on the disk within that vnode backend causing an IO error when the fold tried to read a particular part of the keyspace? Deleting the partition would then allow the handoff to complete. You should be able to run partition repair to restore data with relative efficiency - https://docs.riak.com/riak/kv/latest/using/repair-recovery/repairs/index.html#repairing-a-single-partition. |
I tried to start repair, but console return message to me "change ownership" and no effect. |
Hi, please help. I can't find reson fo broken handoff between old node riak@172.16.3.92 to new node riak@172.16.3.63.
In log I see error "failed because of error:{badrecord,ho_acc}" and which node has the problem is unclear.
versions
riak_kv 2.1.1
n_val = 3,
nodes is 40
console.log
2023-09-13 10:48:55.880 [info] <0.25322.2>@riak_core_handoff_sender:start_fold:153 Starting ownership transfer of riak_kv_vnode from 'riak@172.16.3.92' 52808164630 1205155991565808696313200461616250880 to 'riak@172.16.3.63' 528081646301205155991565808696313200461616250880 2023-09-13 10:57:13.707 [info] <0.1490.1>@riak_kv_index_hashtree:clear_tree:842 Clearing AAE tree: 625134489405210427903529254618878869735643021312 2023-09-13 10:57:14.659 [info] <0.30711.2>@riak_kv_index_hashtree:build_or_rehash:880 Starting AAE tree build: 625134489405210427903529254618878869735643021312 2023-09-13 11:48:20.321 [error] <0.4371.0> gen_server <0.4371.0> terminated with reason: no function clause matching riak_core_handoff_sender:visit_item({<<48,98,5 8,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,...>>}, <<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89, 118,233,214,0,0,43,103,104,...>>, {error,eio}) line 296 2023-09-13 11:48:20.321 [error] <0.4371.0> CRASH REPORT Process <0.4371.0> with 0 neighbours exited with reason: no function clause matching riak_core_handoff_send er:visit_item({<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,...>>}, <<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0, 0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,...>>, {error,eio}) line 296 in gen_server:terminate/6 line 744 2023-09-13 11:48:20.322 [error] <0.4368.0> Supervisor {<0.4368.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.4371.0> exit with reason no function clause matching riak_core_handoff_sender:visit_item({<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,...>>}, <<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,...>>, {error,eio}) line 296 in context child_terminated 2023-09-13 11:48:20.323 [error] <0.25322.2>@riak_core_handoff_sender:start_fold:282 ownership transfer of riak_kv_vnode from 'riak@172.16.3.92' 528081646301205155991565808696313200461616250880 to 'riak@172.16.3.63' 528081646301205155991565808696313200461616250880 failed because of error:{badrecord,ho_acc} [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,204}]}]
error.log
2023-09-13 10:48:48.097 [error] <0.6963.0>@riak_core_handoff_sender:start_fold:282 ownership transfer of riak_kv_vnode from 'riak@172.16.3.92' 528081646301205155991565808696313200461616250880 to 'riak@172.16.3.63' 528081646301205155991565808696313200461616250880 failed because of error:{badrecord,ho_acc} [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,204}]}] 2023-09-13 11:48:20.321 [error] <0.4371.0> gen_server <0.4371.0> terminated with reason: no function clause matching riak_core_handoff_sender:visit_item({<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,...>>}, <<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,...>>, {error,eio}) line 296 2023-09-13 11:48:20.321 [error] <0.4371.0> CRASH REPORT Process <0.4371.0> with 0 neighbours exited with reason: no function clause matching riak_core_handoff_sender:visit_item({<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,...>>}, <<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,...>>, {error,eio}) line 296 in gen_server:terminate/6 line 744 2023-09-13 11:48:20.322 [error] <0.4368.0> Supervisor {<0.4368.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.4371.0> exit with reason no function clause matching riak_core_handoff_sender:visit_item({<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,...>>}, <<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,...>>, {error,eio}) line 296 in context child_terminated 2023-09-13 11:48:20.323 [error] <0.25322.2>@riak_core_handoff_sender:start_fold:282 ownership transfer of riak_kv_vnode from 'riak@172.16.3.92' 528081646301205155991565808696313200461616250880 to 'riak@172.16.3.63' 528081646301205155991565808696313200461616250880 failed because of error:{badrecord,ho_acc} [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,204}]}] 2023-09-13 12:16:23.882 [error] <0.170.0> Supervisor riak_core_handoff_sender_sup had child riak_core_handoff_sender started with {riak_core_handoff_sender,start_link,undefined} at <0.2844.4> exit with reason max_concurrency in context child_terminated
crash.log
2023-09-13 11:48:20 =CRASH REPORT==== crasher: initial call: riak_core_vnode_worker:init/1 pid: <0.4371.0> registered_name: [] exception exit: {{function_clause,[{riak_core_handoff_sender,visit_item,[{<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,1 41,135,102,76,129,178,206,178,15,200,127,187,40,0,0,0,0>>},<<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,2,97,1,11 ,48,51,46,50,48,50,50,45,...>>,...],...},...]},...} ancestors: [<0.4368.0>,<0.4367.0>,<0.4366.0>,<0.1819.0>,riak_core_vnode_sup,riak_core_sup,<0.161.0>] messages: [] links: [<0.4367.0>,<0.4368.0>,<0.93.0>] dictionary: [{bitcask_file_mod,bitcask_file},{bitcask_time_fudge,no_testing},{is_visit_item_timer_set,true}] trap_exit: false status: running heap_size: 318187 stack_size: 27 reductions: 37404224106 neighbours: 2023-09-13 11:48:20 =SUPERVISOR REPORT==== Supervisor: {<0.4368.0>,poolboy_sup} Context: child_terminated Reason: {function_clause,[{riak_core_handoff_sender,visit_item,[{<<48,98,58,181,35,123,106,120,146,164,42,74,122,96,253,251,232,40,185>>,<<175,18,82,141,135,102,76,129,178,206,178,15,200,127,187,40,0,0,0,0>>},<<53,1,0,0,0,38,131,108,0,0,0,1,104,2,109,0,0,0,12,100,4,162,222,89,118,233,214,0,0,43,103,104,2,97,1,110,5, 51,46,50,48,50,50,45,...>>,...],...},...]} Offender: [{pid,<0.4371.0>},{name,riak_core_vnode_worker},{mfargs,{riak_core_vnode_worker,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
The text was updated successfully, but these errors were encountered: