Riak One partition handoff stall

Nicholas Adams nicholas.adams at tiot.jp
Mon May 28 09:41:35 EDT 2018


Dear Gaurav,

Standard troubleshooting: stalled handoffs can often be fixed by running “riak-admin transfer-limit 0” to stop all transfers and then, once you have confirmed that all transfers have stopped, running “riak-admin transfer-limit 2” to set the limit back to its default value.
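The procedure above can be scripted. What follows is a minimal Python sketch, not a Basho tool. It assumes `riak-admin` is on the PATH of the node it runs on, and that `riak-admin transfers` prints output like the one quoted later in this thread: lines of the form "'node' waiting to handoff N partitions", then an "Active Transfers:" section. The function names, the polling interval and the poll count are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable, Dict, List

# Matches e.g.: 'riak@192.168.2.10' waiting to handoff 1 partitions
WAITING_RE = re.compile(r"'([^']+)' waiting to handoff (\d+) partitions?")


def waiting_handoffs(output: str) -> Dict[str, int]:
    """Map node name -> number of partitions still waiting to hand off."""
    return {node: int(n) for node, n in WAITING_RE.findall(output)}


def active_transfers(output: str) -> List[str]:
    """Return the non-blank lines listed under 'Active Transfers:'."""
    _, sep, tail = output.partition("Active Transfers:")
    if not sep:
        return []
    # "(nothing here)" is treated as an empty section (assumption based on
    # the output pasted in this thread).
    return [line.strip() for line in tail.splitlines()
            if line.strip() and line.strip() != "(nothing here)"]


def run_riak_admin(*args: str) -> str:
    """Run riak-admin with the given arguments and return its stdout."""
    result = subprocess.run(["riak-admin", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout


def bounce_transfer_limit(run: Callable[..., str] = run_riak_admin,
                          default_limit: int = 2,
                          poll_seconds: float = 5.0,
                          max_polls: int = 60) -> bool:
    """Set the transfer limit to 0, poll until no transfers are active,
    then restore default_limit. Returns True if the drain was confirmed."""
    run("transfer-limit", "0")
    drained = False
    for _ in range(max_polls):
        if not active_transfers(run("transfers")):
            drained = True
            break
        time.sleep(poll_seconds)
    # Restore the limit either way so handoff is not left disabled;
    # the return value tells the caller whether the drain was confirmed.
    run("transfer-limit", str(default_limit))
    return drained
```

`run` is injectable so the sequence can be dry-run with a fake runner before pointing it at a live cluster. A False result means transfers never drained within `max_polls` and needs a manual look.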

Another option you might want to investigate is repairing the VNode you listed. For Riak KV 1.4.12, refer to the steps under "Repairing a Single Partition" at http://docs.basho.com/riak/1.4.12/ops/running/recovery/repairing-partitions/#Running-a-Repair, substituting in the VNode value you gave below.
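To make the substitution concrete, here is a small Python sketch. It assumes a ring size of 64. That is the 1.4 default, and it is consistent with the member-status output quoted further down, where six valid nodes each own 9/64 ≈ 14.1% and one owns 10/64 ≈ 15.6%. It also assumes the console step on the linked page is `riak_kv_vnode:repair(Partition).` evaluated from `riak attach`; please check the page before running anything. The helper names are hypothetical.

```python
# Hypothetical helpers: locate a Riak VNode ID in the ring and build the
# console expression to evaluate in `riak attach` (assumed: riak_kv_vnode:repair/1).
RING_SPACE = 2 ** 160  # riak_core hashes keys into a 160-bit (SHA-1) space


def partition_position(vnode_id: int, ring_size: int) -> int:
    """Return the 0-based position of vnode_id in a ring of ring_size partitions."""
    width = RING_SPACE // ring_size  # ring sizes are powers of two, so this is exact
    if not 0 <= vnode_id < RING_SPACE or vnode_id % width != 0:
        raise ValueError(f"{vnode_id} is not a partition boundary for ring size {ring_size}")
    return vnode_id // width


def repair_call(vnode_id: int) -> str:
    """Erlang expression to paste into the node's console."""
    return f"riak_kv_vnode:repair({vnode_id})."
```

With the VNode from the vnode_status output in this thread, `partition_position(342539446249430371453988632667878832731859189760, 64)` returns 15, so it is the 16th of 64 partitions. `repair_call` produces the line to paste into the console on the node that owns it.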

Drawing on my work as a CSE, originally at Basho and now at TI Tokyo, may I ask why you are regularly having nodes leave the cluster? This is not common practice in production environments.

Finally, Riak KV 1.4.12 has been obsolete for quite a few years. I would strongly recommend that you upgrade to Riak KV 2.0.9 (LTS), as it is supported as a direct upgrade from 1.4.12; see https://docs.basho.com/riak/kv/2.0.9/setup/upgrading/ for details. Once on the 2.0.x series, you can then look at a further upgrade to the 2.2.x series should you wish.

Hope this helps,

Nicholas

From: riak-users <riak-users-bounces at lists.basho.com> On Behalf Of Gaurav Sood
Sent: 28 May 2018 22:11
To: Bryan Hunt <bryan.hunt at erlang-solutions.com>
Cc: riak-users at lists.basho.com
Subject: Re: Riak One partition handoff stall

Thanks Bryan

Below is the output of the command riak-admin vnode_status. Maybe data transfer has stopped on the claimant node.

The output of all the commands stays the same.

1)

 VNode: 342539446249430371453988632667878832731859189760
Backend: riak_kv_eleveldb_backend
Status:
[{stats,<<"                               Compactions\nLevel  Files Size(MB) Time(sec) Read(MB) Write(MB)\n--------------------------------------------------\n  0        1        0         0        0         0\n">>},
 {read_block_error,<<"0">>},
 {fixed_indexes,true}]


2) 30 GB of data per server
4) I am not sure about the number of objects. Is there any way to get a count of the objects?

On Mon, May 28, 2018 at 4:57 PM, Bryan Hunt <bryan.hunt at erlang-solutions.com> wrote:
Are you constantly executing a particular riak command in your system monitoring scripts, for example `riak-admin vnode-status`?

What size is your data per server ?

How many objects are you storing ?



On 28 May 2018, at 08:29, Gaurav Sood <gaurav.sood at mediologysoftware.com> wrote:

Hi All - Good Day!

I have a 7-node Riak KV cluster. I recently upgraded this cluster from 1.4.2 to 1.4.12 on Ubuntu 16.04. Since the upgrade, whenever I make a node leave the cluster, the handoff of one partition stalls every time and the transfers output shows "waiting to handoff 1 partitions". To complete the process I need to restart the riak service on all nodes one by one.

I am not sure whether it is a configuration problem. Here is the current state of the cluster.

#output of riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving     0.0%      --      'riak@192.168.2.10'
valid      14.1%      --      'riak@192.168.2.11'
valid      14.1%      --      'riak@192.168.2.12'
valid      15.6%      --      'riak@192.168.2.13'
valid      14.1%      --      'riak@192.168.2.14'
valid      14.1%      --      'riak@192.168.2.15'
valid      14.1%      --      'riak@192.168.2.16'
valid      14.1%      --      'riak@192.168.2.17'
-------------------------------------------------------------------------------
Valid:7 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

#Output of riak-admin transfers

'riak@192.168.2.10' waiting to handoff 1 partitions

Active Transfers:

(nothing here)


#Output of riak-admin ring_status
================================== Claimant ===================================
Claimant:  'riak@192.168.2.10'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

The current transfer limit is 2.

Thanks
Gaurav
_______________________________________________
riak-users mailing list
riak-users at lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



