Riak One partition handoff stall

Gaurav Sood
Hi All - Good Day!

I have a 7-node Riak KV cluster. I recently upgraded it from 1.4.2 to 1.4.12 on Ubuntu 16.04. Since the upgrade, whenever I remove a node from the cluster, one partition handoff stalls every time and riak-admin transfers shows 'waiting to handoff 1 partitions'. To complete the handoff I have to restart the Riak service on every node, one by one.

I am not sure whether this is a configuration problem. Here is the current state of the cluster.

#output of riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving     0.0%      --      '[hidden email]'
valid      14.1%      --      '[hidden email]'
valid      14.1%      --      '[hidden email]'
valid      15.6%      --      '[hidden email]'
valid      14.1%      --      '[hidden email]'
valid      14.1%      --      '[hidden email]'
valid      14.1%      --      '[hidden email]'
valid      14.1%      --      '[hidden email]'
-------------------------------------------------------------------------------
Valid:7 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

#output of riak-admin transfers

'[hidden email]' waiting to handoff 1 partitions

Active Transfers:

(nothing here)


#Output of riak-admin ring_status
================================== Claimant ===================================
Claimant:  '[hidden email]'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

current Transfer Limit is 2.

Thanks
Gaurav
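
For anyone scripting a check for this state, here is a minimal Python sketch that reads the text of `riak-admin transfers` and flags the situation described above: partitions waiting to hand off while nothing is listed under "Active Transfers:". It assumes the output keeps the shape shown in this post. The node name in the sample is made up, and a single snapshot only suggests a stall; it would need to persist across repeated checks.

```python
import re

# Matches lines such as: 'riak@host' waiting to handoff 1 partitions
WAITING = re.compile(r"^(?P<node>.+?) waiting to handoff (?P<count>\d+) partitions?$")


def parse_transfers(output):
    """Return (waiting, active) parsed from `riak-admin transfers` text.

    waiting: dict of node name -> number of partitions waiting to hand off
    active:  non-empty lines found under the "Active Transfers:" heading
    """
    waiting, active = {}, []
    in_active = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Active Transfers:":
            in_active = True
            continue
        match = WAITING.match(line)
        if match and not in_active:
            waiting[match.group("node").strip("'")] = int(match.group("count"))
        elif in_active:
            active.append(line)
    return waiting, active


def looks_stalled(output):
    """True when partitions are waiting but nothing is actively transferring."""
    waiting, active = parse_transfers(output)
    return bool(waiting) and not active


# Hypothetical node name; the layout mirrors the output quoted in this post.
SAMPLE = """'riak@node1.example' waiting to handoff 1 partitions

Active Transfers:

"""

print(looks_stalled(SAMPLE))  # prints True
```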

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Riak One partition handoff stall

Bryan Hunt-3
Are you repeatedly running a particular riak-admin command in your system monitoring scripts, for example `riak-admin vnode-status`?

How much data do you have per server?

How many objects are you storing?

---
Erlang Solutions cares about your data and privacy; please find all details about the basis for communicating with you and the way we process your data in our Privacy Policy. You can update your email preferences or opt-out from receiving Marketing emails here.

Re: Riak One partition handoff stall

Gaurav Sood
Thanks Bryan

Below is the output of riak-admin vnode-status. Maybe data transfer has stopped on the claimant node.

The output of all the commands stays the same from one run to the next.

1)

 VNode: 342539446249430371453988632667878832731859189760
Backend: riak_kv_eleveldb_backend
Status:
[{stats,<<"                               Compactions\nLevel  Files Size(MB) Time(sec) Read(MB) Write(MB)\n--------------------------------------------------\n  0        1        0         0        0         0\n">>},
 {read_block_error,<<"0">>},
 {fixed_indexes,true}]


2) 30 GB of data per server
3) I am not sure about the number of objects. Is there any way to get the object count?

RE: Riak One partition handoff stall

Nicholas Adams

Dear Gaurav,

 

Standard troubleshooting: stalled handoffs can often be fixed by running “riak-admin transfer-limit 0” to stop all transfers and, once you have confirmed that all transfers have stopped, running “riak-admin transfer-limit 2” to set it back to the default value.
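
As a rough illustration of that sequence, the Python sketch below sets the limit to 0, polls `riak-admin transfers` until nothing is listed under "Active Transfers:", and then restores the limit. It is only a sketch under assumptions: the helper names, polling interval and timeout are invented for the example, and it assumes active transfers appear as non-empty lines after the "Active Transfers:" heading, as in the output earlier in this thread.

```python
import subprocess
import time


def riak_admin(*args):
    """Run riak-admin with the given arguments and return its stdout."""
    result = subprocess.run(
        ["riak-admin", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def active_transfer_lines(transfers_output):
    """Non-empty lines after the "Active Transfers:" heading (assumed layout).

    If the heading is missing, this returns an empty list.
    """
    _, _, tail = transfers_output.partition("Active Transfers:")
    return [line.strip() for line in tail.splitlines() if line.strip()]


def reset_transfer_limit(run=riak_admin, restore_limit=2,
                         poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Set the limit to 0, wait for transfers to drain, then restore it."""
    run("transfer-limit", "0")
    for _ in range(max_polls):
        if not active_transfer_lines(run("transfers")):
            break
        sleep(poll_seconds)
    else:
        # Give up rather than restore the limit while transfers are running.
        raise TimeoutError("transfers still active; limit left at 0")
    run("transfer-limit", str(restore_limit))


# On a cluster node this would be invoked as:
#     reset_transfer_limit()
```

The `run` and `sleep` parameters exist only so the sequence can be exercised without a live cluster; on a real node the defaults call `riak-admin` directly.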

 

Another thing you might want to investigate is repairing the VNode you list. For Riak KV 1.4.12, follow the steps under Repairing a Single Partition at http://docs.basho.com/riak/1.4.12/ops/running/recovery/repairing-partitions/#Running-a-Repair, substituting the VNode value from your vnode-status output.
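
Before substituting the VNode value into the repair steps, it can be worth confirming that it really is a partition index for this ring. The small Python sketch below does that arithmetic. The ring size of 64 is an assumption inferred from the member-status percentages in this thread (14.1% is roughly 9/64 and 15.6% is roughly 10/64), not something stated in the posts, and the function name is made up for the example.

```python
RING_TOP = 2 ** 160  # size of Riak's 160-bit (SHA-1) hash space


def partition_number(vnode_index, ring_size):
    """Return the zero-based position of a vnode index within the ring."""
    step = RING_TOP // ring_size
    number, remainder = divmod(vnode_index, step)
    if remainder or not 0 <= number < ring_size:
        raise ValueError("index is not a partition boundary for this ring size")
    return number


# VNode value from the vnode-status output; ring size 64 is an assumption.
VNODE = 342539446249430371453988632667878832731859189760

print(partition_number(VNODE, 64))  # prints 15
```

A `ValueError` here would suggest either a mistyped index or a different ring size than the one assumed.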

 

Speaking from my work as a CSE, originally under Basho and now under TI Tokyo, can I ask why you are regularly removing nodes from the cluster? This is not common practice in production environments.

Finally, Riak KV 1.4.12 has been obsolete for quite a few years. I would strongly recommend that you upgrade to Riak KV 2.0.9, which has LTS status and is supported as a direct upgrade from 1.4.12; see https://docs.basho.com/riak/kv/2.0.9/setup/upgrading/ for details. Once on the 2.0.x series, you can then look at a further upgrade to the 2.2.x series should you so wish.

Hope this helps,

Nicholas

 
