Uneven distribution of partitions in RIAK cluster

rumcho
I have a 5-node cluster with 12 partitions on each of 4 nodes and 16 partitions on node #5. That is causing dangerously high disk utilization on that node. I plowed thru the documentation and Googled the hell out of it, but I can’t find info on how to rebalance the extra 4 partitions onto the 4 underutilized nodes. The docs say the cluster balances itself, but that’s apparently not the case here. Can anyone give any suggestions?
I run RIAK version 1.4.8 on Linux kernel 3.13.
Ray
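
For reference, the arithmetic behind the numbers in this post can be sketched in a few lines of shell. This is only an illustration: the function names `ring_size` and `even_share` are made up here, and it assumes every partition in the ring is owned by one of the five nodes. The counts add up to 64, which matches Riak's default ring size.

```shell
#!/bin/sh
# Sketch: ring size and an even per-node share for the cluster described above.
# Assumption: all partitions are owned by the five nodes listed in the post.

# ring_size COUNT... -> sum of the per-node partition counts
ring_size() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# even_share RING_SIZE NODES -> "floor ceil", the two per-node counts
# that an evenly balanced ring would use
even_share() {
    floor=$(( $1 / $2 ))
    if [ $(( $1 % $2 )) -eq 0 ]; then
        ceil=$floor
    else
        ceil=$((floor + 1))
    fi
    echo "$floor $ceil"
}

size=$(ring_size 12 12 12 12 16)   # 12*4 + 16 = 64
even_share "$size" 5               # prints "12 13"
```

So 64 partitions cannot split evenly across 5 nodes, but a balanced ring would hold 12 or 13 partitions per node rather than 16 on one of them.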

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Uneven distribution of partitions in RIAK cluster

Drew Pirrone-Brusse
Hi Ray,

Riak's partition distribution is automatically calculated using our nondeterministic `claim` algorithm. That system is able to re-balance clusters, but it is typically only run during membership operations: joining, leaving, or replacing nodes. The uneven partition distribution won't self-heal unless you add a new node to this cluster.

We can force a re-balance of this sort of uneven distribution by temporarily switching from `claim_v2` to `claim_v3`, and triggering a membership recalculation. `claim_v3` is still an experimental system that is much more aggressive about avoiding preflist violations and lumpy claims, without much regard for limiting the scope of membership changes. With `claim_v2`, the addition of a new node to an existing cluster will almost always only involve moving partitions off of existing nodes and onto the new node. With `claim_v3`, it's somewhat common to see partitions also being moved between existing nodes in order to prevent lumpy claims.

These unpredictable spikes in membership changes have caused serious problems for our customers in the past, and they are nearly impossible to plan for, so we don't advise using `claim_v3` for the majority of operations.

To enable `claim_v3` and trigger a re-balance of the ring,

1. Enable the use of `claim_v3` by opening a `riak attach` session on any node in this cluster, and running the below snippets,

    rpc:multicall(application, set_env, [riak_core, wants_claim_fun, {riak_core_claim, wants_claim_v3}]).
    rpc:multicall(application, set_env, [riak_core, choose_claim_fun, {riak_core_claim, choose_claim_v3}]).

(Please note, the `.`s are syntactically significant in Erlang, and you can exit `attach` sessions with `ctrl+g, q, enter`.)

2. Determine which node is currently the Claimant by running `riak-admin ring-status` on any node in the cluster. Look for the line similar to `Claimant: '[hidden email]'`.

3. Stop the claimant. In this case I would run `riak stop` on [hidden email].

4. Trigger the election of a new claimant by marking the current claimant DOWN in the ring. In this case, I would run `riak-admin down [hidden email]` on any active node in this cluster.

5. Verify the reelection with `riak-admin ring-status` (checking to make sure the claimant has changed), and restart the node that was previously stopped.

At this point the rebalance should have occurred and the membership transfers should have started.

6. To disable `claim_v3`, open another `riak attach` session on any node in this cluster, and run the below snippets,

    rpc:multicall(application, set_env, [riak_core, wants_claim_fun, {riak_core_claim, default_wants_claim}]).
    rpc:multicall(application, set_env, [riak_core, choose_claim_fun, {riak_core_claim, default_choose_claim}]).

This can be done while the transfers are in-flight. The new plan will have already been injected into the ring.
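
As a rough aid for steps 2 through 5, the claimant name can be pulled out of the `riak-admin ring-status` output with a small shell helper. This is a sketch, not a tested procedure: the function name `extract_claimant` is hypothetical, and it assumes the output contains a line of the form `Claimant:  'node@host'` as described in step 2. The `riak` and `riak-admin` commands in the comments are the ones named in the steps above, plus `riak start` and `riak-admin transfers`.

```shell
#!/bin/sh
# Sketch: find the claimant in `riak-admin ring-status` output.
# Assumption: the output has a line like   Claimant:  'riak@host'

# Reads ring-status output on stdin and prints the claimant node name
# (without the surrounding quotes). Prints nothing if no such line exists.
extract_claimant() {
    sed -n "s/^Claimant:[[:space:]]*'\([^']*\)'.*/\1/p" | head -n 1
}

# How the helper fits into the steps (run each one by hand and check the
# result before moving on; this is a production cluster):
#   claimant=$(riak-admin ring-status | extract_claimant)   # step 2
#   riak stop                     # step 3, on the claimant node itself
#   riak-admin down "$claimant"   # step 4, on any other running node
#   riak-admin ring-status | extract_claimant   # step 5, should print a different node
#   riak start                    # step 5, on the node stopped in step 3
#   riak-admin transfers          # watch the resulting transfers
```

Keeping the helper free of side effects means it can be tried against saved `ring-status` output before touching the cluster.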

I hope this helps.
Best regards,
-Drew

On Fri, Nov 11, 2016 at 2:13 PM, Semov, Raymond <[hidden email]> wrote:
I have a 5-node cluster with 12 partitions in 4 of the nodes and 16 partitions in node #5. That is causing dangerously high disk utilization in that node. I plowed thru the documentation and Googled the hell out of it but I can’t find info on how rebalance the extra 4 partitions on the 4 underutilized nodes. The docs say the cluster balances itself but that’s apparently not the case here. Can anyone give any suggestions?
I run RIAK version 1.4.8 on Linux kernel 3.13
Ray

Re: Uneven distribution of partitions in RIAK cluster

rumcho
Drew,
Thank you for the response! I would love to consider the claim_v2 -> claim_v3 switch, but since it’s experimental I’d rather not; I’m dealing with a RIAK cluster that is in production.
What I will end up doing (after our team cleans up all the junk in the cluster) is have a node leave the cluster and then rejoin. That’ll also fix the fragmentation that will happen after the old data purge.
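
A leave followed by a rejoin might look roughly like the sequence below, using Riak's staged clustering commands (`riak-admin cluster leave`, `plan`, `commit`, `join`). This is a sketch under assumptions: the function name `leave_rejoin_plan` and the node names in the usage line are hypothetical, the function only prints the commands rather than running them, and it assumes the departed node can simply be restarted and joined again once its handoff has finished.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands for a leave followed by a rejoin.
# Arguments: LEAVING = node that leaves, PEER = any node that stays.
leave_rejoin_plan() {
    leaving=$1
    peer=$2
    cat <<EOF
riak-admin cluster leave $leaving   # stage the leave, from any cluster member
riak-admin cluster plan             # review the staged changes
riak-admin cluster commit           # apply them; handoff off $leaving begins
riak-admin transfers                # repeat until no transfers remain
riak start                          # on $leaving, after it has shut down
riak-admin cluster join $peer       # on $leaving, stage the rejoin
riak-admin cluster plan             # review the new claim before committing
riak-admin cluster commit           # apply the rejoin
EOF
}

# Example with hypothetical node names:
#   leave_rejoin_plan riak@node5 riak@node1
```

The second `riak-admin cluster plan` is the place to check the resulting partition counts before anything is committed.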

From: Drew Pirrone-Brusse <[hidden email]>
Date: Monday, November 14, 2016 at 10:05 AM
To: "Semov, Raymond" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: Uneven distribution of partitions in RIAK cluster
