Issues with partition distribution across nodes

Issues with partition distribution across nodes

Denis Gudtsov
Hello

We have a 6-node cluster configured with a ring size of 128. The problem is that two partitions have replicas on only two nodes rather than the required three (n_val=3). We have tried several times to clean the leveldb and ring directories and then rebuild the cluster, but the issue is still present.
How can we diagnose where the issue is and fix it? Is there any way to assign a partition to a node manually?

Please find the output of member-status below and a screenshot of the Riak Control ring status:
[root@riak01 ~]# riak-admin  member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      17.2%      --      'riak@riak01.
valid      17.2%      --      'riak@riak02.
valid      16.4%      --      'riak@riak03.
valid      16.4%      --      'riak@riak04.
valid      16.4%      --      'riak@riak05.
valid      16.4%      --      'riak@riak06.
-------------------------------------------------------------------------------
Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0



Thank you.

Re: Issues with partition distribution across nodes

Russell Brown-4
Hi,

This is just a quick reply, since this is currently a topic of discussion on the ML.

On 24 May 2017, at 12:57, Denis Gudtsov <[hidden email]> wrote:

> Hello
>
> We have a 6-node cluster configured with a ring size of 128. The problem is
> that two partitions have replicas on only two nodes rather than the required
> three (n_val=3). We have tried several times to clean the leveldb and ring
> directories and then rebuild the cluster, but the issue is still present.

There was a fairly long discussion about this very issue recently (see http://lists.basho.com/pipermail/riak-users_lists.basho.com/2017-May/019281.html)

I ran a little code and the following {RingSize, NodeCount, IsViolated} tuples were the result. If you built any of these clusters from scratch (i.e. you started NodeCount nodes and used riak-admin cluster join, riak-admin cluster plan, and riak-admin cluster commit to create the whole cluster in one go), then you have tail violations in your ring.

[{16,3,true},
 {16,5,true},
 {16,7,true},
 {16,13,true},
 {16,14,true},
 {32,3,true},
 {32,5,true},
 {32,6,true},
 {32,10,true},
 {64,3,true},
 {64,7,true},
 {64,9,true},
 {128,3,true},
 {128,5,true},
 {128,6,true},
 {128,7,true},
 {128,9,true},
 {128,14,true},
 {256,3,true},
 {256,5,true},
 {256,11,true},
 {512,3,true},
 {512,5,true},
 {512,6,true},
 {512,7,true},
 {512,10,true}]
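
(For illustration only: this is not the code I ran, nor riak_core's actual claim algorithm, just a toy standalone check. It takes an ownership list in ring order plus an n_val and reports the ring positions whose wrapped window of owners contains fewer than n_val distinct nodes. The module and function names are made up.)

%% tail_check.erl -- toy sketch, not riak_core.
-module(tail_check).
-export([violations/2, round_robin/2]).

%% 0-based ring positions whose NVal consecutive owners (wrapping around
%% the end of the ring) include fewer than NVal distinct nodes.
violations(Owners, NVal) ->
    Size = length(Owners),
    T = list_to_tuple(Owners),
    [I || I <- lists:seq(0, Size - 1),
          length(lists:usort([element(((I + J) rem Size) + 1, T)
                              || J <- lists:seq(0, NVal - 1)])) < NVal].

%% Naive round-robin ownership, purely to have something to feed in; the
%% real claim code spaces owners by target_n_val and will differ.
round_robin(RingSize, NodeCount) ->
    [I rem NodeCount || I <- lists:seq(0, RingSize - 1)].

For example, tail_check:violations(tail_check:round_robin(128, 6), 3) returns [126,127]: the two windows at the tail of the ring wrap back onto a node they already contain. That is only the naive layout, not your actual ring, but it shows the shape of the problem.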


> How can we diagnose where the issue is and fix it?

WRT your problem, a quick experiment suggests that adding two new nodes will solve it; just adding one does not appear to. I tried adding a single new node and still had one violated preflist. That said, I only threw a little experiment together, so I could well be wrong: it doesn't actually build any clusters and uses the claim code out of context, so YMMV.

> Is there any way to assign a partition to a node manually?

I don’t know of a way, but that would be very useful.

Do you remember if this cluster was built all at once as a 6-node cluster, or has it grown over time? Have you run the command riak-admin diag ring_preflists (documented at http://docs.basho.com/riak/kv/2.2.3/setup/upgrading/checklist/#confirming-configuration-with-riaknostic)?

Sorry I can’t be more help

Cheers

Russell


Re: Issues with partition distribution across nodes

Denis Gudtsov
Hi Russell

Thank you for your suggestions. I found that diag says: "The following preflists do not satisfy the n_val. Please add more nodes". It seems that a ring size of 128 divided across 6 nodes is hard to arrange.
The history of our cluster is a long story, because we are testing it in our lab. It was initially deployed with 5 nodes without issues, and was then expanded to 6 nodes, again without issues. After some time all of the storage space on the whole cluster was fully utilized, and we had to remove all data from the leveldb directory, flush the ring directory, and rebuild the cluster. That was done by adding all 6 nodes at one time, so this may be the cause. We can try to flush the cluster data once again and then add the nodes one by one (committing the cluster change each time), waiting for the partition transfers to finish each time.
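
For reference, the sequence we have in mind per added node is roughly the following (the hostname is a placeholder; riak-admin transfers is only there to watch the handoff finish):

# run on the node that is joining, pointing at a node already in the cluster
riak-admin cluster join riak@riak01.example.com
# review and then apply the proposed ownership changes
riak-admin cluster plan
riak-admin cluster commit
# wait until no transfers remain before joining the next node
riak-admin transfers
riak-admin member-status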


Re: Issues with partition distribution across nodes

Russell Brown-4

On 24 May 2017, at 15:44, Denis <[hidden email]> wrote:

> Thank you for your suggestions. [...] We can try to flush the cluster data
> once again and then add the nodes one by one (committing the cluster change
> each time), waiting for the partition transfers to finish each time.

Or add two more nodes in one go; that might be quicker, and if time is money, cheaper.
