Testing netsplit in Riak

Andrey Ershov
Hi, guys!

I'm testing netsplits in Riak and cannot get satisfactory behaviour.
I have a two-node cluster and a bucket with the following settings: n=3, w=2, r=2. The bucket holds just a couple of entries.
Basically, I have two problems:
1) After the split, writes on one side of the partition start lagging badly. It takes more than 1 minute for the first write to succeed. I understand that this is related to Riak setting up backup vnodes, but is there any way to speed up the process? Which configuration parameters influence it?
2) A weirder problem shows up after the netsplit. The "riak-admin transfers" command immediately reports that there should be 5 partition transfers from one node to the other and 5 partition transfers in the opposite direction. But the active transfers output is empty!
I've put a watch on this command and the active transfers are always empty.
In the end, it takes several minutes for Riak to finish hinted handoff. Several minutes for just a few keys!
What is Riak doing all this time? Is there any way to speed up the process?
3) The reason I'm concerned about hinted-handoff speed is that, until this process finishes, I read stale data on both sides of the former netsplit.


--
Thanks,
Andrey
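
Editorial sketch: the stale reads in point 3 can be reproduced in a toy model of hinted handoff. This is not Riak's implementation; the `Replica` class, the helper functions, and the placement of three primaries across nodes "A" and "B" are all assumptions made for illustration. It only shows why a read with r=2 can return the old value while a write sits on fallback replicas waiting to be handed off.

```python
# Toy model (hypothetical, not Riak code): one key, three primary replicas,
# and fallback replicas that hold writes with a "hint" naming the primary.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    node: str                                   # physical node hosting it
    value: str = "v0"                           # value before the netsplit
    hints: dict = field(default_factory=dict)   # primary name -> hinted value


def write(primaries, fallbacks, reachable_node, value, w):
    """Write to reachable primaries; park the rest on local fallbacks."""
    acks = 0
    spare = [f for f in fallbacks if f.node == reachable_node]
    for p in primaries:
        if p.node == reachable_node:
            p.value = value
            acks += 1
        elif spare:
            spare.pop(0).hints[p.name] = value  # hinted write
            acks += 1
    return acks >= w


def read(primaries, r):
    """Return the values of the first r primaries that answer."""
    return [p.value for p in primaries[:r]]


def handoff(primaries, fallbacks):
    """Deliver hinted values to their primaries and clear the hints."""
    by_name = {p.name: p for p in primaries}
    for f in fallbacks:
        for name, value in f.hints.items():
            by_name[name].value = value
        f.hints.clear()


# Assumed layout: two primaries on node A, one on node B, fallbacks on B.
primaries = [Replica("p1", "A"), Replica("p2", "A"), Replica("p3", "B")]
fallbacks = [Replica("f1", "B"), Replica("f2", "B")]

# During the split, a client that can only reach node B writes "v1".
ok = write(primaries, fallbacks, reachable_node="B", value="v1", w=2)

# The split heals, but handoff has not run yet: p1 and p2 still hold "v0".
before = read(primaries, r=2)

handoff(primaries, fallbacks)
after = read(primaries, r=2)

print(ok, before, after)
```

In this model the write succeeds because two fallbacks acknowledge it, yet the two primaries on node A keep the old value until `handoff` runs, which is the window in which the stale reads appear.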

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Testing netsplit in Riak

Alexander Sicular-2
1. Check the Erlang VM variable "nettick", I believe.

2. Hinted handoff resource allocation is configurable via the config file or at runtime.

--


Alexander Sicular
Solutions Architect
Basho Technologies
9175130679
@siculars

Re: Testing netsplit in Riak

Andrey Ershov
Alexander, thanks for your reply!

1) I've set erlang.distribution.nettick_time to 1 second and writes after the netsplit are very fast now, so this point is resolved. Do you know how this parameter affects the false positive rate? The Riak docs state that every nettick_time seconds net_kernel will initiate liveness checks of remote processes. However, they do not say anything about the mechanism. Do you know how this failure detector works?
2) As for hinted handoff, I still cannot find a solution. Variables that I've tried changing:
   - vnode_management_timer from 10s to 1s
   - transfer_limit from 2 to 100
But the transfers still take about a minute. Are there any other variables that I should look at?

--
Best regards,
Andrey Ershov

Re: Testing netsplit in Riak

Alexander Sicular
There's a reason the default is higher. The larger the network, the higher the probability that nodes momentarily can't reach each other. Set it too low and you get too much gossip and too much flapping. YMMV.


@siculars

Sent from my iRotaryPhone
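
Editorial sketch: on the question of how the failure detector works, the arithmetic below assumes that the setting discussed above maps onto the Erlang kernel's `net_ticktime` parameter, and it follows the Erlang kernel documentation: connected nodes are ticked every `net_ticktime / 4` seconds, and an unresponsive node is detected somewhere between `net_ticktime - net_ticktime / 4` and `net_ticktime + net_ticktime / 4` seconds. The function name `tick_window` is hypothetical.

```python
def tick_window(net_ticktime_s: float):
    """Return (tick interval, min detection time, max detection time) in seconds.

    Assumes the behaviour documented for the Erlang kernel's net_ticktime:
    a tick every T/4 seconds, detection between T - T/4 and T + T/4.
    """
    tick_interval = net_ticktime_s / 4
    min_detect = net_ticktime_s - tick_interval
    max_detect = net_ticktime_s + tick_interval
    return tick_interval, min_detect, max_detect


# 60 s is the Erlang default; 1 s is the value tried in this thread.
for ticktime in (60, 1):
    interval, lo, hi = tick_window(ticktime)
    print(f"net_ticktime={ticktime}s: tick every {interval}s, "
          f"node declared down after {lo}s to {hi}s of silence")
```

With the 60-second default, the detection window of 45 to 75 seconds is consistent with the "more than 1 minute" delay reported for the first write after the split. With a 1-second setting, a pause of roughly one second on the network or in the VM is enough to mark a healthy node as down, which is the flapping risk mentioned above.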
