Cluster basics: N-val and distribution between nodes

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Cluster basics: N-val and distribution between nodes

Mårten Gustafson
To verify my (mis)understanding of the N-value setting...

Does setting the N-value equal to the number of nodes in a cluster
guarantee that data will be saved on all nodes?

Example:
Given 3 physical nodes, each with 1 Riak instance running and all 3
Riak instances joined in the same cluster. Regardless of the number of
partitions, would a N-value of 3 guarantee that values will be stored
on all three instances assuming that all 3 instances are functional
and the network is available?



cheers
/mårten.

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Sean Cribbs-2
As long as there are more than one node in the ring, no node will claim two adjacent portions of the ring.  Consistent hashing is used to select the initial partition for an object, and then replica nodes are selected by walking stepwise around the ring.

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Mar 7, 2010, at 6:24 PM, Mårten Gustafson wrote:

> Hi Sean and thanks.
>
> While I was aware of ring size and nodes claiming parts of that ring I can't recall seeing anything about guarantees that a value will be saved on N number of nodes (given there's >=N number of nodes in the cluster).
>
> I was under the impression that N only dictated the number of ring "slices" to store to and that those could very well be located on the same Riak instance depending om how the ring was distributed, how nodes joined/left the cluster etc.
>
>
>
> /Mårten
>
> On 7 mar 2010, at 21.10, Sean Cribbs <[hidden email]> wrote:
>
>> Mårten,
>>
>> In a sense, yes. The N-value specifies the number of partitions to which the value will be written/replicated.  Assuming you have at least N nodes in your cluster, it will save to N separate nodes.  If you have fewer than N nodes, some node is going to get an extra copy. A node will claim a number of partitions proportional to the ring size.
>>
>> Sean Cribbs <[hidden email]>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On Mar 7, 2010, at 2:02 PM, Mårten Gustafson wrote:
>>
>>> To verify my (mis)understanding of the N-value setting...
>>>
>>> Does setting the N-value equal to the number of nodes in a cluster
>>> guarantee that data will be saved on all nodes?
>>>
>>> Example:
>>> Given 3 physical nodes, each with 1 Riak instance running and all 3
>>> Riak instances joined in the same cluster. Regardless of the number of
>>> partitions, would a N-value of 3 guarantee that values will be stored
>>> on all three instances assuming that all 3 instances are functional
>>> and the network is available?
>>>
>>>
>>>
>>> cheers
>>> /mårten.
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [hidden email]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Preston Marshall
I think thats a yes Marten :)
On Mar 7, 2010, at 8:09 PM, Sean Cribbs wrote:

> As long as there are more than one node in the ring, no node will claim two adjacent portions of the ring.  Consistent hashing is used to select the initial partition for an object, and then replica nodes are selected by walking stepwise around the ring.
>
> Sean Cribbs <[hidden email]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Mar 7, 2010, at 6:24 PM, Mårten Gustafson wrote:
>
>> Hi Sean and thanks.
>>
>> While I was aware of ring size and nodes claiming parts of that ring I can't recall seeing anything about guarantees that a value will be saved on N number of nodes (given there's >=N number of nodes in the cluster).
>>
>> I was under the impression that N only dictated the number of ring "slices" to store to and that those could very well be located on the same Riak instance depending om how the ring was distributed, how nodes joined/left the cluster etc.
>>
>>
>>
>> /Mårten
>>
>> On 7 mar 2010, at 21.10, Sean Cribbs <[hidden email]> wrote:
>>
>>> Mårten,
>>>
>>> In a sense, yes. The N-value specifies the number of partitions to which the value will be written/replicated.  Assuming you have at least N nodes in your cluster, it will save to N separate nodes.  If you have fewer than N nodes, some node is going to get an extra copy. A node will claim a number of partitions proportional to the ring size.
>>>
>>> Sean Cribbs <[hidden email]>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Mar 7, 2010, at 2:02 PM, Mårten Gustafson wrote:
>>>
>>>> To verify my (mis)understanding of the N-value setting...
>>>>
>>>> Does setting the N-value equal to the number of nodes in a cluster
>>>> guarantee that data will be saved on all nodes?
>>>>
>>>> Example:
>>>> Given 3 physical nodes, each with 1 Riak instance running and all 3
>>>> Riak instances joined in the same cluster. Regardless of the number of
>>>> partitions, would a N-value of 3 guarantee that values will be stored
>>>> on all three instances assuming that all 3 instances are functional
>>>> and the network is available?
>>>>
>>>>
>>>>
>>>> cheers
>>>> /mårten.
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [hidden email]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>
>
> _______________________________________________
> riak-users mailing list
> [hidden email]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Sean Cribbs-2
Actually, it's more of a maybe.  In the degenerate scenario where there are fewer nodes than the N value, yes, a single node will get more than one replica, but that doesn't mean it claims adjacent portions of the ring, just that the walk around the ring touched two or more of its partitions.

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Mar 8, 2010, at 3:28 AM, Preston Marshall wrote:

> I think thats a yes Marten :)
> On Mar 7, 2010, at 8:09 PM, Sean Cribbs wrote:
>
>> As long as there are more than one node in the ring, no node will claim two adjacent portions of the ring.  Consistent hashing is used to select the initial partition for an object, and then replica nodes are selected by walking stepwise around the ring.
>>
>> Sean Cribbs <[hidden email]>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On Mar 7, 2010, at 6:24 PM, Mårten Gustafson wrote:
>>
>>> Hi Sean and thanks.
>>>
>>> While I was aware of ring size and nodes claiming parts of that ring I can't recall seeing anything about guarantees that a value will be saved on N number of nodes (given there's >=N number of nodes in the cluster).
>>>
>>> I was under the impression that N only dictated the number of ring "slices" to store to and that those could very well be located on the same Riak instance depending om how the ring was distributed, how nodes joined/left the cluster etc.
>>>
>>>
>>>
>>> /Mårten
>>>
>>> On 7 mar 2010, at 21.10, Sean Cribbs <[hidden email]> wrote:
>>>
>>>> Mårten,
>>>>
>>>> In a sense, yes. The N-value specifies the number of partitions to which the value will be written/replicated.  Assuming you have at least N nodes in your cluster, it will save to N separate nodes.  If you have fewer than N nodes, some node is going to get an extra copy. A node will claim a number of partitions proportional to the ring size.
>>>>
>>>> Sean Cribbs <[hidden email]>
>>>> Developer Advocate
>>>> Basho Technologies, Inc.
>>>> http://basho.com/
>>>>
>>>> On Mar 7, 2010, at 2:02 PM, Mårten Gustafson wrote:
>>>>
>>>>> To verify my (mis)understanding of the N-value setting...
>>>>>
>>>>> Does setting the N-value equal to the number of nodes in a cluster
>>>>> guarantee that data will be saved on all nodes?
>>>>>
>>>>> Example:
>>>>> Given 3 physical nodes, each with 1 Riak instance running and all 3
>>>>> Riak instances joined in the same cluster. Regardless of the number of
>>>>> partitions, would a N-value of 3 guarantee that values will be stored
>>>>> on all three instances assuming that all 3 instances are functional
>>>>> and the network is available?
>>>>>
>>>>>
>>>>>
>>>>> cheers
>>>>> /mårten.
>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> [hidden email]
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [hidden email]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Mårten Gustafson
Assuming all instances in a cluster are up and interconnected. A  
bucket whose N-val is equal to, or greater than, the number of Riak  
instances will always have its value saved to at least one partition  
on each instance.

Is this an accurate summary?



/Mårten

On 8 mar 2010, at 13.19, Sean Cribbs <[hidden email]> wrote:

> Actually, it's more of a maybe.  In the degenerate scenario where  
> there are fewer nodes than the N value, yes, a single node will get  
> more than one replica, but that doesn't mean it claims adjacent  
> portions of the ring, just that the walk around the ring touched two  
> or more of its partitions.
>
> Sean Cribbs <[hidden email]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Mar 8, 2010, at 3:28 AM, Preston Marshall wrote:
>
>> I think thats a yes Marten :)
>> On Mar 7, 2010, at 8:09 PM, Sean Cribbs wrote:
>>
>>> As long as there are more than one node in the ring, no node will  
>>> claim two adjacent portions of the ring.  Consistent hashing is  
>>> used to select the initial partition for an object, and then  
>>> replica nodes are selected by walking stepwise around the ring.
>>>
>>> Sean Cribbs <[hidden email]>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Mar 7, 2010, at 6:24 PM, Mårten Gustafson wrote:
>>>
>>>> Hi Sean and thanks.
>>>>
>>>> While I was aware of ring size and nodes claiming parts of that  
>>>> ring I can't recall seeing anything about guarantees that a value  
>>>> will be saved on N number of nodes (given there's >=N number of  
>>>> nodes in the cluster).
>>>>
>>>> I was under the impression that N only dictated the number of  
>>>> ring "slices" to store to and that those could very well be  
>>>> located on the same Riak instance depending om how the ring was  
>>>> distributed, how nodes joined/left the cluster etc.
>>>>
>>>>
>>>>
>>>> /Mårten
>>>>
>>>> On 7 mar 2010, at 21.10, Sean Cribbs <[hidden email]> wrote:
>>>>
>>>>> Mårten,
>>>>>
>>>>> In a sense, yes. The N-value specifies the number of partitions  
>>>>> to which the value will be written/replicated.  Assuming you  
>>>>> have at least N nodes in your cluster, it will save to N  
>>>>> separate nodes.  If you have fewer than N nodes, some node is  
>>>>> going to get an extra copy. A node will claim a number of  
>>>>> partitions proportional to the ring size.
>>>>>
>>>>> Sean Cribbs <[hidden email]>
>>>>> Developer Advocate
>>>>> Basho Technologies, Inc.
>>>>> http://basho.com/
>>>>>
>>>>> On Mar 7, 2010, at 2:02 PM, Mårten Gustafson wrote:
>>>>>
>>>>>> To verify my (mis)understanding of the N-value setting...
>>>>>>
>>>>>> Does setting the N-value equal to the number of nodes in a  
>>>>>> cluster
>>>>>> guarantee that data will be saved on all nodes?
>>>>>>
>>>>>> Example:
>>>>>> Given 3 physical nodes, each with 1 Riak instance running and  
>>>>>> all 3
>>>>>> Riak instances joined in the same cluster. Regardless of the  
>>>>>> number of
>>>>>> partitions, would a N-value of 3 guarantee that values will be  
>>>>>> stored
>>>>>> on all three instances assuming that all 3 instances are  
>>>>>> functional
>>>>>> and the network is available?
>>>>>>
>>>>>>
>>>>>>
>>>>>> cheers
>>>>>> /mårten.
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> [hidden email]
>>>>>> http://lists.basho.com/mailman/listinfo/riak- 
>>>>>> users_lists.basho.com
>>>>>
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [hidden email]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Justin Sheehy
On Mon, Mar 8, 2010 at 10:14 AM, Mårten Gustafson
<[hidden email]> wrote:

> Assuming all instances in a cluster are up and interconnected. A bucket
> whose N-val is equal to, or greater than, the number of Riak instances will
> always have its value saved to at least one partition on each instance.
>
> Is this an accurate summary?

Yes.

-Justin

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Victor Martinez
In reply to this post by Mårten Gustafson
I'm sorry if I'm adding more confusion to a question that seemed closed.

Is this true even if the number of virtual nodes is not a multiple of N?

So for example ( I know it is a very degenerate case ) if the size of  
the ring is 4, and there are 3 riak instances and N is set to 3 for a  
given bucket, a given entry could end up not being stored on the three  
nodes.

The way I understand it, this could happen for any size of the ring  
which is not a multiple of N, not only in a ring with 4 partitions.

I'm basing this on the comments in riak_claim.erl

Is this correct?

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Cluster basics: N-val and distribution between nodes

Curtis Caravone
I have the same question.  According to the comments in riak_claim.erl, the claims will be arranged so that partition sequences of length at most target_n_val will have no repeated nodes, if possible, but that there will be cases where there may be repeats.  Is the sequence of N partitions taken verbatim as the replica list, or is there some logic to ensure that we actually get N distinct nodes?

I think in the original Dynamo paper, they said they had some extra logic to make sure the N virtual nodes they chose for replication actually corresponded to N distinct physical nodes.

Curtis


On Mon, Mar 8, 2010 at 10:20 AM, Victor Martinez <[hidden email]> wrote:
I'm sorry if I'm adding more confusion to a question that seemed closed.

Is this true even if the number of virtual nodes is not a multiple of N?

So for example ( I know it is a very degenerate case ) if the size of the ring is 4, and there are 3 riak instances and N is set to 3 for a given bucket, a given entry could end up not being stored on the three nodes.

The way I understand it, this could happen for any size of the ring which is not a multiple of N, not only in a ring with 4 partitions.

I'm basing this on the comments in riak_claim.erl

Is this correct?


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
wde
Reply | Threaded
Open this post in threaded view
|

Re: Re: Cluster basics: N-val and distribution between nodes

wde
In reply to this post by Mårten Gustafson
>I have the same question.  According to the comments in riak_claim.erl, the
>claims will be arranged so that partition sequences of length at most
>target_n_val will have no repeated nodes, if possible, but that there will
>be cases where there may be repeats.  Is the sequence of N partitions taken
>verbatim as the replica list, or is there some logic to ensure that we
>actually get N distinct nodes?
>
>I think in the original Dynamo paper, they said they had some extra logic to
>make sure the N virtual nodes they chose for replication actually
>corresponded to N distinct physical nodes.

Yes, I think they add an additional layer to ensure (if needed) that replicas will be on differents
physical nodes. Virtualization is (sometimes) good, but when the customer don't know where are physically the replicas (dude, where's my data ?), it could be problem. For example if you want to ensure that at least 2 replicas are hosted in two datacenters to ensure that you respect the SLA and the site disaster clause.


   

>
>Curtis
>
>
>On Mon, Mar 8, 2010 at 10:20 AM, Victor Martinez <[hidden email]>wrote:
>
>> I'm sorry if I'm adding more confusion to a question that seemed closed.
>>
>> Is this true even if the number of virtual nodes is not a multiple of N?
>>
>> So for example ( I know it is a very degenerate case ) if the size of the
>> ring is 4, and there are 3 riak instances and N is set to 3 for a given
>> bucket, a given entry could end up not being stored on the three nodes.
>>
>> The way I understand it, this could happen for any size of the ring which
>> is not a multiple of N, not only in a ring with 4 partitions.
>>
>> I'm basing this on the comments in riak_claim.erl
>>
>> Is this correct?
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [hidden email]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>_______________________________________________
>riak-users mailing list
>[hidden email]
>http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com