Using Riak with EC-2

Tyler Smart
Hi all!

I am new to this list and have just installed Riak on an EC2 instance (Ubuntu 8.04). I am researching how best to use Riak with a Rails app that is deployed on multiple EC2 instances.

I have been reading the Wiki and this is what I understand so far.

  • I will need a minimum of 3 nodes to run a replicated Riak database
  • Each Riak node needs to be attached to a Riak server
  • These servers must be separate

I am thinking I will need 3 Amazon EC2 instances and maybe 3 storage disks to hold the Riak data. I can attach a disk to each EC2 instance and have Riak write to that disk. I can then configure each EC2 instance to be in a common setup with the other Riak servers (synced). Am I understanding this correctly?

Sincerely,
Tyler

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Using Riak with EC-2

Sean Cribbs
Tyler,

A few corrections:

1) Riak will store n_val replicas of each object regardless of whether you have at least as many nodes as the n_val.  So if your n_val is 3, it makes the most sense to have at least 3 nodes.  Otherwise, some nodes will store more than one replica of the same data.

2) Each node _is_ a Riak server.  "Attaching" is when you connect to the Erlang shell for a running node so you can interact directly using the Erlang client.

3) It is a best practice to run one Riak node per machine (in a single cluster). So each of your EC2 instances that participates in the cluster would have a single Riak node.
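To make point 1 concrete, here is a minimal Python sketch of how replicas land on nodes. It assumes a simplified model: a 64-partition ring whose partitions are handed to nodes round-robin, and a key's n_val replicas stored on its own partition plus the next n_val - 1 partitions. Riak's real claim and hashing code differ in detail, and the node names and the users/tyler key are made up for illustration.

```python
import hashlib
from collections import Counter

RING_SIZE = 64  # assumed number of partitions for this sketch


def build_ring(nodes, ring_size=RING_SIZE):
    """Simplified claim: partition i is owned by nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Hash bucket/key with SHA-1 onto a 2**160 space and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * ring_size // 2**160


def preference_list(ring, first_partition, n_val):
    """Owners of the first partition and the next n_val - 1 partitions clockwise."""
    return [ring[(first_partition + i) % len(ring)] for i in range(n_val)]


if __name__ == "__main__":
    first = partition_for("users", "tyler")
    # 4 nodes divide the 64 partitions evenly, so each replica gets its own node.
    for nodes in (["riak1", "riak2"], ["riak1", "riak2", "riak3", "riak4"]):
        replicas = Counter(preference_list(build_ring(nodes), first, n_val=3))
        print(f"{len(nodes)} nodes -> replicas per node: {dict(replicas)}")
```

With two nodes and an n_val of 3, one node always ends up holding two of the three replicas in this model; with four nodes, every replica sits on a different node.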

You might want to use a configuration management system like Chef to generate per-node configuration and automatically join new nodes to the cluster. (When you say "syncing", we say "joining".)
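As a rough illustration of what such a recipe would generate, here is a Python sketch (not Chef itself, which uses Ruby) that renders the per-node Erlang node name and cookie lines of Riak's vm.args file and plans which nodes join a seed node. The IP addresses and cookie are hypothetical, and the `riak-admin join` command line is an assumption based on Riak releases of this period; check it against your version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiakNode:
    ip: str  # private IP of an EC2 instance (hypothetical values below)

    @property
    def name(self):
        # Erlang node name used by Riak, e.g. riak@10.0.0.11
        return f"riak@{self.ip}"


def vm_args(node, cookie):
    """Per-node vm.args lines: a unique node name plus the shared cluster cookie."""
    return f"-name {node.name}\n-setcookie {cookie}\n"


def join_plan(nodes):
    """Map each non-seed node to the command it would run to join the first node."""
    seed = nodes[0]
    return {node.ip: f"riak-admin join {seed.name}" for node in nodes[1:]}


if __name__ == "__main__":
    cluster = [RiakNode("10.0.0.11"), RiakNode("10.0.0.12"), RiakNode("10.0.0.13")]
    for node in cluster:
        print(vm_args(node, cookie="my_cluster_cookie"))
    for ip, command in join_plan(cluster).items():
        print(ip, "->", command)
```

Every node shares the same cookie so the Erlang nodes can talk to each other; only the node name differs per machine, which is exactly the part a configuration management tool is good at templating.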

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.

In reply to Tyler Smart's message of Mar 29, 2010, 3:49 PM.

Re: Using Riak with EC-2

Tyler Smart
Sean,

Thank you for getting back to me so soon; I understand more of what you are talking about now. So let's say I set the n_val to 2 (because I have 2 EC2 nodes with disks attached to them), and I set it to 2 for every bucket. This means that every node will hold a complete copy of the data, right?

If I cannot afford two big EC2 machines (the 64-bit Large instances), would you recommend three 32-bit machines instead? Will I notice any degradation in Riak's performance, or is it worth having the extra node? I am also considering putting the Riak data on the EC2 instance's own storage rather than on an EBS volume to save money, and was wondering if you had any experience with that.

As an extra precaution, I am considering mirroring all the data from the cloud to a local machine in the office, and was wondering whether Riak supports an external join like that.

I am now off looking into Chef to get this stuff going!

Thank you very much,
Tyler


In reply to Sean Cribbs's message of Mon, Mar 29, 2010, 4:02 PM.

Re: Using Riak with EC-2

Sean Cribbs

On Mar 29, 2010, at 4:21 PM, Tyler Smart wrote:

> Sean,
>
> Thank you for getting back to me so soon; I understand more of what you are talking about now. So let's say I set the n_val to 2 (because I have 2 EC2 nodes with disks attached to them), and I set it to 2 for every bucket. This means that every node will hold a complete copy of the data, right?
>

My provisional answer is yes.  Riak doesn't absolutely guarantee that data will be evenly distributed; it just happens that way most of the time because placement is effectively random.  Because of the way the ring claim algorithm works, you might also see an imbalance in data distribution when the number of nodes in the cluster is close to the N value.  It's better to keep the cluster significantly larger than N.
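The effect of a small cluster can be sketched with a simplified model. The Python below assumes round-robin claim over a 64-partition ring (Riak's actual claim algorithm is more involved) and counts, for several cluster sizes, how many partitions each node owns and how many preference lists with N = 3 fall on fewer than 3 distinct nodes.

```python
from collections import Counter

RING_SIZE = 64  # assumed partition count for this sketch


def round_robin_claim(nodes, ring_size=RING_SIZE):
    """Simplified claim: partition i goes to nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def partitions_per_node(ring):
    """How many partitions each node owns."""
    return Counter(ring)


def short_preference_lists(ring, n_val):
    """Partitions whose n_val replicas land on fewer than n_val distinct nodes."""
    size = len(ring)
    return sum(
        len({ring[(p + i) % size] for i in range(n_val)}) < n_val
        for p in range(size)
    )


if __name__ == "__main__":
    for count in (2, 3, 4, 8):
        ring = round_robin_claim([f"riak{i}" for i in range(1, count + 1)])
        owned = dict(partitions_per_node(ring))
        print(count, "nodes:", owned, "short lists:", short_preference_lists(ring, 3))
```

In this model two nodes leave all 64 preference lists short, three nodes own 22/21/21 partitions and leave 2 lists short (where the ring wraps from partition 63 back to 0), and four or eight nodes leave none short.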

Leslie Lamport, whose logical clocks are the forerunner of vector clocks, would argue that you need at _least_ 3 systems to have fault tolerance.
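The arithmetic behind that argument is majority quorums, as used in consensus protocols such as Lamport's Paxos: a group of n systems keeps a strict majority only while at most (n - 1) // 2 of them fail. The Python sketch below just tabulates that. It is the general argument, not a Riak rule, since Riak lets you choose R and W per request.

```python
def majority(n):
    """Smallest number of systems that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """Systems that can fail while a majority of n still remains."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} systems: majority {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

By this measure two systems tolerate no failures, three tolerate one, and a fourth adds nothing until a fifth arrives.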

> If I cannot afford two big EC2 machines (the 64-bit Large instances), would you recommend three 32-bit machines instead? Will I notice any degradation in Riak's performance, or is it worth having the extra node? I am also considering putting the Riak data on the EC2 instance's own storage rather than on an EBS volume to save money, and was wondering if you had any experience with that.
>

Your main problem with smaller nodes will be getting CPU and I/O time.  EC2 is great for quick deployment, but a regular VPS provider might be more economical and have more predictable performance in the beginning. I've used both Slicehost and Linode.

You could run your cluster on instance storage only, but instance storage does not survive the loss of its instance, so if enough nodes go down, your data will be lost.  In the end you may want EBS anyway because it can perform better than instance storage (although YMMV).
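What "enough nodes" means can be checked with a small model. The Python sketch below assumes a simplified round-robin ring of 64 partitions, n_val replicas on consecutive partitions, and that a node on instance storage loses everything when it goes down; it ignores any repair that might happen between failures. It lists which combinations of k lost nodes would destroy every replica of some partition.

```python
from itertools import combinations

RING_SIZE = 64  # assumed partition count for this sketch


def build_ring(nodes, ring_size=RING_SIZE):
    """Simplified claim: partition i is owned by nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def loses_data(ring, n_val, failed):
    """True if all n_val replicas of some partition sat on failed nodes."""
    size = len(ring)
    return any(
        all(ring[(p + i) % size] in failed for i in range(n_val))
        for p in range(size)
    )


def fatal_failure_sets(nodes, n_val, k):
    """Every set of k simultaneously lost nodes that destroys some data."""
    ring = build_ring(nodes)
    return [set(c) for c in combinations(nodes, k) if loses_data(ring, n_val, set(c))]


if __name__ == "__main__":
    nodes = [f"riak{i}" for i in range(1, 9)]
    for k in (2, 3):
        print(f"{k} nodes lost: {len(fatal_failure_sets(nodes, 3, k))} fatal combinations")
```

With 8 nodes and an n_val of 3, losing any 2 nodes is survivable in this model, while 8 of the 56 possible 3-node losses (three adjacent nodes on the ring) destroy data.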

> As an extra precaution, I am considering mirroring all the data from the cloud to a local machine in the office, and was wondering whether Riak supports an external join like that.
>

The EnterpriseDS product has long-haul replication to other clusters, but the open-source version does not.  You could ship backups to that computer, but that will be no more reliable than pushing them out to S3.

> I am now off looking into Chef to get this stuff going!
>

Please do not hesitate to ask if you have any more questions.

Cheers,

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

Re: Using Riak with EC-2

Aníbal Rojas
Sean,

   Thanks for your recommendations; your points about availability vs.
performance, and about the stage of the project, are very interesting.

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas



In reply to Sean Cribbs's message of Tue, Mar 30, 2010, 8:24 AM.