Load question

Mike Oxford
On a given Riak ring we already know that the majority of the buckets will *not* "live locally" when a request comes in to a given ring member.

Now, obviously, the more machines you have, the greater the chance of poor locality for any particular dataset.

A request comes in to node 1 of a 500-node ring; most of the other nodes will end up doing the work due to data locality ... so how much
overhead does the "request node" incur above and beyond the other nodes?  (Numbers are completely arbitrary; I know 500 is a bit high.)

Has anyone modelled this, or run load tests against it?

Moreover, has anyone looked at the "load" of a system with a single access point vs. a system with fully meshed access points?  As your ring size grows, at what point do you stop load-balancing against every node, and is there a known "magic number" for how many access points work best?

I don't see anything on Google, but that doesn't mean it doesn't exist somewhere ... anyone know?

Thanks!

-mox

Re: Load question

Runar Jordahl
The questions you raise are important. I would add that for some
scenarios, processing your data locally (not using Riak, but your own
client program) could improve performance. In such a setup, each box
would run both Riak and your own software.

The Dynamo paper discusses data locality and points to two strategies:
“(…) (1) route its request through a generic load balancer that will
select a node based on load information, or (2) use a partition-aware
client library that routes requests directly to the appropriate
coordinator nodes. The advantage of the first approach is that the
client does not have to link any code specific to Dynamo in its
application, whereas the second strategy can achieve lower latency
because it skips a potential forwarding step.”
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
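
As a rough illustration of strategy (2): a partition-aware client could do something like the following, assuming it has somehow fetched the ring's partition-to-node assignment out of band. This is only a sketch in Python; the hashing details and the ring format here are my assumptions, not an existing Riak client API.

    import hashlib

    RING_TOP = 2 ** 160  # size of the SHA-1 hash space used by Dynamo-style rings

    def partition_index(bucket, key, ring_size):
        # Hash bucket+key onto the ring; the real Riak hash differs in detail
        # (it hashes an Erlang term), so treat this as illustrative only.
        digest = hashlib.sha1(bucket + key).digest()
        point = int.from_bytes(digest, "big")
        return point * ring_size // RING_TOP

    def coordinator_for(bucket, key, ring):
        # ring: list of node names, one entry per partition (index -> owning
        # node), assumed to have been fetched from the cluster out of band.
        return ring[partition_index(bucket, key, len(ring))]

    # Example: a 64-partition ring spread across three nodes.
    ring = [["node1", "node2", "node3"][i % 3] for i in range(64)]
    print(coordinator_for(b"users", b"mox", ring))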

So far, I have not seen any Riak client library using strategy (2).
What I have seen is a lot of discussion about using (generic) load
balancing (1). I am in the process of writing a client library myself,
but the library only supports specifying an IP address / host name to
contact.

It would be helpful if a wiki page (under Best Practices) was created
to discuss various load-balancing configurations. I am also wondering if
a Riak client could use strategy (2), like Dynamo clients can.

Kind regards
Runar Jordahl
http://blog.epigent.com/

Re: Load question

Justin Sheehy
Hi, Runar.

On Tue, Apr 12, 2011 at 3:22 AM, Runar Jordahl <[hidden email]> wrote:

> It would be helpful if a wiki page (under Best Practices) was created
> to discuss various load-balancing configurations. I am also wondering if
> a Riak client could use strategy (2), like Dynamo clients can.

There is not currently any client that uses strategy #2, partition-aware routing.

To make it practical, we would need to extend the client-facing
protocol so that an incoming client could ask to be redirected to an
"ideal" incoming node.  This is quite doable, though would have the
downside of making such clients more complex and thus possibly more
fragile.
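
Purely as a sketch of what such an extension might look like from the client side, in Python: the "coordinator" redirect endpoint and its response shape below are hypothetical, not something Riak exposes today.

    import json
    import urllib.request

    def ideal_node_for(any_node, bucket, key):
        # Hypothetical "where should I send this?" request; the endpoint name
        # and the JSON response shape are assumptions.
        url = "http://%s:8098/ring/coordinator/%s/%s" % (any_node, bucket, key)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["node"]      # e.g. "node7.example.com"

    def fetch(any_node, bucket, key):
        # One extra round trip (or a cached answer) to find the ideal
        # coordinator, then talk to it directly and skip the forwarding step.
        target = ideal_node_for(any_node, bucket, key)
        url = "http://%s:8098/riak/%s/%s" % (target, bucket, key)
        with urllib.request.urlopen(url) as resp:
            return resp.read()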

-Justin

Re: Load question

Mike Oxford
Even exposing CPU load through the client interface would be a big win, as that information could be polled and
cached application-side.

I would spend one extra call per second to be able to say "you know, nodeX is over 70% CPU, let's kick to
nodeY instead."  All requests in a one-second window would use that "snapshot" of the ring.

It doesn't truly reflect the locality of the Dynamo-based data, but it would help mitigate "hot buckets" by not
imposing extra load, giving the hot node some CPU time to work the bucket(s) instead of dealing with routing and
connection handling.
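
Roughly something like this on the application side (the load-fetching call is just a placeholder; which metric a node would actually expose, and how, is exactly the open question):

    import random
    import time

    NODES = ["node1", "node2", "node3"]
    CPU_THRESHOLD = 70.0      # percent, per the example above
    SNAPSHOT_TTL = 1.0        # refresh the load snapshot at most once per second

    _snapshot = {"taken_at": 0.0, "load": {}}

    def get_cpu_load(node):
        # Placeholder: fetch a CPU-load figure for `node`; how a node would
        # expose this is the open question above.
        raise NotImplementedError

    def choose_node():
        now = time.monotonic()
        if now - _snapshot["taken_at"] > SNAPSHOT_TTL:
            _snapshot["load"] = {n: get_cpu_load(n) for n in NODES}
            _snapshot["taken_at"] = now
        cool = [n for n, load in _snapshot["load"].items() if load < CPU_THRESHOLD]
        # If every node is hot, fall back to spreading requests across all of them.
        return random.choice(cool or NODES)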

Just a thought. :)

-mox

