connection concurrency limitations?


David Weldon
Are there any concurrency limitations to Erlang client connections
that I should know about? Let's say my erlang app maintains one client
connection and many processes are all using it at the same time. Is
that acceptable?

Dave

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: connection concurrency limitations?

Ulf Wiger
David Weldon wrote:
> Are there any concurrency limitations to erlang client connections
> that I should know about? Let's say my erlang app maintains one client
> connection and many processes are all using it at the same time. Is
> that acceptable?

There is of course no substitute for real profiling, but with that
general disclaimer, here are a few thoughts on the subject:

(With "client connection", I assume you mean a TCP socket, but
basically, the discussion holds for any kind of Erlang port.)

Incoming messages on a socket will always go to the port owner.
Thus, all messages are serialized in the port owner's mailbox.
For the port owner not to become a bottleneck, it should focus on
simply dispatching the messages and nothing else. It may be
beneficial to set this process to high priority. This holds for
both non-SMP Erlang and SMP Erlang, although raising a process's
priority to 'high' is slightly less risky in SMP: a badly written
high-priority process can block the scheduler, which can
completely disable a non-SMP system, whereas SMP Erlang has
several schedulers, so the consequences of such a bug are
less dire.
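A rough model of the dispatching pattern described above, written in Python purely as a neutral sketch language (it is not Erlang code). One owner thread stands in for the port owner: it receives every incoming message, but only looks up which worker is waiting for it and hands the payload over, so all real work happens in the workers. The request-ID tagging and the names `port_owner` and `run` are assumptions for illustration, and process priority has no counterpart here.

```python
import queue
import threading

STOP = object()  # sentinel standing in for "the socket was closed"


def port_owner(inbox, waiters, lock):
    """Receives every incoming message but only routes it to the waiting worker."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            return
        req_id, payload = msg
        with lock:
            reply_q = waiters.pop(req_id)
        reply_q.put(payload)  # hand off at once: no decoding or business logic here


def run(n_workers):
    inbox = queue.Queue()    # models the port owner's mailbox
    waiters = {}             # req_id -> that worker's private reply queue
    lock = threading.Lock()
    owner = threading.Thread(target=port_owner, args=(inbox, waiters, lock))
    owner.start()
    results = [None] * n_workers

    def worker(i):
        reply_q = queue.Queue(maxsize=1)
        with lock:
            waiters[i] = reply_q      # register before the reply can arrive
        inbox.put((i, f"reply-{i}"))  # simulate the remote end answering request i
        results[i] = reply_q.get().upper()  # the real work happens in the worker

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    inbox.put(STOP)
    owner.join()
    return results


if __name__ == "__main__":
    print(run(3))  # ['REPLY-0', 'REPLY-1', 'REPLY-2']
```

The owner's loop does a constant, small amount of work per message, which is what keeps its mailbox from becoming the bottleneck.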

Sending of messages to the port can be done from any process.
This implies a mutex on the port, so there is risk of lock
contention. However, there will be some serialization by the
internal scheduling algorithm in Erlang, so the degree of
concurrency is rather the number of /scheduler threads/, not
the number of concurrent Erlang processes. The number of processes
trying to send to the socket affects how often each scheduler
thread goes for the mutex, so it will also have some impact on
performance.
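The claim that contention is bounded by scheduler threads rather than by processes can be modeled, very loosely, in Python: the worker threads of a `ThreadPoolExecutor` play the schedulers, the submitted tasks play the Erlang processes, and one lock plays the port mutex. This is an analogy under stated assumptions, not a description of BEAM internals; `SharedPort` and `simulate` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SharedPort:
    """One mutex-protected 'socket', written to by many tasks."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._gauge = threading.Lock()  # protects the counters below
        self.contenders = 0             # threads currently waiting for or holding the mutex
        self.max_contenders = 0
        self.sent = []

    def send(self, data):
        with self._gauge:
            self.contenders += 1
            self.max_contenders = max(self.max_contenders, self.contenders)
        with self._mutex:
            self.sent.append(data)
            time.sleep(0.001)           # pretend the write takes a moment
        with self._gauge:
            self.contenders -= 1


def simulate(n_processes, n_schedulers):
    port = SharedPort()
    # Pool threads model scheduler threads; submitted tasks model Erlang processes.
    with ThreadPoolExecutor(max_workers=n_schedulers) as pool:
        for i in range(n_processes):
            pool.submit(port.send, i)
    return port  # the with-block waits for every task before returning


if __name__ == "__main__":
    port = simulate(n_processes=200, n_schedulers=4)
    print(len(port.sent), port.max_contenders)
```

However many tasks are submitted, at most `n_schedulers` threads can be contending for the lock at once; more tasks only make each thread go for the lock more often.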

This is where benchmarking comes in. I am not exactly sure
how "wide" the mutex is on send. There is a socket option,
{delay_send, boolean()}, which makes message sending to the
socket more asynchronous (see the man page on inet:setopts/2).
That may well improve throughput in this kind of scenario.
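In Erlang the option is set with `inet:setopts(Socket, [{delay_send, true}])`. The inet documentation describes its effect as queueing outgoing data so that the messages sent to the network are larger but fewer. The Python class below only models that coalescing idea; the explicit `flush()` call is a simplification (the real driver decides on its own when to write), and `CoalescingWriter` is a hypothetical name.

```python
class CoalescingWriter:
    """Queues small writes and sends them as one larger chunk on flush."""

    def __init__(self, transport_send):
        self._send = transport_send  # callable that performs one real write
        self._pending = []

    def write(self, data: bytes) -> None:
        self._pending.append(data)   # queue instead of writing immediately

    def flush(self) -> None:
        if self._pending:
            self._send(b"".join(self._pending))  # one larger write instead of many
            self._pending.clear()


if __name__ == "__main__":
    wire = []
    w = CoalescingWriter(wire.append)
    for i in range(5):
        w.write(b"msg%d;" % i)
    w.flush()
    print(len(wire), wire[0])  # 1 b'msg0;msg1;msg2;msg3;msg4;'
```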

I hope this helps somewhat.

BR,
Ulf W
--
Ulf Wiger
CTO, Erlang Solutions Ltd, formerly Erlang Training & Consulting Ltd
http://www.erlang-solutions.com

Re: connection concurrency limitations?

Sean Cribbs-2
In reply to this post by David Weldon
Actually, the distinction is subtle.  It's ok to have connection pooling, but presumably only one thread/process should use a given client at any one time.  I suppose it depends on how many concurrent writes you expect to have per client node.  Yes, having more client ids contributes to vector clock growth, but the vclocks are also periodically pruned.
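The rule of thumb above (pooling is fine, but only one process uses a given client at a time) corresponds to an exclusive-checkout pool. A minimal Python sketch, assuming a hypothetical `Client` object that simply carries its own client ID; a real Riak client handle would take its place.

```python
import queue
import threading
import time
import uuid
from contextlib import contextmanager


class Client:
    """Hypothetical stand-in for a client handle bound to one client ID."""

    def __init__(self):
        self.client_id = uuid.uuid4().hex


class ClientPool:
    """Lends each client to exactly one caller at a time."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(Client())

    @contextmanager
    def checkout(self):
        client = self._idle.get()  # blocks until some client is free
        try:
            yield client           # exclusive use until the with-block exits
        finally:
            self._idle.put(client)


def max_sharing(n_workers=20, pool_size=3):
    """Run n_workers concurrently; return the most callers ever seen on one client."""
    pool = ClientPool(pool_size)
    in_use, peak = {}, 0
    guard = threading.Lock()

    def work():
        nonlocal peak
        with pool.checkout() as c:
            with guard:
                in_use[c.client_id] = in_use.get(c.client_id, 0) + 1
                peak = max(peak, in_use[c.client_id])
            time.sleep(0.001)      # pretend to do a get/put
            with guard:
                in_use[c.client_id] -= 1

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print(max_sharing())  # 1
```

Because `checkout()` keeps a client out of the idle queue until the `with` block exits, two callers never hold the same client, so each client ID has one writer at a time.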

It will largely depend on your application's data as well; in some cases you may not need fine-grained conflict resolution.  If you don't, then it won't matter if you have one client id per node or many.
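To make the vector clock growth point concrete: a vclock holds one entry per client ID that has written the object, so a single shared ID keeps it at one entry, while a fresh ID per writer adds an entry per writer until pruning trims it. The Python sketch below assumes a simple "keep the N most recently updated entries" pruning rule purely for illustration; it is not necessarily how Riak prunes, and `increment_at` and `prune` are hypothetical names.

```python
def increment_at(vclock, client_id, ts):
    """Return a copy of vclock with client_id's counter bumped, stamped with time ts."""
    counter = vclock.get(client_id, (0, ts))[0]
    new = dict(vclock)
    new[client_id] = (counter + 1, ts)
    return new


def prune(vclock, max_entries):
    """Keep only the max_entries most recently updated entries (illustrative rule)."""
    if len(vclock) <= max_entries:
        return vclock
    newest = sorted(vclock.items(), key=lambda kv: kv[1][1], reverse=True)
    return dict(newest[:max_entries])


if __name__ == "__main__":
    one_id, many_ids = {}, {}
    for ts in range(100):
        one_id = increment_at(one_id, "node-client", ts)     # every writer shares one ID
        many_ids = increment_at(many_ids, f"proc-{ts}", ts)  # each writer has its own ID
    print(len(one_id), len(many_ids), len(prune(many_ids, 10)))  # 1 100 10
```

If the application never inspects conflicts, the extra entries are only bookkeeping, which is why the number of client IDs matters little in that case.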

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Apr 3, 2010, at 1:47 PM, David Weldon wrote:

> Hey Sean, thanks for the reply. How does your reply relate to what
> Bryan says at the end of:
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-February/000497.html
>
> It seems like you are saying opposite things but I'm really unsure. So
> if I have a webmachine app that has 1,000 queries per second coming in
> and I need each one to do a get and a put to riak, you think that
> instead of keeping a single client connection I should open a new
> connection for each one? That's easier for me to do anyway, but I was
> worried about "vector clock growth" and the cost of creating a new
> connection, which sounds like it's small anyway.
>
> On another note, I assume that ERL_MAX_PORTS on the riak server should
> be related to the number of incoming client connections and the number
> of connected nodes. Is that right?
>
> Thanks!
>
> Dave
>
> On Sat, Apr 3, 2010 at 8:43 AM, Sean Cribbs <[hidden email]> wrote:
>> Dave,
>>
>> When you call riak:client_connect/1,2, several things happen.  First, it establishes a connection to the cluster via the built-in Erlang distribution facilities (unless your node is already connected) and verifies that Riak is running on the node.  Second, it returns a parameterized riak_client module to you with the connected node and Client ID.  From there, any calls to functions in the parameterized module send Erlang messages to processes on the connected node, tagged with the Client ID.
>>
>> The important thing here is the Client ID - you probably don't want your independent client processes using the same ID, because this will make resolving conflicts caused by race conditions or network splits more difficult.  Have your processes create new clients when they start up, as the expense is not much more than sending pure Erlang messages, and when a conflict occurs, you can tease out the problem more easily.
>>
>> Sean Cribbs <[hidden email]>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On Apr 3, 2010, at 1:15 AM, David Weldon wrote:
>>
>>> Are there any concurrency limitations to Erlang client connections
>>> that I should know about? Let's say my erlang app maintains one client
>>> connection and many processes are all using it at the same time. Is
>>> that acceptable?

Re: connection concurrency limitations?

David Weldon
Sounds good - thanks Sean. On a related issue, imagine I have a bucket
where I don't really care about conflict resolution so I've set
allow_mult to false. Whenever I have an update, let's say I choose to
always do a put with a new object (I don't use
riak_object:update_value/2 even when the object already exists). This
seems to work in tests but do I pay any kind of penalty for doing this
when the database is under load?

Dave

On Sat, Apr 3, 2010 at 2:01 PM, Sean Cribbs <[hidden email]> wrote:

> Actually, the distinction is subtle.  It's ok to have connection pooling, but presumably only one thread/process should use a given client at any one time.  I suppose it depends on how many concurrent writes you expect to have per client node.  Yes, having more client ids contributes to vector clock growth, but the vclocks are also periodically pruned.
>
> It will largely depend on your application's data as well; in some cases you may not need fine-grained conflict resolution.  If you don't, then it won't matter if you have one client id per node or many.
>
> Sean Cribbs <[hidden email]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Apr 3, 2010, at 1:47 PM, David Weldon wrote:
>
>> Hey Sean, thanks for the reply. How does your reply relate to what
>> Bryan says at the end of:
>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-February/000497.html
>>
>> It seems like you are saying opposite things but I'm really unsure. So
>> if I have a webmachine app that has 1,000 queries per second coming in
>> and I need each one to do a get and a put to riak, you think that
>> instead of keeping a single client connection I should open a new
>> connection for each one? That's easier for me to do anyway, but I was
>> worried about "vector clock growth" and the cost of creating a new
>> connection, which sounds like its small anyway.
>>
>> On another note, I assume that ERL_MAX_PORTS on the riak server should
>> be related to the number of incoming client connections and the number
>> of connected nodes. Is that right?
>>
>> Thanks!
>>
>> Dave
>>
>> On Sat, Apr 3, 2010 at 8:43 AM, Sean Cribbs <[hidden email]> wrote:
>>> Dave,
>>>
>>> When you call riak:client_connect/1,2, several things happen.  First, a connection to the cluster is established via the built-in Erlang distribution facilities, assuming your node is not already connected, and verifies that Riak is running on the node.  Second, it returns a parameterized riak_client module to you with the connected node and Client ID.  From there, any calls to functions in the parameterized module send Erlang messages to processes on the connected node, tagged with the Client ID.
>>>
>>> The important thing here is the Client ID - you probably don't want your independent client processes using the same ID, because this will make resolving conflicts caused by race conditions or network splits more difficult.  Have your processes create new clients when they start up, as the expense is not much more than sending pure Erlang messages, and when a conflict occurs, you can tease out the problem more easily.
>>>
>>> Sean Cribbs <[hidden email]>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Apr 3, 2010, at 1:15 AM, David Weldon wrote:
>>>
>>>> Are there any concurrency limitations to erlang client connections
>>>> that I should know about? Let's say my erlang app maintains one client
>>>> connection and many processes are all using it at the same time. Is
>>>> that acceptable?
>>>>
>>>> Dave
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [hidden email]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>
>

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: connection concurrency limitations?

Dave Smith
On Sat, Apr 3, 2010 at 9:59 PM, David Weldon <[hidden email]> wrote:
> Sounds good - thanks Sean. On a related issue, imagine I have a bucket
> where I don't really care about conflict resolution so I've set
> allow_mult to false. Whenever I have an update, let's say I choose to
> always do a put with a new object (I don't use
> riak_object:update_value/2 even when the object already exists). This
> seems to work in tests but do I pay any kind of penalty for doing this
> when the database is under load?

Yes, you do pay a penalty, because your updates will not include any vclock info. Even when allow_mult is false, the server must still track vclock information. As a result, each new write will construct a new vclock that the server must then reconcile against the old one (instead of just updating a vclock). Over time, you'll get a chain of these new vclocks built up in the server until the server is forced to GC them and get rid of the really old ones.

Bottom line, use update_value/2 if possible and especially if you're concerned about performance. It minimizes the bookkeeping the server needs to do to complete the update.
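The explanation above can be sketched with a toy vector clock in Python. When an update carries the clock it read (the `riak_object:update_value/2` path), the incoming clock descends from the stored one and simply replaces it; a brand-new object carries no clock, so the server has to merge two concurrent clocks. This is an illustrative model, not Riak's vclock module: value selection under allow_mult = false is left out, and all function names are hypothetical.

```python
def increment(vclock, actor):
    """Return a copy of vclock with actor's counter bumped by one."""
    new = dict(vclock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def merge(a, b):
    """Pointwise maximum of two clocks."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def server_put(stored, incoming):
    """Return (clock to store, whether the server had to reconcile two clocks)."""
    if descends(incoming, stored):
        return incoming, False            # the write simply supersedes the stored version
    return merge(stored, incoming), True  # concurrent clocks: extra reconciliation work


if __name__ == "__main__":
    stored = {"node-a": 3}
    # Read-modify-write: the update carries the clock that was fetched.
    print(server_put(stored, increment(stored, "client-1")))
    # Blind put of a brand-new object: no clock is sent, so it starts from empty.
    print(server_put(stored, increment({}, "client-1")))
```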

Hope that helps,

D.
