Erlang API vs Erlang PBC

Ryan Maclear
Hi,

When creating an Erlang client, is it better to use the riak_client module (i.e. the native Erlang API) or the riakc_pb_socket module (PBC)? I've seen a number of code snippets and samples that use the Erlang API rather than the PBC (e.g. http://wiki.basho.com/MapReduce.html), as does some of the Erlang code in riak_function_contrib.

Obviously, to use the Erlang API I would need all the necessary beam files in my Erlang code path, whereas the PBC requires only a limited set of beam files. The wiki says that the primary client for code not running inside Riak should be the Erlang PBC. However, is there anything technically wrong with using the Erlang API?

Thanks,
Ryan
 
_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Erlang API vs Erlang PBC

Ryan Zezeski
I think one thing to note is that with riak_client all your data goes over the Erlang distribution connection, which I suppose could interfere with other inter-node messages and act as a bottleneck. With the PBC you can open multiple sockets as needed and your node won't be connected to the Riak cluster. That probably matters more as the cluster grows, because each node added to the cluster requires N-1 sockets, where N is the number of nodes.
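
To put numbers on the N-1 remark, here is a small sketch of the arithmetic. It assumes the default Erlang distribution behaviour, where connected nodes form a full mesh. The module and function names are made up for illustration and are not part of Riak.

```erlang
-module(mesh_links).
-export([added_links/1, total_links/1]).

%% Links opened when the cluster grows to N nodes: the newcomer
%% connects to each of the other N-1 nodes.
added_links(N) when is_integer(N), N >= 1 ->
    N - 1.

%% Total links in a fully connected mesh of N nodes: N*(N-1)/2.
total_links(N) when is_integer(N), N >= 1 ->
    N * (N - 1) div 2.
```

A client node that talks to Riak through riak_client counts as one of the N nodes here, whereas a PBC client only holds the sockets it opens itself.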

-Ryan

In reply to Ryan Maclear's message of Thu, Jan 20, 2011 at 6:32 AM.

Re: Erlang API vs Erlang PBC

Ryan Maclear
Thanks, that makes sense. I guess most of the sample code is based on the dev setup outlined in the Fast Track on the wiki, which only sets up four local nodes. 

Cheers,
Ryan

In reply to Ryan Zezeski's message of 20 Jan 2011, at 2:46 PM.

Re: Erlang API vs Erlang PBC

Bob Ippolito
Additionally, the native Erlang client is more sensitive to internal
changes in Riak than any well-defined protocol would be.

In reply to Ryan Maclear's message of Thu, Jan 20, 2011 at 9:35 PM.

Re: Erlang API vs Erlang PBC

Mojito Sorbet
To me the major concern is that if you use the native (non-PB)
interface, your application cluster and the Riak cluster become merged
into one big Erlang cluster.   The number of TCP connections can start
getting out of hand, and the work put on the cluster manager starts to
become significant.



Re: Erlang API vs Erlang PBC

Ryan Maclear
Agreed. So it makes sense to use the PBC from the outset: it allows the client app to be moved later off any cluster node(s) it might be residing on, and it is not affected by subtle changes to the internals of the riak_kv code base (specifically the non-PB modules).

In reply to Mojito Sorbet's message of 20 Jan 2011, at 9:37 PM.

Re: Erlang API vs Erlang PBC

Bob Ippolito
Another issue we've run into is that the native Erlang client allows
you to store non-binary values, which cannot be accessed from the
PBC. So if you're not careful or don't know better, you'll be in
for some migration if you later try to use other clients.
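
One way to avoid that trap, sketched here as an assumption rather than anything prescribed in this thread, is to serialize terms to binaries yourself before storing them, so that every stored value is a binary whichever client wrote it. The module name value_codec is hypothetical.

```erlang
-module(value_codec).
-export([encode/1, decode/1]).

%% Turn any Erlang term into a binary before handing it to a client.
encode(Term) ->
    term_to_binary(Term).

%% Recover the term from a binary read back from the store.
%% The 'safe' option refuses to create new atoms from untrusted data.
decode(Bin) when is_binary(Bin) ->
    binary_to_term(Bin, [safe]).
```

Non-Erlang clients would still have to understand the external term format, so a portable encoding such as JSON is probably the better choice if other languages need to read the data.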

The only real problem is that the PBC needs some additional software
around it to pool connections, where the Erlang native client got that
for free because it was leveraging Erlang distribution.

In reply to Ryan Maclear's message of Fri, Jan 21, 2011 at 3:57 AM.

Re: Erlang API vs Erlang PBC

David Dawson
I am not sure if this is any help but I have uploaded a protocol buffer pool client for riak which requires you to pass a client_id for each operation.

https://github.com/DangerDawson/riakc_pb_pool

It is very basic, but handles most of the useful things:

        - put / get / delete
        - Riak disconnects
        - clients that die after leasing a Riak connection (the connection is returned to the pool)
        - dynamically increasing the size of the connection pool
        - queueing requests if no connections are available
        - useful statistics

Of course the documentation is very sparse and needs improving, which I will get round to.

Dave


In reply to Bob Ippolito's message of 21 Jan 2011, at 00:43.

Re: Erlang API vs Erlang PBC

Ryan Zezeski
I was hesitant to mention it at first, but since you broke the ice...I also wrote my own pool.  However, I took a different approach.  My pool only cares about doling out connections and making sure they are alive.  One or more processes could use the same conn at a given time (side question: is that a problem?).  My pool relies on the fact that each conn has N-1 other conns after it before it gets reused, where N = the size of the pool.
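
The round-robin idea described above can be sketched as follows. This is an illustration only, not the code from the pool being discussed; the module name rr_pool and the use of the queue module are assumptions.

```erlang
-module(rr_pool).
-export([new/1, next/1]).

%% Build a pool from a non-empty list of connection handles.
new([_ | _] = Conns) ->
    queue:from_list(Conns).

%% Take the connection at the head and rotate it to the tail, so it is
%% handed out again only after the other N-1 connections have been.
next(Pool) ->
    {{value, Conn}, Rest} = queue:out(Pool),
    {Conn, queue:in(Conn, Rest)}.
```

Nothing here stops two callers from holding the same connection at once if requests outlast a full rotation, which is exactly the side question raised above.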

That said, I hacked mine together in 15 minutes for something I needed at work.  It seemed to handle a reasonable load (100+ concurrent connections) just fine, so I'm using it, but I don't know if I'd suggest anyone else use it.  I'm mainly putting it up here as a contrast to your (David's) pool, and I'm curious to get feedback on whether I'm doing anything insanely stupid.

https://gist.github.com/789616

-Ryan

In reply to David Dawson's message of Fri, Jan 21, 2011 at 7:12 AM.

Re: Erlang API vs Erlang PBC

Anthony Molinaro
Just thought I'd chime in, as I also recently wrote a client pool, which
works in a very different way from these two.

The problem I see with both solutions presented is that in order
to get a client you have to route through a gen_server call, so you are
bottlenecking yourself if you ever have very high traffic.

My solution uses a simple_one_for_one supervisor to manage the
riakc_pb_socket gen_servers and pg2 for the pooling.  pg2 is
actually a distributed process group manager, but it has a very nice
call, get_closest_pid/1, which returns a random pid from your
pool (preferring local pids, i.e. those on the local node),
but without the overhead of a gen_server call.

Instead it uses ETS to manage the pids which are part of a pool,
then just queries ETS (which can happen in the process itself).
If randomly selecting members of the pool isn't what you want, you
can check out this article

http://lethain.com/entry/2009/sep/12/load-balancing-across-erlang-process-groups/

for a strategy which uses message queue length to determine where to
route a message.  The fact is that each process already has a built-in
queue in its mailbox, so assuming you are provisioning the right
number of processes, there's no need to queue elsewhere. Using the
mailbox length to route to the least loaded connection should work and
be doable in each process without the bottleneck of a single
gen_server queue.
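
A minimal sketch of that least-loaded selection, separate from the pool code linked in this message: it assumes the candidate pids all live on the local node, and the module name least_loaded is made up for illustration.

```erlang
-module(least_loaded).
-export([pick/1]).

%% Return the live pid with the shortest mailbox, or 'none' when no
%% candidate is alive. process_info/2 accepts only local pids and
%% returns 'undefined' for dead ones, which the generator pattern skips.
pick(Pids) ->
    Loads = [{Len, Pid} || Pid <- Pids,
                           {message_queue_len, Len} <-
                               [erlang:process_info(Pid, message_queue_len)]],
    case lists:sort(Loads) of
        [] -> none;
        [{_Len, Best} | _] -> Best
    end.
```

Because the caller runs pick/1 in its own process, no central gen_server sits in the request path. The trade-off is one process_info call per candidate on every request.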

I put my pool code here https://gist.github.com/790556 although I edited
it down and did some renaming, so it might not compile if you want to
play with it.  But basic usage is

1> pool_sup:start_link([]).
2> Pid = pool_manager:get_connection().
3> riakc_pb_socket:get(Pid,....).

Let me know if there are any comments or questions about it.

-Anthony

In reply to Ryan Zezeski's message of Fri, Jan 21, 2011 at 07:40:15AM -0500.

Re: Erlang API vs Erlang PBC

Seth Falcon
In reply to this post by Ryan Zezeski
On Fri, Jan 21, 2011 at 4:40 AM, Ryan Zezeski <[hidden email]> wrote:
> I was hesitant to mention it at first, but since you broke the ice...I also
> wrote my own pool.  However, I took a different approach.  My pool only
> cares about doling out connections and making sure they are alive.  One or
> more processes could use the same conn at a given time (side question: is
> that a problem?).

Yes, I believe that is a problem.  Each pb client will have a unique
client ID used in the vector clock.  You want to prevent the same client
ID from making concurrent modifications.
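
One way to guarantee that, sketched under the assumption that each connection is leased to a single caller at a time, is to track free and busy connections explicitly. The module name lease_pool is hypothetical and not taken from any of the pools mentioned in this thread.

```erlang
-module(lease_pool).
-export([new/1, checkout/1, checkin/2]).

%% Pool state: {Free, Busy}, both lists of connection handles.
new(Conns) when is_list(Conns) ->
    {Conns, []}.

%% Lease a free connection exclusively; 'empty' if all are in use.
checkout({[], _Busy}) ->
    empty;
checkout({[Conn | Free], Busy}) ->
    {ok, Conn, {Free, [Conn | Busy]}}.

%% Return a leased connection so that another caller may use it.
checkin(Conn, {Free, Busy}) ->
    true = lists:member(Conn, Busy),
    {Free ++ [Conn], lists:delete(Conn, Busy)}.
```

In a real pool this state would have to live in some owning process or table, and the caller would need to decide what to do on 'empty' (wait, grow the pool, or fail).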

+ seth

--
Seth Falcon | @sfalcon | http://userprimary.net/
