support for 'multi_get'?

Tux Racer
Hello Riak Users,

This is another newbie question:
I am reading the doc at
http://riak.basho.com/edoc/riak_client.html
and was wondering if there is support for a multiget function, i.e.,
given a list of (bucket_i, key_i) pairs (or (bucket, (key_1, key_2, key_3, ...))),
do a concurrent get on each (bucket, key).
As Erlang is presented as a 'concurrency oriented' programming language,
I'd guess it should be rather easy to do.
Or is such a use case not recommended as being too slow? What would be
the limiting speed factor in such a case: disk speed (random access) or
network speed?

I am also wondering how link walking (or mapred) works internally.
Does Riak do concurrent multigets based on link keys, wait for
all the answers, and then present the whole result?

Thanks in advance
TuX


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: support for 'multi_get'?

Grant Schofield

On Mar 30, 2010, at 4:50 AM, TuX RaceR wrote:

> Hello Riak Users,
>
> This is another newbie question:
> I am reading the doc at
> http://riak.basho.com/edoc/riak_client.html
> and was wondering if there is support for a multiget function, i.e., given a list of (bucket_i, key_i) pairs (or (bucket, (key_1, key_2, key_3, ...))), do a concurrent get on each (bucket, key).
> As Erlang is presented as a 'concurrency oriented' programming language, I'd guess it should be rather easy to do.
> Or is such a use case not recommended as being too slow? What would be the limiting speed factor in such a case: disk speed (random access) or network speed?
>

There currently isn't a multiget function, but there aren't any major limitations preventing us from adding it. The one concern might be error reporting (e.g., you request 100 keys and 1 fails), but that shouldn't be too hard to work around. I have opened a bug (http://issues.basho.com/show_bug.cgi?id=96) to track this feature request.

One way to work around the lack of multiget would be to pass the keys you want as input to a map/reduce query and return the result, but you wouldn't have the ability to change the R value like you do with a standard get request.
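
As a rough sketch of that workaround, the request body for such a map/reduce query could be built as below. This assumes the JSON format accepted by Riak's HTTP map/reduce endpoint (commonly `/mapred`) and the built-in JavaScript function `Riak.mapValues`; neither is confirmed in this thread, and `build_mapred_request` is a hypothetical helper name.

```python
import json


def build_mapred_request(bucket_keys, map_function="Riak.mapValues"):
    """Build a JSON body that feeds explicit (bucket, key) pairs to one map phase.

    The field names ("inputs", "query", "map", "language", "name", "keep")
    are assumed from Riak's HTTP map/reduce interface.
    """
    inputs = [[bucket, key] for bucket, key in bucket_keys]
    query = [{"map": {"language": "javascript", "name": map_function, "keep": True}}]
    return json.dumps({"inputs": inputs, "query": query})


# Three keys across two buckets, all fetched by a single map/reduce request.
body = build_mapred_request([("users", "alice"), ("users", "bob"), ("orders", "42")])
print(body)
```

The body would then be POSTed to the cluster with a JSON content type. As noted above, this path gives no control over the R value.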

> I am also wondering how link walking (or mapred) works internally. Does Riak do concurrent multigets based on link keys, wait for all the answers, and then present the whole result?

Multiget isn't done for map/reduce and links. A get is performed for each of the objects sent to the map phase, but those gets and the map code are done on the local node on which the key exists.

Thanks for the feature request.

Grant Schofield
Developer Advocate
Basho Technologies, Inc.

Re: support for 'multi_get'?

John Lynch
TuX,

You can also issue multiple concurrent HTTP requests yourself. I have an example of doing this with Ruby/Typhoeus here: http://bit.ly/9h1eF9
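
For readers who cannot follow the link, here is a minimal client-side sketch of the same idea in Python (not the Ruby/Typhoeus code referenced above). The host, port 8098, and the `/riak/<bucket>/<key>` URL layout are assumptions, and `multi_get`, `http_get`, and `fake_fetch` are hypothetical names. Failures are collected per key, which is one way to handle the "100 keys and 1 fails" case raised earlier in the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8098/riak"  # assumed host, port, and URL prefix


def http_get(bucket, key):
    """Fetch one object over HTTP and return its raw body."""
    url = "%s/%s/%s" % (BASE_URL, quote(bucket, safe=""), quote(key, safe=""))
    with urlopen(url, timeout=5) as response:
        return response.read()


def multi_get(bucket_keys, fetch=http_get, max_workers=10):
    """Run fetch(bucket, key) concurrently and return (found, failed) dicts."""
    pairs = [tuple(pair) for pair in bucket_keys]
    found, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pair: pool.submit(fetch, *pair) for pair in pairs}
        for pair, future in futures.items():
            try:
                found[pair] = future.result()
            except Exception as error:  # one bad key should not sink the batch
                failed[pair] = error
    return found, failed


def fake_fetch(bucket, key):
    """Stand-in for http_get so the sketch runs without a cluster."""
    if key == "missing":
        raise KeyError(key)
    return ("%s/%s" % (bucket, key)).encode()


found, failed = multi_get([("users", "alice"), ("users", "missing")], fetch=fake_fetch)
print(found, failed)
```

Passing `fetch=http_get` (the default) would issue real requests; `max_workers` caps how many are in flight at once.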


Regards,

John Lynch, CTO
Rigel Group, LLC
[hidden email]



Re: support for 'multi_get'?

Tux Racer
In reply to this post by Grant Schofield
Thanks, Grant, for logging the enhancement request.
FYI, some other key-value stores (e.g.
https://issues.apache.org/jira/browse/CASSANDRA-70
http://issues.apache.org/jira/browse/HBASE-1845
)
have implemented or are trying to implement multigets,
so it would be nice to have anyway.
TuX

Re: support for 'multi_get'?

Tux Racer
In reply to this post by John Lynch
Thanks, John, for the link to your Ruby code.
For a faster answer, a native (Erlang) solution would probably beat an
HTTP solution.
If you request 100 keys, you'll need to open 100 HTTP sockets and
then 100 Erlang sockets (or maybe connections are reused).
I am also wondering how this scales.
The optimistic me would say that doing N gets in parallel would take
just as long as doing 1 get. ;)
This may be true for low N (say 10 or 20) but is probably false for larger
values of N (e.g. 100) ;)
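
One rough way to put numbers on that intuition: if only a fixed number of gets can be in flight at once, N gets finish in about ceil(N / limit) rounds. The sketch below is a toy model only. The 5 ms per-get latency and the limit of 20 parallel gets are made-up figures, and the model ignores disk and network contention entirely.

```python
import math


def estimated_batch_ms(n_gets, per_get_ms, max_parallel):
    """Estimate wall-clock time when at most max_parallel gets run at once."""
    rounds = math.ceil(n_gets / max_parallel)
    return rounds * per_get_ms


# Hypothetical figures: 5 ms per get, at most 20 gets in flight.
for n in (1, 10, 20, 100):
    print(n, estimated_batch_ms(n, per_get_ms=5, max_parallel=20))
```

Under these assumptions, anything up to 20 gets costs the same as one get, while 100 gets take five rounds.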

Thanks
TuX



Re: support for 'multi_get'?

Gareth Stokes
Hmm, I'm finding a need for this feature as well.
I think the MapReduce solution is probably the safest bet, seeing as I can't really limit my users to N gets with N under 20 or so.

Hypothetically, if I perform a MapReduce over, say, 1k different keys, are you saying that would open 1k local sockets on my Riak cluster?


Re: support for 'multi_get'?

Sean Cribbs-2
Gareth,

Sorry for the delayed reply. No, it will not open 1k local sockets in the cluster; all requests within the cluster are done via Erlang messaging, which is abstracted away from the application and managed by the Erlang VM.

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
