Listing keys in buckets


Listing keys in buckets

Alexander Sicular
Hi Team Basho,

If an m/r job can take a bucket as an argument during the map phase and then analyze all the keys in that bucket according to user defined parameters, how does the map phase know how to get all the keys without doing a costly get keys function on the bucket? Or is it doing just that?

So I've been playing around stress testing, trying to grok wtw is going on, and it seems that building my own index of keys and saving that in another bucket is orders of magnitude faster than just asking Riak for the keys. I might even skip the bucket entirely and use Redis sets (push/pop/exists atomic operations). In that vein, I've heard that triggers may be coming... and if so, perhaps Riak could automagically start keeping a list of its keys in a special key(s) place somewhere.

Grats on all the great work so far and the latest release.

-Alexander

(most of this is cut right out of irc)
_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Listing keys in buckets

Sean Cribbs-2
Alexander,

Yes, in general it is going to be faster to maintain your own index, with links or some other reference to the original objects, than to list all keys and filter. You wouldn't do the latter in a relational database, nor should you in Riak if you can avoid it. Think of it this way: an indexed lookup is going to be O(k), where k is 1 + the number of objects linked from the index object, whereas a full-bucket scan is going to be O(N). The fact that maps are done in parallel doesn't reduce the complexity; it just makes the elapsed wall-clock time shorter.
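
To make the O(k) versus O(N) point concrete, here is a minimal sketch that counts object fetches for both approaches. It assumes a toy in-memory store; `ToyStore`, the `users` and `indexes` buckets, and the `active_users` index key are all hypothetical names for illustration and are not part of the Riak client API.

```python
# Toy in-memory stand-in for a key/value store; NOT the Riak client API.
class ToyStore:
    def __init__(self):
        self.buckets = {}   # bucket name -> {key: value}
        self.reads = 0      # number of object fetches performed

    def put(self, bucket, key, value):
        self.buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        self.reads += 1
        return self.buckets[bucket][key]

    def keys(self, bucket):
        return list(self.buckets.get(bucket, {}))


def scan_and_filter(store, bucket, predicate):
    """Full-bucket scan: fetch every object and keep the matching keys."""
    return [k for k in store.keys(bucket) if predicate(store.get(bucket, k))]


def indexed_lookup(store, index_bucket, index_key, data_bucket):
    """Fetch one index object, then only the objects it references."""
    wanted = store.get(index_bucket, index_key)
    return [store.get(data_bucket, k) for k in wanted]


store = ToyStore()
for i in range(1000):
    store.put("users", f"user{i}", {"active": i % 100 == 0})

# Hand-built index object listing the keys of the active users.
store.put("indexes", "active_users",
          [f"user{i}" for i in range(0, 1000, 100)])

store.reads = 0
scanned = scan_and_filter(store, "users", lambda u: u["active"])
scan_reads = store.reads    # one fetch per object in the bucket

store.reads = 0
found = indexed_lookup(store, "indexes", "active_users", "users")
index_reads = store.reads   # one fetch for the index plus one per linked object
```

With 1000 objects of which 10 match, the scan performs 1000 fetches while the indexed lookup performs 11. Running the fetches in parallel would shorten the wall-clock time of the scan but would not change how many fetches it needs.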

I've also been told that triggers are coming, and this use case is one that is on the devs' minds. In the meantime, you could write a reduce phase in Erlang that creates the index, or maintain it manually on write in your application's language.
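
As a rough illustration of the "do it manually on write" option, the sketch below keeps a key index in a separate bucket and updates it on every put and delete. It assumes a plain Python dict standing in for the store; `IndexedBucket`, the `indexes` bucket, and the `<bucket>_keys` index key are hypothetical names, not anything Riak provides.

```python
class IndexedBucket:
    """Hypothetical wrapper that updates a key index on every write."""

    def __init__(self, store, bucket, index_bucket="indexes"):
        self.store = store                      # dict: (bucket, key) -> value
        self.bucket = bucket
        self.index_ref = (index_bucket, bucket + "_keys")

    def put(self, key, value):
        self.store[(self.bucket, key)] = value
        # Read-modify-write of the index object.
        index = set(self.store.get(self.index_ref, ()))
        index.add(key)
        self.store[self.index_ref] = sorted(index)

    def delete(self, key):
        self.store.pop((self.bucket, key), None)
        index = set(self.store.get(self.index_ref, ()))
        index.discard(key)
        self.store[self.index_ref] = sorted(index)

    def list_keys(self):
        # A single fetch of the index object; no bucket scan.
        return list(self.store.get(self.index_ref, []))


store = {}
users = IndexedBucket(store, "users")
users.put("bob", {"age": 31})
users.put("alice", {"age": 27})
users.delete("bob")
users.put("carol", {"age": 45})
keys = users.list_keys()
```

Note that the read-modify-write on the index object is not atomic in this sketch. Against a real distributed store, concurrent writers could overwrite each other's index updates, so an application would need some form of conflict handling or an external structure with atomic set operations.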

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Mar 11, 2010, at 2:57 AM, Alexander Sicular wrote:

> Hi Team Basho,
>
> If an m/r job can take a bucket as an argument during the map phase and then analyze all the keys in that bucket according to user defined parameters, how does the map phase know how to get all the keys without doing a costly get keys function on the bucket? Or is it doing just that?
>
> So I've been playing around stress testing, trying to grok wtw is going on, and it seems that building my own index of keys and saving that in another bucket is orders of magnitude faster than just asking Riak for the keys. I might even skip the bucket entirely and use Redis sets (push/pop/exists atomic operations). In that vein, I've heard that triggers may be coming... and if so, perhaps Riak could automagically start keeping a list of its keys in a special key(s) place somewhere.
>
> Grats on all the great work so far and the latest release.
>
> -Alexander
>
> (most of this is cut right out of irc)

