Getting all keys in a bucket

Lucas Di Pentima
Hello again,

I inserted a few thousand (approx. 12,000) records into an embedded Riak installation on my OS X Snow Leopard machine, and when I tried to get all the keys using curl and Ruby, it took a very long time (several minutes!)

I suppose 12k keys shouldn't be a lot of data; can you tell me why it took so long? Is there a configuration setting I should be tweaking?

Thanks
--
Lucas Di Pentima - Santa Fe, Argentina
Jabber: [hidden email]
MSN: [hidden email]

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Getting all keys in a bucket

Sean Cribbs
Lucas,

Listing keys is kind of expensive right now.  If you use the streaming
feature, however, you should get keys back gradually, in chunks.

Example:

$ curl "http://localhost:8098/raw/bucket?keys=stream"
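A minimal Python sketch of consuming that streamed response, assuming (as Sean explains later in this thread) that the body arrives as a series of small JSON objects, each carrying a "keys" array, sent back to back. The exact chunk layout of this /raw endpoint and the helper names `iter_keys_from_chunks` and `stream_keys` are assumptions for illustration, not a Riak client API.

```python
import codecs
import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator


def iter_keys_from_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield keys from text made of back-to-back JSON objects.

    One object may be split across network reads, so text is buffered and
    only complete objects are decoded.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # incomplete object: wait for the next chunk
            buf = buf[end:]
            # Objects without a "keys" array are skipped.
            if isinstance(obj, dict):
                yield from obj.get("keys", [])


def stream_keys(base_url: str, bucket: str) -> Iterator[str]:
    """Hypothetical helper: GET <base_url>/<bucket>?keys=stream, yield keys."""
    url = f"{base_url}/{urllib.parse.quote(bucket, safe='')}?keys=stream"
    utf8 = codecs.getincrementaldecoder("utf-8")()  # handles split UTF-8 bytes
    with urllib.request.urlopen(url) as resp:
        texts = (utf8.decode(raw) for raw in iter(lambda: resp.read(4096), b""))
        yield from iter_keys_from_chunks(texts)
```

Usage would look like `for key in stream_keys("http://localhost:8098/raw", "bucket"): print(key)`, which prints keys as each chunk arrives instead of after the whole listing finishes.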

Sean

On 2/9/10 8:05 PM, Lucas Di Pentima wrote:

> Hello again,
>
> I inserted a few thousand (approx. 12,000) records into an embedded Riak installation on my OS X Snow Leopard machine, and when I tried to get all the keys using curl and Ruby, it took a very long time (several minutes!)
>
> I suppose 12k keys shouldn't be a lot of data; can you tell me why it took so long? Is there a configuration setting I should be tweaking?
>
> Thanks
> --
> Lucas Di Pentima - Santa Fe, Argentina
> Jabber: [hidden email]
> MSN: [hidden email]
>
>
>
>
>
> _______________________________________________
> riak-users mailing list
> [hidden email]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>    


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Getting all keys in a bucket

Stephen C. Gilardi
On Feb 9, 2010, at 8:05 PM, Lucas Di Pentima wrote:

> I inserted a few thousand (approx. 12,000) records into an embedded Riak installation on my OS X Snow Leopard machine, and when I tried to get all the keys using curl and Ruby, it took a very long time (several minutes!)
>
> I suppose 12k keys shouldn't be a lot of data; can you tell me why it took so long? Is there a configuration setting I should be tweaking?

In some testing we did, we put a bunch of records into Riak with random (UUID) keys and needed to list the keys to retrieve them.

Listing keys appeared to be an O(N^2) operation. In one 3-node cluster of m1.small machines on EC2, a rough formula for how long it took to list the keys in a single bucket was:

    time in minutes = (keys in the bucket / 10,000) ^ 2

This held for 10,000, 20,000, and 30,000 keys (1, 4, and 9 minutes, respectively).
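Written as a formula, Steve's rough fit is quadratic in the number of keys N, so doubling the keys roughly quadruples the listing time. Plugging in Lucas's approximately 12,000 keys gives the last figure below; this is only an extrapolation, since Steve measured a 3-node EC2 cluster while Lucas runs Riak locally on OS X.

```latex
t(N) \approx \left(\frac{N}{10\,000}\right)^{2} \text{ min},
\qquad
\frac{t(2N)}{t(N)} = 4,
\qquad
t(12\,000) \approx (1.2)^{2} = 1.44 \text{ min}
```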

Asking one node for the list of keys appeared to result in significant CPU time being used only on that node.

To get better key listing performance, I found we could split up the key-value pairs into multiple buckets and then request the keys for the buckets from several threads in parallel. (We were still only asking one node for the lists, but many such requests were pending in parallel.) This appeared to engage many nodes in the task, and aggregate performance became quite good.
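Steve doesn't say how the key-value pairs were divided among buckets, so the following is a minimal Python sketch under assumptions: a hash of the key picks one of n shard buckets (the `items-0`, `items-1`, and so on naming and the `bucket_for` rule are hypothetical), and `list_keys(bucket)` stands for whatever single-bucket listing call your client offers, passed in as a function so the sketch doesn't depend on a particular client.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def bucket_for(key: str, prefix: str, n_buckets: int) -> str:
    """Hypothetical sharding rule: a stable hash of the key picks a bucket."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return f"{prefix}-{int.from_bytes(digest[:4], 'big') % n_buckets}"


def list_all_keys(
    list_keys: Callable[[str], List[str]],
    prefix: str,
    n_buckets: int,
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """List the keys of every shard bucket with several requests in flight."""
    buckets = [f"{prefix}-{i}" for i in range(n_buckets)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each thread issues one bucket listing; results keep bucket order.
        results = pool.map(list_keys, buckets)
        return dict(zip(buckets, results))
```

MD5 is used here only for a stable spread of keys, not for security: Python's built-in `hash()` of a string is randomized per process, so it would not send a key to the same bucket across runs. Threads are a reasonable fit because each listing call mostly waits on the network.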

I'd love to hear more and better ideas for fast key listing.

--Steve

Re: Getting all keys in a bucket

Daniel Widgren
Stephen C. Gilardi wrote:

> I'd love to hear more and better ideas for fast key listing.

One idea we had when we were building an e-commerce framework with
Nitrogen and Riak was to have one object in the bucket that just stored a
list of keys.

If we had a product bucket with, say, 5 products with the keys 1 to 5,
the list object would contain the key list [1,2,3,4,5]. This helped us
get all the keys in a bucket quickly. Every time a product was added or
removed, the list was updated.
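A minimal sketch of that index object, using a Python dict as a stand-in for the bucket (real code would issue GET and PUT requests through a Riak client instead). The index key name `"list"` and the helper names are hypothetical, and product keys are strings here because Riak keys are strings. Updating the index is a read-modify-write, so two clients changing it at the same moment could lose an update; this sketch does no conflict handling.

```python
import json
from typing import Dict, List

# Stand-in for a Riak bucket: key -> stored JSON string.
Bucket = Dict[str, str]

INDEX_KEY = "list"  # hypothetical key of the index object; no product may use it


def read_index(bucket: Bucket) -> List[str]:
    """One fetch returns every product key, instead of a full key listing."""
    return json.loads(bucket.get(INDEX_KEY, "[]"))


def put_product(bucket: Bucket, key: str, data: dict) -> None:
    bucket[key] = json.dumps(data)
    keys = read_index(bucket)
    if key not in keys:  # only new products change the index
        keys.append(key)
        bucket[INDEX_KEY] = json.dumps(keys)


def delete_product(bucket: Bucket, key: str) -> None:
    bucket.pop(key, None)
    keys = read_index(bucket)
    if key in keys:
        keys.remove(key)
        bucket[INDEX_KEY] = json.dumps(keys)
```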

We did try it with 5000 products, and it got really slow when fetching
every product individually, since that is about 5000 calls to Riak.

In the end we had two discussions. One was to do the same as for the
key list, but with all the product data in one object, for when you
want to get all products at once:

[{product1, all data for product1}, {product2, all data for product2},
{product3, all data for product3}, {product4, all data for product4},
{product5, all data for product5}]

This might help if you want all the data from a bucket, as in a
webshop when you list all the products in a category. But I'm not sure
it is a good idea.
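Daniel's second idea, a single object holding every product, can be sketched the same way, again with a dict standing in for the bucket. Here the aggregate is a JSON object keyed by product key rather than the list of pairs shown above, and it is stored per category; both choices and all names are assumptions. The cost of the design shows in `update_product`, which has to read and rewrite the whole object to change one product.

```python
import json
from typing import Dict

Bucket = Dict[str, str]  # stand-in for a Riak bucket: key -> JSON string


def save_category(bucket: Bucket, category: str, products: Dict[str, dict]) -> None:
    """Store every product of a category in ONE object (one write)."""
    bucket[category] = json.dumps(products)


def load_category(bucket: Bucket, category: str) -> Dict[str, dict]:
    """Fetch all products of a category with a single read."""
    return json.loads(bucket.get(category, "{}"))


def update_product(bucket: Bucket, category: str, key: str, data: dict) -> None:
    # Changing one product means rewriting the entire aggregate object.
    products = load_category(bucket, category)
    products[key] = data
    save_category(bucket, category, products)
```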

/Daniel

Re: Getting all keys in a bucket

Lucas Di Pentima
Hi Sean,

I tried the streamed version and it worked very nicely, a lot faster than the non-streamed version. Can you tell me why?

Regards

On 09/02/2010, at 22:14, Sean Cribbs wrote:

> Lucas,
>
> Listing keys is kind of expensive right now.  If you use the streaming feature, however, you should get keys back gradually, in chunks.
>
> Example:
>
> $ curl "http://localhost:8098/raw/bucket?keys=stream"
>

Re: Getting all keys in a bucket

Alexandre CONRAD-3
2010/2/10 Lucas Di Pentima <[hidden email]>:
> I tried the streamed version and it worked very nicely, a lot faster than the non-streamed version. Can you tell me why?

As far as I understand, the streamed version lets you start
receiving results in chunks as soon as Riak fetches the first ones.

I've never actually tried it, so please correct me if I'm wrong.

Regards,

Alex
twitter.com/alexconrad

Re: Getting all keys in a bucket

Sean Cribbs
Lucas,

It doesn't block while trying to list the keys, but instead sends them
to the client as they come back from the vnodes.  It's also not trying
to build a large JSON object in memory, but builds a bunch of small ones
which come across one-per-chunk.
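A toy Python model of the two behaviours Sean describes, where each inner list stands for one vnode's reply. The function names are made up, and Riak itself is written in Erlang; the sketch only mirrors his two points. The buffered version cannot send anything until the last reply is in and then serializes one large object, while the streamed version emits a small `{"keys": [...]}` object per reply as soon as it arrives. Joined back to back, the chunks carry the same keys.

```python
import json
from typing import Iterable, Iterator, List


def buffered_body(vnode_replies: Iterable[List[str]]) -> str:
    """Non-streaming: wait for every reply, then build one large JSON object."""
    all_keys: List[str] = []
    for batch in vnode_replies:
        all_keys.extend(batch)  # everything is held in memory first
    return json.dumps({"keys": all_keys})


def streamed_chunks(vnode_replies: Iterable[List[str]]) -> Iterator[str]:
    """Streaming: emit one small JSON object per reply as it comes back."""
    for batch in vnode_replies:
        yield json.dumps({"keys": batch})
```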

Sean

On 2/10/10 9:42 AM, Lucas Di Pentima wrote:

> Hi Sean,
>
> I tried the streamed version and it worked very nicely, a lot faster than the non-streamed version. Can you tell me why?