one big bucket with a lot of keys or many buckets with fewer keys?


one big bucket with a lot of keys or many buckets with fewer keys?

Tux Racer
Hello Riak Users,

This is a newbie question about schema design.
I have data that can be partitioned by user (e.g., users posting blog articles).

So I am thinking about two different schemas:

1) 1st schema
----------------------
a big bucket "users"
a big bucket "posts"

and users linking to posts

or

2) 2nd schema
---------------------
a big bucket "users"
and a lot of small buckets:
postsuser1
postsuser2
....
postsuserN
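
For concreteness, here is roughly how I picture the two layouts from the
official Python client (riak-python-client); the bucket names, key naming
and post fields are just made up for illustration:

import riak  # assuming the official riak-python-client

client = riak.RiakClient()

# Schema 1: one big "posts" bucket, keys namespaced per user
posts = client.bucket('posts')
posts.new('user1_post42', data={'title': 'Hello', 'body': '...'}).store()

# Schema 2: one small bucket per user, plain post keys
posts_user1 = client.bucket('postsuser1')
posts_user1.new('post42', data={'title': 'Hello', 'body': '...'}).store()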

The advantage of the 2nd schema may be that some MapReduce jobs dealing
with a single user's posts could run faster (the kind of job I have in mind
is sketched below). But is there a memory or disk implication to having
many buckets?
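
Sketched with the Python client's map/reduce interface (the JavaScript map
functions are only illustrations, not anything prescribed by Riak):

# Schema 1: the job walks the whole "posts" bucket and filters in the map phase.
query = client.add('posts')
query.map("""function(v) {
  var post = JSON.parse(v.values[0].data);
  return post.author == 'user1' ? [post.title] : [];
}""")
titles = query.run()

# Schema 2: the job only ever touches the keys in that one user's bucket.
query = client.add('postsuser1')
query.map("function(v) { return [JSON.parse(v.values[0].data).title]; }")
titles = query.run()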

Other KV stores do not work well with the 2nd schema: for instance, in
HBase you only begin to distribute the load once the size of a table
(~bucket) grows beyond roughly 250 MB.

Thanks in advance
TuX

Re: one big bucket with a lot of keys or many buckets with fewer keys?

Justin Sheehy
Hello, TuX.

Either of your two schemas can work.  If you intend to list the entire
contents of your conceptually smaller buckets often, then schema 2 will
definitely help with that.
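
In client terms, that is the difference between listing the keys of one
small bucket and walking (and then filtering) the big shared one; a rough
sketch with the Python client, reusing the bucket and key naming from the
earlier example:

# Schema 2: listing one user's posts is just listing one small bucket.
user1_keys = client.bucket('postsuser1').get_keys()

# Schema 1: key listing walks every key in the big bucket across the cluster
# (an expensive operation), and you still filter for the user afterwards.
all_keys = client.bucket('posts').get_keys()
user1_keys = [k for k in all_keys if k.startswith('user1_')]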

You'll only run into trouble when you have very many buckets, depending
on the backend, as some storage engines create separate files per bucket,
which can get a bit tricky once the numbers get large.

The HBase concern is not a worry here.  No matter how small your
buckets, the distribution properties will be the same: Riak places objects
by hashing the {bucket, key} pair onto the ring, not by splitting tables
once they grow past a size threshold.
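
Roughly, as a simplified sketch (the real ring code hashes the Erlang term
for the bucket/key pair onto a 160-bit SHA-1 ring; the partition count
below is just a stand-in):

import hashlib

RING_SIZE = 64  # stand-in for the number of partitions on the ring

def partition_for(bucket, key):
    # Placement depends only on the hash of the bucket/key pair,
    # never on how many keys the bucket already holds.
    digest = hashlib.sha1((bucket + key).encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

partition_for('posts', 'user1_post42')    # schema 1
partition_for('postsuser1', 'post42')     # schema 2 -- spread just as evenly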

Have fun!

-Justin
