Benchmarks of backends

Anthony Molinaro
Hi,

  I'm wondering if anyone has done any testing with regard to memory
usage of various backends.  After recent emails about the large overhead
of bitcask keydir indexes, and after comparing with my current production
nodes, I find that the overhead per key ends up being too large for
small keys.

  So I'm in the market for a new backend, and was wondering if anyone
out there has done any measurements on memory overhead per key, and
access times.

I'm also wondering if there are any backends floating out there I haven't
found. I've done some Google searches and come across

  https://github.com/krestenkrab/riak_btree_backend
  https://github.com/cstar/riak_redis_backend

but I'm assuming there might be others.

Also, I figure it would be interesting to understand the overhead for
the built-in backends and innostore and possibly look at other stores
I've found which seem to have erlang wrappers like

LevelDB:
  https://github.com/basho/e_leveldb
  https://github.com/davisp/erleveldb
Tokyo Cabinet:
  https://github.com/rabbitmq/toke
Berkeley DB:
  https://github.com/krestenkrab/bets

So does anyone know anything about these backends or other k/v stores in terms
of memory versus disk for large datasets?

The thing prompting this is a cassandra cluster with about 14 billion
entries (7 billion with replication factor of 2), which uses 60 machines.
I was trying to determine how many bitcask backed machines it would take
to store this data and it ends up being about 150.  This is mostly because
of the 84 bytes of overhead per key (43 bytes by calculations determined
on this list a few weeks ago, another 41 by measuring my current production
setup).  With keys of 17 bytes, that's 101 bytes per key in total,
so just wondering if there's anything better.
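
The arithmetic in the paragraph above can be checked with a rough sketch. It uses only the figures quoted in the message (14 billion entries, 43 + 41 bytes of overhead, 17-byte keys, 150 machines). The function name `keydir_bytes`, the decimal units, and the assumption that keys spread evenly across machines are illustrative choices, not anything taken from Bitcask itself.

```python
# Back-of-the-envelope estimate of Bitcask keydir memory, using only the
# figures quoted in the message above. Even key distribution is assumed.

ENTRIES = 14_000_000_000      # total entries, replication included
OVERHEAD_BYTES = 43 + 41      # calculated + measured per-key overhead
KEY_BYTES = 17                # key size quoted in the message
MACHINES = 150                # machine count estimated in the message


def keydir_bytes(entries: int, overhead: int, key_size: int) -> int:
    """Total RAM needed to hold every key plus its per-key overhead."""
    return entries * (overhead + key_size)


total = keydir_bytes(ENTRIES, OVERHEAD_BYTES, KEY_BYTES)
per_machine = total / MACHINES

print(f"per-key cost: {OVERHEAD_BYTES + KEY_BYTES} bytes")
print(f"total keydir: {total / 1e12:.3f} TB")
print(f"per machine:  {per_machine / 1e9:.2f} GB across {MACHINES} machines")
```

Under these assumptions the key index alone comes to roughly 1.4 TB, or a little over 9 GB per machine, before any RAM is set aside for the OS or page cache.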

Anyway, I'm trying to get some hardware to run basho_bench with and will
try out some different things, but if anyone has done any of this work
already it might be interesting to know.

Thanks,

-Anthony

--
------------------------------------------------------------------------
Anthony Molinaro                           <[hidden email]>

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Benchmarks of backends

Justin Sheehy
Hi, Anthony.

Most people using Riak today use either Bitcask or Innostore, as I suspect you know. Bitcask has excellent performance, but it has the limitation you are already aware of: a hard limit on the number of keys per unit of available RAM. Innostore does not have that limitation, but it is much harder to achieve equivalent performance with it.

You've noticed that multiple people (including Basho's own Dizzy and also the estimable Paul Davis) have produced wrappers for LevelDB, and indeed we are currently evaluating this as another alternative storage engine behind Riak.  We will be posting some performance thoughts on LevelDB shortly, and generally it looks promising.  The main blocker at this point is portability; we would like for the backend to run well on all of Riak's existing main platforms.

Expect more from us on this soon. The short answer is that if you have too many keys for Bitcask, the choice today is usually Innostore, but soon it might be LevelDB instead.

Best,

-Justin




On Jun 17, 2011, at 7:12 PM, Anthony Molinaro wrote:

> [...]

