Bitcask vs innostore, again

Dmitry Demeshchuk
Sorry, guys, I know this subject has been discussed a thousand times.

So, briefly:

Innostore:

1. Time-proven technology
2. Very tunable.
3. Has been used in production by us (Mochi Media) and other
companies for a long time.
4. Some tasty hidden features that can be implemented in Riak
(first/last key for a bucket, fast removal of a bucket, etc)
But:
5. Potentially slower than bitcask in many cases (but still at least comparable)
6. Not considered the mainstream storage, so in practice no longer
being improved.
7. One bucket equals one InnoDB table, which means a separate file.
A million buckets == a million files.
8. Unlike bitcask, may need time to be repaired after a node failure.

Bitcask:

1. Fast, sometimes ridiculously fast.
2. Doesn't generate thousands of files.
3. Now being considered as the main Riak storage.
4. Has some nice features, for instance, an LRU-like mechanism based on
removing old values upon merging. Most likely, it will have more of them
in the future.
But:
5. Requires all the keys to fit in memory.
6. Can make your database grow fast if you make frequent value updates
(however, merge tuning helps, more or less).
7. Still immature compared to innostore.
8. There were production complaints about it some time ago.
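
To put point 5 in numbers, here is a back-of-the-envelope sketch of how much RAM the in-memory key index might need per node. The function name, the 40-byte per-key overhead, and the example figures are all assumptions made up for illustration, not values taken from bitcask itself; the real overhead should be measured or looked up before relying on the result.

```python
def estimate_keydir_bytes_per_node(num_keys, avg_key_bytes, replicas=3,
                                   nodes=1, per_key_overhead_bytes=40):
    """Rough RAM needed on one node to keep every key in memory.

    num_keys               -- distinct keys stored in the cluster
    avg_key_bytes          -- average size of bucket name plus key, in bytes
    replicas               -- copies kept of each key (Riak's default n_val is 3)
    nodes                  -- nodes the copies are spread across (assumed even)
    per_key_overhead_bytes -- ASSUMED bookkeeping cost per in-memory entry;
                              a placeholder, not a measured bitcask figure
    """
    total = num_keys * replicas * (avg_key_bytes + per_key_overhead_bytes)
    return total // nodes


if __name__ == "__main__":
    # Hypothetical data set: 100 million keys of ~36 bytes on a 5-node cluster.
    needed = estimate_keydir_bytes_per_node(100_000_000, 36, nodes=5)
    print(f"approx. {needed / 2**30:.1f} GiB of RAM per node for keys alone")
```

If the estimate comes out close to the physical memory of a node, that alone may be a reason to prefer innostore for that data set.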

To clarify the last point, I've had some problems with bitcask myself
in the past (running out of file descriptors, bad merges), and I've
heard that some people periodically try to migrate from innostore to
bitcask and end up sticking with innostore, disappointed in bitcask. I
mean no offense to the Basho team here; a lot of problems have been
successfully fixed during bitcask's lifetime, and no one can create a
perfect product in no time. Still, bitcask is very impressive for
something with just a 1-year history. Dave Smith and the guys have
done a lot so far.

What I haven't heard about bitcask yet is any production success
stories. Which storage does Wikia use, for example? Or Vibrant Media?

So, I'm looking for stories from people with at least 2-3 months of
experience using bitcask, with 10-20GB of total data or more. What
problems have you faced? Have you managed to solve them? What
advantages have you gained by using bitcask compared to innostore? Any
details of the data sets you use (update/delete/put frequency, number
of keys and buckets, etc.)? Do you use any other backends along with
bitcask?

Thank you.

--
Best regards,
Dmitry Demeshchuk

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Bitcask vs innostore, again

Justin Sheehy
Hi, Dmitry.

I will try to reply to some of the questions you raised about bitcask.

On Thu, Apr 7, 2011 at 12:30 AM, Dmitry Demeshchuk <[hidden email]> wrote:

> Now being considered as the main Riak storage.

It's not just being considered, it is the main Riak storage.  We are
very confident in bitcask's quality and it has been the default
storage engine now for some time.  Some people may of course still
choose innostore for various reasons but at Basho we believe that
Bitcask will better suit the needs of the majority of users.

> I've been having myself some problems with
> bitcask previously (running out of file descriptors, bad merges) and
> heard that some people periodically try to migrate from innostore to
> bitcask, and stick to innostore, keeping disappointing in bitcask.

We honestly don't hear of many real problems with bitcask.  It is
true that, depending on your setup, Riak can quickly run out of file
descriptors if you haven't set your ulimit properly, but that is
easily fixed (and is also true under innostore, just in slightly
different scenarios).
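
As a quick way to check the file descriptor limit mentioned above, the following sketch reads the limits that the current process inherited, using Python's standard `resource` module (Unix only). The function name and the 4096 threshold are illustrative assumptions, not a Basho recommendation; run it as the same user and from the same shell that starts Riak, since limits are per process.

```python
import resource


def nofile_limits(expected_open_files):
    """Return (soft, hard, ok) for the open-file limit of this process.

    ok is True when the soft limit is unlimited or at least
    expected_open_files, i.e. when that many descriptors could be opened.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ok = soft == resource.RLIM_INFINITY or soft >= expected_open_files
    return soft, hard, ok


if __name__ == "__main__":
    # 4096 is an arbitrary example threshold, not an official figure.
    soft, hard, ok = nofile_limits(4096)
    print(f"soft={soft} hard={hard} enough={ok}")
```

If `ok` comes back False, raising the limit with `ulimit -n` in the shell that launches the node is the usual fix.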

I am not sure what you mean by bad merges or any failed migrations --
I'd need to hear more details to reply to that part.

> What I haven't heard about bitcask yet is any production success
> stories. Which storage does Wikia use, for example? Or Vibrant Media?

I will leave it to each individual user to describe any details of
their own production configuration as that is not our privilege to
disclose.  However, I can certainly say that the majority of
production deployments are running bitcask.  There are a few notable
exceptions, certainly -- but bitcask is the typical storage engine for
Riak in production these days.  This certainly includes a number of
businesses with the volume and duration you described.

Others might share their anecdotes; what I can provide is an aggregate
view.  And from that perspective we are very happy with the
performance and stability that bitcask's known users are experiencing.

Best regards,

-Justin


Re: Bitcask vs innostore, again

matthew hawthorne
hi dmitry,

comments below:

On Thu, Apr 7, 2011 at 12:30 AM, Dmitry Demeshchuk <[hidden email]> wrote:
> What I haven't heard about bitcask yet is any production success
> stories. Which storage does Wikia use, for example? Or Vibrant Media?

> So, I look for stories of at least 2-3 months experience of using
> bitcask, with 10-20GB total data or larger. What problems have you
> faced? Have you managed to solve them? What advantages have you got
> using bitcask compared to innostore? Any details of the data sets you
> use(updates/deletes/puts frequency, keys/buckets number, etc)? Do you
> use any other backends along with bitcask?

I work for Comcast Interactive Media, and we've been using Riak (with
Bitcask) as the primary data store for one of our production systems
for about 6 months.  another group at Comcast recently launched a
Riak/Bitcask cluster into production also, but I don't know the
details of their setup.

we're storing well over 20GB of data, with a significant amount of
load, although nothing insane.  we've had zero problems -- our only
issue is a lack of visibility into our keyspace since we use a single
Riak bucket and are hesitant to list all of our keys via HTTP.  we're
working on a way to do that offline via directly scanning copies of
the Bitcask files.
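
For readers wondering what such an offline scan could look like, here is a minimal sketch, not the code used at Comcast. It assumes the record layout described in the Bitcask design paper: a big-endian header of CRC32, timestamp, key size and value size (4, 4, 2 and 4 bytes), followed by the key and the value, with the CRC computed over everything after the CRC field. The names `HEADER`, `pack_record` and `iter_keys` are hypothetical, and the layout should be verified against the bitcask version in use before trusting the output.

```python
import struct
import zlib

# ASSUMED layout: crc32, timestamp, key size, value size (big-endian, 14 bytes).
HEADER = struct.Struct(">IIHI")


def pack_record(tstamp, key, value):
    """Build one record in the assumed layout (handy for testing the scanner)."""
    body = struct.pack(">IHI", tstamp, len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF) + body


def iter_keys(data):
    """Yield (timestamp, key) for every complete record in a data file image."""
    offset = 0
    while offset + HEADER.size <= len(data):
        crc, tstamp, ksz, vsz = HEADER.unpack_from(data, offset)
        end = offset + HEADER.size + ksz + vsz
        if end > len(data):
            break  # truncated trailing record, e.g. a copy taken mid-write
        if zlib.crc32(data[offset + 4:end]) & 0xFFFFFFFF != crc:
            raise ValueError(f"CRC mismatch at offset {offset}")
        yield tstamp, data[offset + HEADER.size:offset + HEADER.size + ksz]
        offset = end


if __name__ == "__main__":
    image = pack_record(1, b"k1", b"v1") + pack_record(2, b"key2", b"")
    for tstamp, key in iter_keys(image):
        print(tstamp, key)
```

The same key can appear many times across files (once per update, plus delete markers), so a real scan would keep only the newest timestamp per key. Riak also encodes the bucket and key together inside the stored key bytes, as far as I know, so those bytes would need decoding before they are readable.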

we switched from Innostore to Bitcask at Basho's recommendation and it
seems like the right choice, even just to avoid the hassle of having
to build Innostore separately (and patch it since we use Solaris).

hopefully this helped.

-matt
