Riak: leveldb vs multi backend disk usage

Daniel Miller
Hi Riak Users,

I am in the process of migrating a few Riak CS clusters from the multi backend to the leveldb backend. I am aware this is not an officially supported configuration, but I feel it will be a better fit for my (very limited) hardware, especially RAM, and I am not too concerned about the lower performance of leveldb compared with bitcask.

First, a note on the Riak configuration (riak.conf): I have changed storage_backend from the default value of "multi" to "leveldb" and removed the advanced.conf file from the config dir. According to the documentation, this seems to be the recommended way to configure Riak to use the leveldb backend. The rest of the configuration uses the defaults recommended for Riak CS. I could not find any specific documentation on how to configure Riak CS with the leveldb backend, although this is not surprising since it is not officially supported.

Cluster migration process:
- set up new nodes with the new leveldb backend configuration
- for each new node (in serial):
  - join the node to the cluster (riak-admin cluster join)
  - replace an old node (riak-admin cluster replace)
  - wait for replace to complete and ring ready
  - proceed to next node
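
The per-node sequence above could be sketched as a small Python helper that only builds the command strings and runs nothing. The node names are hypothetical, and the `cluster plan`, `cluster commit`, `transfers`, and `ringready` steps are assumptions about how the staged changes were applied and how the "wait" step was checked; the message itself names only join and replace.

```python
def replacement_plan(pairs):
    """Build the riak-admin commands for replacing old nodes one at a time.

    pairs: list of (old_node, new_node) Erlang node names (hypothetical).
    Returns a flat list of command strings; nothing is executed.
    """
    steps = []
    for old, new in pairs:
        steps.extend([
            # Run on the new node: stage a join via an existing cluster member.
            f"riak-admin cluster join {old}",
            # Stage the hand-over of all partitions from the old node to the new one.
            f"riak-admin cluster replace {old} {new}",
            # Assumed steps: review and apply the staged changes.
            "riak-admin cluster plan",
            "riak-admin cluster commit",
            # Assumed checks: poll until no transfers remain and the ring is ready.
            "riak-admin transfers",
            "riak-admin ringready",
        ])
    return steps


if __name__ == "__main__":
    for cmd in replacement_plan([("riak@old1.example", "riak@new1.example")]):
        print(cmd)
```

Keeping the plan as data makes it easy to review the full sequence for every node before running anything by hand.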

The most significant thing I have noticed after migrating a cluster is that the new leveldb-backend nodes are using significantly less disk space than the old multi-backend nodes. For example, disk usage is down from 55% to 13% (same size disks on old and new nodes). Is this dramatic difference expected? I can formulate explanations in my head, but they're based more on loose assumptions than known behaviors. For example: leveldb uses compression, bitcask does not, and we have a highly compressible data set (mostly XML documents). If CS is storing a copy of each document in each backend in the multi configuration, then it stands to reason that the disk usage could drop significantly since the data is highly compressible.
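
As a rough sanity check on those numbers, the reduction they imply can be worked out directly. This is only a sketch: it assumes the two percentages measure the same logical data on equal-size disks and that nothing else (OS, logs) contributes meaningfully to either figure, which may not hold exactly.

```python
# Disk usage figures quoted above (percent of an equal-size disk).
old_pct = 55.0  # multi-backend node
new_pct = 13.0  # leveldb-backend node

reduction = old_pct / new_pct   # how many times smaller the new footprint is
remaining = new_pct / old_pct   # fraction of the old footprint still in use

print(f"{reduction:.1f}x smaller, {remaining:.0%} of the original footprint")
# prints "4.2x smaller, 24% of the original footprint"
```

A roughly fourfold reduction is in the range that general-purpose compression can reach on repetitive text such as XML, so the compression explanation is at least plausible on these figures alone.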

I have spot-checked various documents and all data that I've checked is present on the fully migrated cluster. There are no errors in the logs. I can see the vnode handoffs triggered by node replacements in the logs and there are no errors or warnings there, in both the old as well as the new nodes' logs. The data dir on the old node is nearly empty once a node replacement has completed, which means all data is being deleted from the old node during the replacement.

Cluster size is 5 nodes. N-value is the default (3). These have not changed during the migration.

Thanks in advance for any information you can provide.

Daniel Miller

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Riak: leveldb vs multi backend disk usage

Alexander Sicular-2
Riak CS stores data chunks in bitcask and the index/metadata in leveldb. Bitcask, as noted, has no compression. When you force Riak to use leveldb for the data chunks, that data gets compressed too, which may or may not be good for your use case. If it is not, I believe you can turn leveldb compression off.

-Alexander


@siculars
http://siculars.posthaven.com

Sent from my iRotaryPhone

> On Jan 27, 2017, at 04:29, Daniel Miller <[hidden email]> wrote:
>
> For
