Riak: leveldb vs multi backend disk usage

Daniel Miller dmiller at dimagi.com
Thu Jan 26 16:29:46 EST 2017

Hi Riak Users,

I am in the process of migrating a few Riak CS clusters from the multi
backend to the leveldb backend. I am aware this is not an officially
supported configuration, but I feel it will be a better fit for my (very
limited) hardware, especially RAM, and I am not too concerned about the
lower performance of leveldb vs bitcask.

First, a note on Riak configuration (riak.conf): I have changed
storage_backend from the default value of "multi" to "leveldb" and I have
removed the advanced.conf file from the config dir. According to the
documentation, this seems to be the recommended way to configure Riak to
use the leveldb backend. The rest of the configuration uses the defaults
recommended for Riak CS. I could not find any specific documentation on how
to configure Riak CS with the leveldb backend, although this is not
surprising since it is not officially supported.

Cluster migration process:
- set up new nodes with the new leveldb backend configuration
- for each new node (in serial):
  - join the node to the cluster (riak-admin cluster join)
  - replace an old node (riak-admin cluster replace)
  - wait for the replace to complete and the ring to be ready
  - proceed to next node
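
The per-node steps above can be sketched as a small Python helper that only
builds the riak-admin command lines in order, without running them. This is
a rough sketch under stated assumptions, not a tested migration tool: the
node names are hypothetical, and the "cluster plan" / "cluster commit" and
"transfers" / "ringready" steps are not listed above but are assumed here as
the usual way to apply staged changes and to check that handoff is done.

```python
from typing import List, Sequence, Tuple


def migration_commands(
    pairs: Sequence[Tuple[str, str]], seed: str
) -> List[List[str]]:
    """Build the riak-admin commands for replacing old nodes one at a time.

    pairs: (old_node, new_node) names, e.g. "riak@old1.example" (hypothetical).
    seed:  an existing cluster member that each new node joins through.
    """
    commands: List[List[str]] = []
    for old_node, new_node in pairs:
        commands.extend([
            # Run on the new node: stage a join through an existing member.
            ["riak-admin", "cluster", "join", seed],
            # Stage the replacement of the old node by the new node.
            ["riak-admin", "cluster", "replace", old_node, new_node],
            # Assumption: review, then commit, the staged changes.
            ["riak-admin", "cluster", "plan"],
            ["riak-admin", "cluster", "commit"],
            # Assumption: poll these until handoff has finished and the
            # ring is ready, then move on to the next pair.
            ["riak-admin", "transfers"],
            ["riak-admin", "ringready"],
        ])
    return commands


if __name__ == "__main__":
    plan = migration_commands(
        [("riak@old1.example", "riak@new1.example")], "riak@old2.example"
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to
run anywhere; each line would still need to be run on the right node and
checked by hand before proceeding.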

The most significant thing I have noticed after migrating a cluster is that
the new leveldb-backend nodes are using far less disk space than the old
multi-backend nodes. For example, disk usage is down from 55% to 13% (same
size disks on old and new nodes). Is this dramatic difference expected? I
can formulate explanations in my head, but they're based more on loose
assumptions than on known behaviors. For example: leveldb uses compression,
bitcask does not, and we have a highly compressible data set (mostly XML
documents). If CS is storing a copy of each document in each backend in the
multi configuration, then it stands to reason that disk usage could drop
significantly, since the data is highly compressible.
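
To put the numbers above in perspective, here is a small worked calculation
in Python. It only restates the two disk-usage percentages as a reduction
factor and a fraction saved. It assumes the percentages measure Riak data
alone, which is a simplification: anything else on the disk (OS, logs)
would make the real reduction in Riak data larger than this.

```python
def reduction_factor(before_pct: float, after_pct: float) -> float:
    """How many times smaller the disk usage is after the migration."""
    return before_pct / after_pct


def fraction_saved(before_pct: float, after_pct: float) -> float:
    """Share of the previously used space that is no longer used."""
    return 1 - after_pct / before_pct


if __name__ == "__main__":
    before, after = 55.0, 13.0  # disk usage in percent, old vs new nodes
    print(round(reduction_factor(before, after), 2))  # 4.23
    print(round(fraction_saved(before, after), 2))    # 0.76
```

So any explanation has to account for roughly a 4x drop, or about three
quarters of the previously used space.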

I have spot-checked various documents, and all the data I have checked is
present on the fully migrated cluster. There are no errors in the logs. I
can see the vnode handoffs triggered by node replacements in the logs, and
there are no errors or warnings there, in either the old or the new nodes'
logs. The data dir on the old node is nearly empty once a node replacement
has completed, which means all data is being deleted from the old node
during the replacement.

Cluster size is 5 nodes. N-value is the default (3). These have not changed
during the migration.

Thanks in advance for any information you can provide.

Daniel Miller