Deleting items from search index increases disk usage

Ryan Zezeski rzezeski at basho.com
Tue Oct 30 15:47:51 EDT 2012


Jeremy,

This is how Merge Index (the index store behind Riak Search) works.  It is
log-based, meaning deletes are logical first and only later become physical.
It does not update in-place as you stated in one of your replies.  When
you performed those deletes, new logs were created containing logical
deletes (tombstones), causing more disk to be used.  Assuming other buckets
are still being indexed, compaction should be occurring and tombstones
should be reaped, meaning both the logical delete and the datum should be
removed from disk.  If no new indexes are arriving then nothing will be
compacted, as there is no time-based trigger on Merge Index.
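
As a rough illustration of why a delete can grow disk usage before it
shrinks it, here is a toy append-only log in shell.  This is only a sketch:
the function names and the tab-separated record layout are made up for the
example and are not Merge Index's on-disk format or code.

```shell
# Toy model only; NOT Merge Index's real format. All names are hypothetical.

# Append a key/value record to the log file given as $1.
log_put()    { printf 'put\t%s\t%s\n' "$2" "$3" >> "$1"; }

# A delete appends a tombstone; nothing is removed, so the file grows.
log_delete() { printf 'del\t%s\n' "$2" >> "$1"; }

# Compaction rewrites the log, dropping tombstones and the data they cover.
log_compact() {
    awk -F '\t' '
        $1 == "put" { val[$2] = $3; live[$2] = 1 }
        $1 == "del" { live[$2] = 0 }
        END { for (k in live) if (live[k]) printf "put\t%s\t%s\n", k, val[k] }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Calling log_delete makes the file larger; only a later log_compact makes it
smaller, which mirrors the logical-then-physical delete described above.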

Merge Index could be doing a bad job of picking which segments to merge,
leaving a high percentage of tombstones on disk longer than necessary.  I'm
curious: what is the output of the following commands?

find /var/lib/riak/merge_index -name 'buffer.*' | xargs ls -lah

find /var/lib/riak/merge_index -name 'segment.*' | xargs ls -lah
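
If the raw listings are long, a per-type total may be easier to compare
over time.  The following is a sketch of a hypothetical helper, mi_summary
(the name and output format are made up here), that counts the files matched
by the same two patterns and sums their sizes.  It assumes `ls -ln` prints
the size in bytes in the fifth column, as POSIX ls does, and defaults to the
path used in this thread.

```shell
# Hypothetical helper, not part of Riak: summarize buffer.* and segment.* files.
mi_summary() {
    dir="${1:-/var/lib/riak/merge_index}"   # assumed default; adjust as needed
    for kind in buffer segment; do
        # -exec ... {} + runs ls only when something matches, so an empty
        # result is reported as zero files rather than listing the cwd.
        find "$dir" -type f -name "$kind.*" -exec ls -ln {} + |
            awk -v kind="$kind" '
                { bytes += $5; files++ }
                END { printf "%s files=%d bytes=%.0f\n", kind, files, bytes }'
    done
}
```

Run it as `mi_summary` or `mi_summary /path/to/merge_index` before and after
a round of deletes or indexing to see whether segment totals are shrinking.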


On Mon, Oct 29, 2012 at 8:19 AM, Jeremy Raymond <jeraymond at gmail.com> wrote:

> So the only way to actually free the disk space consumed by the
> tombstones in the search index is to bring down the cluster and blow
> away the merge index (at /var/lib/riak/merge_index)?
>


If, and only if, you are no longer indexing _any_ buckets then this would
be the thing to do.  If you are still indexing some buckets then deleting
these files would break their indexes.

-Z