Why is Riak Search using the leveldb backend?

Elias Levy fearsome.lucidity at gmail.com
Sat Nov 12 16:14:38 EST 2011


On Fri, Nov 11, 2011 at 5:57 PM, Ryan Zezeski <rzezeski at basho.com> wrote:

> This is an implementation detail of Search.  It stores something we call a
> "proxy object" under the bucket _rsid_<index name> [1].  It does this so it
> knows which entries to remove from the index when an object is
> updated/deleted.  To achieve your goal you should be able to set the
> buckets `_rsid_bucket1` and `_rsid_bucket2` to use the `bucket1` and
> `bucket2` backends, respectively.
>

Ryan,

Thanks.  That makes sense.  I had actually wondered how you took care of that.
Fetching the objects before an update and retokenizing them, just so you
could delete the previous search index postings, seemed very inefficient.
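
For anyone following along, here is a rough sketch of the bucket change Ryan
describes.  It assumes the node runs the multi backend, that the HTTP
interface listens on 127.0.0.1:8098, and that the `backend` bucket property
is set with a PUT to /riak/<bucket>.  The backend names are placeholders of
my own, not anything from Ryan's message.  The script only builds the
requests; the line that would send them is commented out.

```python
import json
import urllib.request

# Assumption: local node with the HTTP interface on the default port.
RIAK_URL = "http://127.0.0.1:8098"


def backend_props_request(bucket, backend, base_url=RIAK_URL):
    """Build (but do not send) a PUT that sets a bucket's `backend` property."""
    body = json.dumps({"props": {"backend": backend}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/riak/{bucket}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical backend names: substitute whatever names the multi backend
# config gives to the backends that bucket1 and bucket2 already use.
PROXY_BUCKET_BACKENDS = {
    "_rsid_bucket1": "backend_for_bucket1",
    "_rsid_bucket2": "backend_for_bucket2",
}

reqs = [backend_props_request(b, be) for b, be in PROXY_BUCKET_BACKENDS.items()]

if __name__ == "__main__":
    for req in reqs:
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # urllib.request.urlopen(req)  # uncomment to apply against a live node
```

Building the requests separately from sending them makes it easy to check
the bucket names and payloads before touching a running cluster.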

It's interesting that the proxy objects take about the same disk space
as the actual data, and significantly more than the search index itself
in the merge_index backend, which is about a third the size of the proxy
objects in leveldb.  We do have lots of indexes with long keys, and while
merge_index compresses its data, from discussion on the list I believe you
turned off Snappy in leveldb because of portability issues.

Are there plans to enable Snappy some time in the future?  It would probably
give a good performance bump in environments like EC2, where the CPU to IO
tradeoff is so lopsided.

Elias