Riak Search

Ryan Zezeski rzezeski at basho.com
Tue Oct 16 08:44:04 EDT 2012

On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan <pavel.kogan at cortica.com> wrote:
> 1) Does enabling search have any impact on read latency/throughput?

If you are reading and searching at the same time, there is a good chance it
will, because searching causes more disk seeks.

> 2) Does enabling search have any impact on RAM usage?

Yes, the index engine behind Riak Search makes heavy use of Erlang ETS
tables.  Each partition has an in-memory buffer as well as an in-memory
offset table for every segment.  It also uses a temporary ETS table for
every write to store posting data.  In overload scenarios the system limit
on the number of ETS tables can even become an issue.
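To see how much ETS a node is using before and after enabling search, the standard Erlang calls `ets:all()` and `erlang:memory(ets)` can be run from the node's console (for example via `riak attach`). The small module below, `ets_usage`, is a hypothetical wrapper for illustration only; it is not part of Riak.

```erlang
-module(ets_usage).
-export([snapshot/0]).

%% Hypothetical helper (not part of Riak): report how many ETS tables
%% currently exist on this node and how many bytes the ETS system has
%% allocated. Take a snapshot before and after enabling search to compare.
snapshot() ->
    TableCount = length(ets:all()),
    EtsBytes = erlang:memory(ets),
    {TableCount, EtsBytes}.
```

Comparing `TableCount` against the node's configured table limit (the `ERL_MAX_ETS_TABLES` setting, typically in `vm.args`) shows how close the node is to the limit mentioned above.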

> 3) In production we have no search enabled. What is the best way to
>     enable search without stopping production? I thought about something like:
>     1) Enable search node after node.

You could change the app env dynamically, but that's only half the problem.
The other half is then starting the Riak Search application.  I think
application:start(merge_index) followed by application:start(riak_search)
should work, but I'm not 100% sure and this has not been tested.  You'll
also want to edit the app.config on every node so that the change persists
across restarts.
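A sketch of the steps just described, typed at the Erlang console of one node at a time (for example via `riak attach`). As said above, this start sequence is untested; the `enabled` key is the one the stock Riak app.config uses to switch Riak Search on.

```erlang
%% Untested sketch: enable Riak Search on one running node.
%% 1. Change the app env so riak_search considers itself enabled.
ok = application:set_env(riak_search, enabled, true).
%% 2. Start the index engine first, then Riak Search itself.
%%    A badmatch here means the start failed and needs investigating.
ok = application:start(merge_index).
ok = application:start(riak_search).
```

For persistence, the riak_search section of each node's app.config would then read `{riak_search, [{enabled, true}]}`.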

>     2) Execute some nightly script that runs over all keys and writes them
>         back with the proper MIME type.

Yes, you'll want to install the commit hook on the buckets you wish to
index.  Then you'll want to do a streaming list-keys or bucket map-reduce
and re-write the data.
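With Riak Search 1.x the hook is installed per bucket with `search-cmd install <bucket>`. For the re-write pass, here is a minimal sketch using the official Erlang client (riakc). It assumes riakc is on the code path, that objects have no siblings, and that you pass the content type you actually want indexed (e.g. "application/json"). The module `reindex` and its functions are hypothetical.

```erlang
-module(reindex).
-export([rewrite_bucket/4]).

%% Hypothetical helper: stream every key in Bucket and write each object
%% back with ContentType so the (already installed) search hook indexes it.
%% Assumes riakc and objects without siblings.
rewrite_bucket(Host, Port, Bucket, ContentType) ->
    %% One connection streams keys while the other does the get/put work,
    %% so the rewrites do not queue behind the streaming request.
    {ok, Lister} = riakc_pb_socket:start_link(Host, Port),
    {ok, Worker} = riakc_pb_socket:start_link(Host, Port),
    {ok, ReqId} = riakc_pb_socket:stream_list_keys(Lister, Bucket),
    Result = loop(ReqId, Worker, Bucket, ContentType, 0),
    riakc_pb_socket:stop(Lister),
    riakc_pb_socket:stop(Worker),
    Result.

%% Count is the number of keys seen so far.
loop(ReqId, Worker, Bucket, ContentType, Count) ->
    receive
        {ReqId, {keys, Keys}} ->
            lists:foreach(fun(Key) -> rewrite(Worker, Bucket, Key, ContentType) end,
                          Keys),
            loop(ReqId, Worker, Bucket, ContentType, Count + length(Keys));
        {ReqId, done} ->
            {ok, Count};
        {ReqId, {error, Reason}} ->
            {error, Reason, Count}
    after 60000 ->
        {error, timeout, Count}
    end.

rewrite(Worker, Bucket, Key, ContentType) ->
    case riakc_pb_socket:get(Worker, Bucket, Key) of
        {ok, Obj} ->
            %% get_value/1 throws if the object has siblings.
            Value = riakc_obj:get_value(Obj),
            ok = riakc_pb_socket:put(Worker,
                                     riakc_obj:update_value(Obj, Value, ContentType));
        {error, notfound} ->
            ok  % deleted after the key was listed; nothing to index
    end.
```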

> 4) If we see that the search overhead is something we can't handle, is
>     there a simple way to disable it without stopping production?

I think the best course of action in this case would be to disable the
commit hook.  But you would have to keep track of anything written during
this time and re-write it after re-installing the hook.  If you don't, then
you'll have to re-index everything, because you won't know what you missed.
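With Riak Search 1.x the hook comes off with `search-cmd uninstall <bucket>`. One way to keep track of writes while it is off is for the application to record each key it writes and replay that set once the hook is back. The module below, `pending_reindex`, is a hypothetical in-memory sketch of that bookkeeping using an ETS set (a real deployment would want something durable); the actual re-write is a fun you supply, such as a get-and-put of the key.

```erlang
-module(pending_reindex).
-export([new/0, record/3, replay/2]).

%% Hypothetical bookkeeping for writes made while the search hook is
%% uninstalled. Uses an in-memory ETS set, so it does not survive a restart.
new() ->
    ets:new(pending_reindex, [set, public]).

%% Remember that Bucket/Key was written while the hook was off.
%% A set keeps one entry per key no matter how often it is rewritten.
record(Tab, Bucket, Key) ->
    true = ets:insert(Tab, {{Bucket, Key}}),
    ok.

%% After re-installing the hook, call RewriteFun(Bucket, Key) for every
%% recorded key, then clear the table. Returns the number of keys replayed.
replay(Tab, RewriteFun) ->
    Keys = [BK || {BK} <- ets:tab2list(Tab)],
    lists:foreach(fun({Bucket, Key}) -> RewriteFun(Bucket, Key) end, Keys),
    true = ets:delete_all_objects(Tab),
    length(Keys).
```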

> 5) In what case would we need repair? It is said: on replica loss. But if
>     I understand correctly, we have 3 replicas on different nodes, don't we?
>     If it happens, how difficult and long would it be for a large cluster
>     (about 100 nodes)?

Repair is done on a per-partition basis, so the number of nodes doesn't come
into play.  Repair is very specific in that it requires the adjacent
partitions to be in a good, convergent state.  If they aren't, then repair
isn't much help.

A lot of these entropy issues go away in Yokozuna.  Indexes are repaired
automatically and efficiently in the background.  There is no need to
re-write data or run manual repair commands.


More information about the riak-users mailing list