Riak Search

Pavel Kogan pavel.kogan at cortica.com
Tue Oct 16 08:49:36 EDT 2012


Thanks a lot.

On Tue, Oct 16, 2012 at 2:44 PM, Ryan Zezeski <rzezeski at basho.com> wrote:

>
>
> On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan <pavel.kogan at cortica.com>wrote:
>>
>>
>> 1) Does enabling search have any impact on read latency/throughput?
>>
>
> If you are reading and searching at the same time there is a good chance
> it will, since searching adds extra disk seeks.
>
>
>> 2) Does enabling search have any impact on RAM usage?
>>
>
> Yes, the index engine behind Riak Search makes heavy use of Erlang ETS
> tables.  Each partition has an in-memory buffer as well as an in-memory
> offset table for every segment.  It also uses a temporary ETS table for
> every write to store posting data.  The ETS system limit can even become an
> issue in overload scenarios.
>
>
>> 3) In production we have no search enabled. What is the best way to
>>     enable search without stopping production? I thought about something like:
>>     1) Enable search node by node.
>>
>
> You could change the app env dynamically but that's only half the problem.
>  The other half is then starting the Riak Search application.  I think
> application:start(merge_index) followed by application:start(riak_search)
> should work but I'm not 100% sure and this has not been tested.  You'll
> also want to make sure to edit all app.configs so that the change is persistent.
>
>
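For the persistence step Ryan mentions, the app.config change would look roughly like the stanza below. This is a sketch assuming the Riak 1.x app.config layout, where riak_search ships disabled by default; verify against your own config before rolling it out.

```erlang
%% app.config on each node (assumed Riak 1.x layout).
%% Setting enabled to true makes the riak_search application
%% start on boot, so the change survives a rolling restart.
{riak_search, [
    {enabled, true}
]}
```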
>>
>>     2) Execute some nightly script that runs over all keys and rewrites
>>         them back with the proper MIME type.
>>
>
> Yes, you'll want to install the commit hook on the buckets you wish to
> index.  Then you'll want to do a streaming list-keys or bucket map-reduce
> and re-write the data.
>
>
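The streaming list-keys-and-rewrite pass could be sketched as below. This is a hypothetical helper, not an official Riak client API; the `bucket` argument is assumed to expose `stream_keys()`, `get()`, and `store()` in the style of the Python Riak client of that era, and any object with that shape will work.

```python
# Sketch of the nightly re-index pass: stream all keys, then re-store
# each object with an indexable content type so the search precommit
# hook (installed beforehand) picks it up on the write.

def rewrite_all(bucket, content_type="application/json"):
    """Re-store every object in the bucket; returns the keys rewritten."""
    rewritten = []
    for batch in bucket.stream_keys():       # streaming list-keys
        for key in batch:
            obj = bucket.get(key)
            obj.content_type = content_type  # ensure a proper MIME type
            obj.store()                      # re-write triggers the hook
            rewritten.append(key)
    return rewritten
```

Streaming the keys (rather than listing them all at once) keeps memory bounded, which matters for a nightly run over a large bucket.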
>>
>> 4) If we see that the search overhead is something we can't handle, is
>>     there a simple way to disable it without stopping production?
>>
>>
>
> I think the best course of action in this case would be to disable the
> commit hook.  But you would have to keep track of anything written during
> this time and re-write it after re-installing the hook.  If you don't,
> you'll have to re-index everything because you won't know what you missed.
>
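The bookkeeping Ryan describes could be sketched like this. The names are hypothetical, not a Riak API: while the commit hook is disabled, writes are routed through `write_tracked()` so the touched keys are remembered, and after the hook is reinstalled `replay()` re-stores just those objects instead of re-indexing the whole bucket.

```python
# Sketch of tracking writes made while the search commit hook is off,
# then replaying only those keys once the hook is reinstalled.

class HookDowntimeLog:
    def __init__(self):
        self.dirty = set()

    def write_tracked(self, bucket, key, value):
        """Write through to Riak and remember the key the hook missed."""
        obj = bucket.new(key, value)
        obj.store()
        self.dirty.add(key)

    def replay(self, bucket):
        """Re-store every tracked object so the reinstalled hook indexes it."""
        for key in sorted(self.dirty):
            bucket.get(key).store()
        replayed = len(self.dirty)
        self.dirty.clear()
        return replayed
```

The durable store of record is still Riak itself; the log only has to survive the hook-disabled window, but in practice you would persist it somewhere rather than keep it in process memory.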
>> 5) In what case would we need repair? It is said to be needed on replica
>>     loss, but if I understand correctly we have 3 replicas on different
>>     nodes, don't we? If it happens, how difficult and long would repair
>>     be for a large cluster (about 100 nodes)?
>>
>
> Repair is done on a per-partition basis.  The number of nodes doesn't come
> into play.  Repair is very specific in that it requires the adjacent
> partitions to be in a good, convergent state.  If they aren't, repair isn't
> much help.
>
> A lot of these entropy issues go away in Yokozuna.  Repairing indexes is
> done automatically, in the background, in an efficient manner.  There is no
> need to re-write data or run manual repair commands.
>
> -Z
>

