Riak Search

Pavel Kogan pavel.kogan at cortica.com
Sun Oct 14 00:33:49 EDT 2012


Hi Ryan,

Thanks for your reply. I have a few more questions:

1) Does enabling search have any impact on read latency/throughput?
2) Does enabling search have any impact on RAM usage?
3) In production we do not have search enabled. What is the best way to
    enable search without stopping production? I thought about something like:
    1) Enable search one node at a time.
    2) Run a nightly script that iterates over all keys and writes them
        back with the proper MIME type.
4) If we see that the search overhead is more than we can handle, is there a
    simple way to disable it without stopping production?
5) In what case would we need repair? It is said to be needed on replica
    loss, but if I understand correctly we have 3 replicas on different
    nodes, don't we? If it happens, how difficult and how long would it be
    for a large cluster (about 100 nodes)?
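
The nightly script in 3.2 could look roughly like the sketch below. It is only an outline of the idea: the dict-like `store` and the `StoredObject` class are made-up placeholders, not the Riak client API, and a real script would list, fetch and store objects through the client (and would probably need throttling).

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    """Hypothetical stand-in for a stored KV object (not a Riak client type)."""
    data: bytes
    content_type: str


def rewrite_all(store, content_type):
    """Write every object back with the given content type.

    `store` is any dict-like mapping of key -> StoredObject. Writing the
    object back is what would give a precommit hook the chance to index it.
    Returns the number of objects rewritten.
    """
    count = 0
    # Snapshot the keys first so the mapping can be updated while iterating.
    for key in list(store):
        obj = store[key]
        store[key] = StoredObject(data=obj.data, content_type=content_type)
        count += 1
    return count
```

The payload is left untouched; only the content type changes, so the script can be re-run safely if it is interrupted.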

Thanks,
   Pavel


On Sun, Oct 14, 2012 at 5:03 AM, Ryan Zezeski <rzezeski at basho.com> wrote:

> Pavel,
>
> On Sat, Oct 13, 2012 at 12:59 AM, Pavel Kogan <pavel.kogan at cortica.com> wrote:
>
>
>> Those limitations leave us with Riak Search as the only option, and I
>> have a few questions about it.
>>
>
> I'm working on a new solution, named Yokozuna, that integrates Riak and
> Solr.  It's not currently part of Riak but my goal is to make it so.
>
> https://github.com/rzezeski/yokozuna
>
>
>> 1) We saw that after enabling the search option and adding the search
>>     precommit hook, store speed became 10x slower (our tests were done
>>     on a single test node). Is that normal?
>>
>
> Degradation of throughput after enabling Riak Search is normal.  Riak
> Search does a lot of work during index time.  It has to analyze the data
> and each indexed document has a good chance of causing writes to every node
> in the cluster because of term-based partitioning.
>
>
>> 2) If we have a dedicated node in the cluster for search (one that is
>>     not used for KV store/get operations), would it have any impact on
>>     general cluster performance?
>>
>
> You cannot dedicate a node for Riak Search.  Every node is required to
> participate in KV.  Every node must have Riak Search enabled for search to
> work.
>
>
>> 3) With 1M keys in the cluster, search runs very fast. How would it
>>     scale to 100M keys (or even many more)?
>>     How would it scale with the number of nodes in the cluster?
>>
>
> This depends on the query.  A single-term query with a reasonable result
> set typically has a latency of a disk seek plus a few milliseconds.  This
> is the benefit of term-based partitioning, at query time.  A boolean query
> containing one large result set, even if the others are small, can bite you
> because of sub-optimal logic in the intersection code.  A range query can
> produce latency variance because it requires connecting to a covering set
> of vnodes.
>
> Scaling out will not lower the absolute latency for single-term queries,
> since it queries one node, but it can potentially reduce concurrent query
> contention thus reducing latency variance and improving throughput.
>
> Scaling out may hurt range queries as it will require more nodes to
> participate in coverage.  TCP incast could become an issue with enough load.
>
> -Z
>
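
The index-time versus query-time trade-off of term-based partitioning described in the quoted reply can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it uses whitespace tokenization and plain hash-modulo placement, whereas Riak Search has its own analyzers and placement logic, and all function names here are made up for illustration.

```python
import hashlib


def partition_for(term, num_partitions):
    """Map a term to a partition with a stable hash (simple modulo, not a ring)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_written(document, num_partitions):
    """Partitions that receive a write when `document` is indexed."""
    terms = set(document.lower().split())
    return {partition_for(term, num_partitions) for term in terms}


def partitions_queried(term, num_partitions):
    """Partitions consulted by a single-term query."""
    return {partition_for(term.lower(), num_partitions)}


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog"
    print("index touches:", len(partitions_written(doc, 64)), "partition(s)")
    print("query touches:", len(partitions_queried("fox", 64)), "partition(s)")
```

In this model, indexing one document fans out to as many partitions as its distinct terms hash to, while a single-term lookup always lands on exactly one partition, which matches the slower writes and fast single-term reads discussed above.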