Riak Search

Ryan Zezeski rzezeski at basho.com
Sat Oct 13 23:03:04 EDT 2012


Pavel,

On Sat, Oct 13, 2012 at 12:59 AM, Pavel Kogan <pavel.kogan at cortica.com> wrote:


> Those limitations leave us with a single option, Riak Search, and I have a
> few questions about it.
>

I'm working on a new solution, named Yokozuna, that integrates Riak and
Solr.  It's not currently part of Riak but my goal is to make it so.

https://github.com/rzezeski/yokozuna


> 1) We saw that after enabling the search option and adding the search
> precommit hook, store speed (our tests were
>     done on a single test node) became 10x slower. Is this normal?
>

Degradation of throughput after enabling Riak Search is normal.  Riak
Search does a lot of work at index time.  It has to analyze the data,
and each indexed document has a good chance of causing writes to every node
in the cluster because of term-based partitioning.
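
To make the write fan-out concrete, here is a minimal sketch of
term-based partitioning.  It is not Riak Search's actual code: the ring
size, the whitespace tokenizer, and the hash-modulo placement are all
assumptions chosen for illustration.  The point is only that each unique
term is placed independently, so one document can touch many partitions.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical ring size, for illustration only


def partition_for_term(term):
    """Map a term to a partition (toy placement: SHA-1 modulo ring size)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def partitions_touched(text):
    """Return the set of partitions that indexing `text` would write to."""
    terms = set(text.lower().split())  # toy analyzer: lowercase + split
    return {partition_for_term(t) for t in terms}


doc = "the quick brown fox jumps over the lazy dog"
unique_terms = set(doc.lower().split())
touched = partitions_touched(doc)
print(len(unique_terms), "unique terms ->", len(touched), "partitions written")
```

With document-based partitioning the same document would be written to
one partition (plus replicas) regardless of how many terms it contains.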


> 2) If we have a dedicated node in the cluster for search (which would not be
> used for KV store/get operations) would
>     it have an impact on general cluster performance?
>

You cannot dedicate a node for Riak Search.  Every node is required to
participate in KV.  Every node must have Riak Search enabled for search to
work.


> 3) For 1M keys in the cluster, search runs very fast. How would it scale for
> 100M keys (or even many more)?
>     How would it scale with the number of nodes in the cluster?
>

This depends on the query.  A single-term query with a reasonable result
set typically has a latency of a disk seek plus a few milliseconds.  This
is the query-time benefit of term-based partitioning.  A boolean query
containing one large result set, even if the others are small, can bite you
because of sub-optimal logic in the intersection code.  A range query can
produce latency variance because it requires connecting to a covering set
of vnodes.
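
As a rough illustration of why one large result set hurts a boolean
query, consider a plain linear merge of two sorted posting lists.  This
is a generic sketch, not the intersection code Riak Search uses; the
function name and the sample lists are made up.  The work done tracks the
large list even though only a handful of results come back.

```python
def merge_intersect(a, b):
    """Intersect two sorted lists by linear merge.

    Returns (matches, steps), where steps counts loop iterations.
    """
    i = j = steps = 0
    matches = []
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches, steps


small = [10, 50_000, 99_999]    # results for a rare term
large = list(range(100_000))    # results for a very common term
matches, steps = merge_intersect(small, large)
print(len(matches), "matches after", steps, "merge steps")
```

Probing the large list for each entry of the small one (for example by
binary search) would need far fewer steps here, which is the kind of
optimization a naive intersection misses.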

Scaling out will not lower the absolute latency for single-term queries,
since such a query hits a single node, but it can reduce concurrent query
contention, thus reducing latency variance and improving throughput.

Scaling out may hurt range queries as it will require more nodes to
participate in coverage.  TCP incast could become an issue with enough load.
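
The coverage effect can be sketched with a toy model.  Everything here
is an assumption for illustration: a 64-partition ring, three replicas,
round-robin assignment of partitions to nodes, and a covering set built
by taking every third partition.  Real coverage planning is more
involved, but the trend is the same: the number of vnodes in the
covering set stays fixed while the number of machines behind them grows.

```python
RING_SIZE = 64  # hypothetical number of partitions
N_VAL = 3       # hypothetical replica count


def covering_nodes(num_nodes):
    """Return the physical nodes hit by a toy covering set.

    Partitions are assigned to nodes round-robin, and the covering set
    takes every N_VAL-th partition (a simplification).
    """
    covering_vnodes = range(0, RING_SIZE, N_VAL)
    return {p % num_nodes for p in covering_vnodes}


for n in (4, 8, 16):
    print(n, "nodes in cluster ->", len(covering_nodes(n)), "nodes contacted")
```

In this model every node in the cluster ends up contacted at each of the
sizes shown, so a range query's fan-in grows with the cluster.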

-Z