slow mapred_search key lookups for single terms
bryan at basho.com
Thu Apr 5 15:51:51 EDT 2012
On Wed, Apr 4, 2012 at 11:28 AM, Michael Radford <mrad at blorf.com> wrote:
> Aha, I just noticed that the native erlang client is still using
> luke_flow to implement its map-reduce, rather than riak_pipe. On some
> level, this must be the reason for the differing behavior...either a
> bug in riak_pipe, or a bug in the usage of riak_pipe somewhere in the
Maybe not "bug", but "naïveté", I think, is a pretty good bet.
This is a behavior that we changed for list_keys just before 1.0 and
for 2i in 1.1. The implementation is naïve: have a query send all
results to this process, which then enqueues them one at a time in the
Switching to a model where this queueing is done in parallel (as in
riak_kv_pipe_listkeys and …_index) reduces time dramatically, because
there's no need to hold up enqueuing an input on node X while an input
is being enqueued on node Y. The system is fluid enough that such
serialization often means that each stage of the pipeline is
processing ~1 input at a time, in aggregate, instead of ~($Partitions)
inputs at a time.
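
To put rough numbers on that, here is a toy cost model in Python. It is only a sketch, not Riak code: the function names, the fixed per-enqueue cost, and the 64-partition / 100-keys-per-partition figures are all assumptions made up for illustration. It compares one process enqueuing every input in turn against each partition enqueuing its own inputs independently.

```python
from collections import defaultdict


def serial_enqueue_time(inputs, enqueue_cost):
    """One coordinating process enqueues every input, one at a time.

    Total time is the sum of every enqueue, regardless of target partition.
    """
    return sum(enqueue_cost for _ in inputs)


def parallel_enqueue_time(inputs, enqueue_cost):
    """Each partition enqueues its own inputs, independently of the others.

    Total time is set by the busiest partition, not by the sum of all of them.
    """
    per_partition = defaultdict(int)
    for partition, _key in inputs:
        per_partition[partition] += enqueue_cost
    return max(per_partition.values(), default=0)


# Hypothetical workload: 64 partitions, 100 keys on each (illustrative only).
partitions = 64
keys_per_partition = 100
inputs = [
    (p, f"key{p}-{k}")
    for p in range(partitions)
    for k in range(keys_per_partition)
]

serial = serial_enqueue_time(inputs, enqueue_cost=1)
parallel = parallel_enqueue_time(inputs, enqueue_cost=1)
print(serial, parallel, serial / parallel)  # prints: 6400 100 64.0
```

Under these assumptions the gap is a factor of the partition count, which matches the "~1 vs. ~($Partitions)" estimate above. Real enqueue costs vary per node and per input, so treat the ratio as an upper bound on what parallel queueing could buy, not a prediction.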
This *might* be the wrong intuition for Search, since there is
funneling happening to process the query anyway, but it's likely a
good place to start.