slow mapred_search key lookups for single terms

Bryan Fink bryan at basho.com
Thu Apr 5 15:51:51 EDT 2012


On Wed, Apr 4, 2012 at 11:28 AM, Michael Radford <mrad at blorf.com> wrote:
> Aha, I just noticed that the native erlang client is still using
> luke_flow to implement its map-reduce, rather than riak_pipe.  On some
> level, this must be the reason for the differing behavior...either a
> bug in riak_pipe, or a bug in the usage of riak_pipe somewhere in the
> chain?

Maybe not a "bug," but "naïveté" is, I think, a pretty good bet.

https://github.com/basho/riak_kv/blob/master/src/riak_kv_mrc_pipe.erl#L551-L593

This is a behavior we already changed for list_keys just before 1.0,
and for 2i in 1.1. The implementation is naïve: the query sends all
of its results to this one process, which then enqueues them into the
pipe one at a time.
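
To make that concrete, here is a rough Python model of the naïve
approach. It is not the actual riak_kv code, which is Erlang. It
assumes each enqueue is a synchronous round trip to the vnode that owns
the input, simulated with asyncio.sleep. `enqueue`, `funnel`, and
`ENQUEUE_LATENCY` are hypothetical names, not riak_kv or riak_pipe APIs.

```python
import asyncio

ENQUEUE_LATENCY = 0.01  # hypothetical round-trip time for one enqueue, in seconds


async def enqueue(item, stats):
    """Simulate a synchronous enqueue into a pipe worker: the caller waits
    for the owning vnode to accept the input before it can continue."""
    stats["in_flight"] += 1
    stats["max_in_flight"] = max(stats["max_in_flight"], stats["in_flight"])
    await asyncio.sleep(ENQUEUE_LATENCY)
    stats["in_flight"] -= 1
    stats["enqueued"].append(item)


async def funnel(results):
    """Naive model: every query result goes to one process, which enqueues
    them one at a time, so at most one enqueue is ever in flight."""
    stats = {"in_flight": 0, "max_in_flight": 0, "enqueued": []}
    for item in results:
        await enqueue(item, stats)
    return stats


if __name__ == "__main__":
    results = [(p, k) for p in range(8) for k in range(5)]  # (partition, key) pairs
    stats = asyncio.run(funnel(results))
    print(len(stats["enqueued"]), stats["max_in_flight"])  # 40 1
```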

Switching to a model where this queueing is done in parallel (as in
riak_kv_pipe_listkeys and …_index) reduces query time dramatically,
because enqueuing an input on node X no longer has to wait while
another input is being enqueued on node Y. The system is fluid enough
that this serialization often leaves each stage of the pipeline
processing ~1 input at a time, in aggregate, instead of ~$Partitions
inputs at a time.
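
A sketch of that difference, under the same simplifying assumption
that each enqueue is a synchronous round trip simulated with
asyncio.sleep: `serial` funnels every result through one loop, while
`parallel` runs one hypothetical producer per partition, loosely in the
spirit of riak_kv_pipe_listkeys. All names and the latency value are
illustrative, not riak_pipe APIs.

```python
import asyncio
import time
from collections import defaultdict

ENQUEUE_LATENCY = 0.01  # hypothetical round-trip time for one enqueue, in seconds


async def enqueue(item, stats):
    # Simulated synchronous enqueue: wait for the owning vnode to accept it.
    stats["in_flight"] += 1
    stats["max_in_flight"] = max(stats["max_in_flight"], stats["in_flight"])
    await asyncio.sleep(ENQUEUE_LATENCY)
    stats["in_flight"] -= 1


def new_stats():
    return {"in_flight": 0, "max_in_flight": 0}


async def serial(results):
    # Single funnel process: one enqueue at a time, across all partitions.
    stats = new_stats()
    for item in results:
        await enqueue(item, stats)
    return stats


async def parallel(results):
    # One producer per partition: partitions enqueue independently, so an
    # enqueue on one node never waits behind an enqueue on another.
    by_partition = defaultdict(list)
    for partition, key in results:
        by_partition[partition].append((partition, key))
    stats = new_stats()

    async def producer(items):
        for item in items:
            await enqueue(item, stats)

    await asyncio.gather(*(producer(items) for items in by_partition.values()))
    return stats


def timed(coro_fn, results):
    # Run one strategy and return (elapsed seconds, max concurrent enqueues).
    start = time.perf_counter()
    stats = asyncio.run(coro_fn(results))
    return time.perf_counter() - start, stats["max_in_flight"]


if __name__ == "__main__":
    results = [(p, k) for p in range(8) for k in range(5)]  # 8 partitions
    for name, fn in (("serial", serial), ("parallel", parallel)):
        elapsed, width = timed(fn, results)
        print(f"{name}: {elapsed:.2f}s, max in-flight enqueues = {width}")
```

With 8 partitions, the serial version keeps at most one enqueue in
flight and takes roughly 40 round trips, while the parallel one keeps 8
in flight and takes roughly 5. The model ignores worker processing time
and backpressure, so it only illustrates the ~1 vs ~$Partitions point,
not real Riak timings.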

This *might* be the wrong intuition for Search, since some funneling
already happens while the query is processed, but it's likely a good
place to start.

-Bryan

More information about the riak-users mailing list