slow mapred_search key lookups for single terms

Michael Radford mrad at blorf.com
Mon Apr 2 14:33:55 EDT 2012


I modified the pb client code to print the elapsed time when it
receives the first map-reduce result (in the function
wait_for_mapred), and tried to run a more apples-to-apples comparison
of the same map-reduce job via the local client and the pb client
connecting to 127.0.0.1.

In a nutshell, the majority of the extra delay from the pb client
occurs before it receives the first result (e.g., 6 out of 7 seconds).
(It's still troubling that it takes another second to receive the rest
of the results when reading from localhost, but maybe I can avoid that
by calling a different reduce phase that somehow packages my results
into a single message?)

Next, it seems like I should try instrumenting the server side of the
protobufs interface with timing...any pointers on the best place(s) to
do that would be appreciated.

Here is the function I used with the pb client to localhost, and the results:
TP =
  fun (Bucket, Query) ->
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    Inputs = {modfun, riak_search, mapred_search, [Bucket, Query]},
    Phases =
      [ {reduce,
        {modfun, riak_kv_mapreduce, reduce_identity},
        [{reduce_phase_only_1, true}],
        true} ],
    {MuSec, {ok, [{0, Results}]}} =
      timer:tc (fun () -> riakc_pb_socket:mapred(Pid, Inputs, Phases) end),
    riakc_pb_socket:stop (Pid),
    io:format(user, "~b results, ~f sec~n", [length(Results), MuSec/1000000])
  end.

3> TP(Bucket,<<"full_text:flower">>).
first M-R result in 5848490 musec
14378 results, 7.235226 sec
ok
4> TP(Bucket,<<"full_text:flower">>).
first M-R result in 5978044 musec
14378 results, 7.423933 sec
ok
5> TP(Bucket,<<"full_text:flower">>).
first M-R result in 1219112 musec
14378 results, 2.563079 sec
ok
6> TP(Bucket,<<"full_text:flower">>).
first M-R result in 6017087 musec
14378 results, 7.326352 sec
ok
7> TP(Bucket,<<"full_text:green">>).
first M-R result in 9546654 musec
24046 results, 13.561474 sec
ok
8> TP(Bucket,<<"full_text:green">>).
first M-R result in 1878109 musec
24047 results, 5.188646 sec
ok
9>
9> TP(Bucket,<<"full_text:green">>).
first M-R result in 9724651 musec
24047 results, 12.692941 sec
ok
10> TP(Bucket,<<"full_text:green">>).
first M-R result in 9627592 musec
24047 results, 12.769256 sec
ok

And here is the function I used for the local client, and the results:
TL =
  fun (Bucket, Query) ->
    {ok, Client} = riak:local_client(),
    Inputs = {modfun, riak_search, mapred_search, [Bucket, Query]},
    Phases =
      [ {reduce,
        {modfun, riak_kv_mapreduce, reduce_identity},
        [{reduce_phase_only_1, true}],
        true} ],
    {MuSec, {ok, Results}} =
      timer:tc(fun () -> Client:mapred(Inputs, Phases) end),
    io:format(user, "~b results, ~f sec~n", [length(Results), MuSec/1000000])
  end.

(riak at 10.174.223.37)27> TL(Bucket,<<"full_text:flower">>).
14340 results, 0.106422 sec
ok
(riak at 10.174.223.37)28> TL(Bucket,<<"full_text:flower">>).
14340 results, 0.102255 sec
ok
(riak at 10.174.223.37)29> TL(Bucket,<<"full_text:flower">>).
14340 results, 0.099877 sec
ok
(riak at 10.174.223.37)30> TL(Bucket,<<"full_text:blue">>).
25017 results, 0.210381 sec
ok
(riak at 10.174.223.37)31> TL(Bucket,<<"full_text:blue">>).
25017 results, 0.200112 sec
ok
(riak at 10.174.223.37)32> TL(Bucket,<<"full_text:blue">>).
25018 results, 0.207553 sec
ok
(riak at 10.174.223.37)33> TL(Bucket,<<"full_text:green">>).
23945 results, 0.173061 sec
ok
(riak at 10.174.223.37)34> TL(Bucket,<<"full_text:green">>).
23945 results, 0.191595 sec
ok
(riak at 10.174.223.37)35> TL(Bucket,<<"full_text:green">>).
23945 results, 0.177821 sec
ok

Mike

On Mon, Apr 2, 2012 at 8:46 AM, Michael Radford <mrad at blorf.com> wrote:
> Hi Ryan,
>
> This is getting interesting: the same queries when executed using
> local clients from 'riak attach' are taking only 100-250 ms.
>
> However, I just tried the same test I was running remotely on Saturday
> on one of the machines in the Riak cluster, using the protobufs client
> to connect to 127.0.0.1, and it's still taking 6-7 seconds per query.
> (Still with an occasional dip down to 2-3 seconds).
>
> These machines are on Amazon EC2, so I have little control over the
> network layout. But that includes the 4 Riak boxes, so if their
> inter-node communication was suffering from similar issues I would
> have expected to see it affecting other map-reduce queries. (We're
> running lots of multi-key lookups via map-reduce that return thousands
> of objects in a few ms, much larger than the keys returned from these
> searches.)
>
> And again, there is a huge difference between the same query using the
> local client (200 ms), and using the protobufs client from the exact
> same Riak machine connecting to localhost (6-7 sec but occasionally
> 2-3 sec).
>
> Mike
>
> On Mon, Apr 2, 2012 at 7:44 AM, Ryan Zezeski <rzezeski at basho.com> wrote:
>> Hi Michael, you'll find my responses inline...
>>
>> On Sat, Mar 31, 2012 at 5:04 PM, Michael Radford <mrad at blorf.com> wrote:
>>>
>>> I'm seeing very slow performance from Riak search even when querying
>>> single terms, and I'd appreciate any advice on how to get insight into
>>> where the time is going.
>>>
>>> Right now, I'm using this function to time queries with the Erlang pb
>>> client:
>>>
>>> TS =
>>>  fun (Pid, Bucket, Query) ->
>>>    T0 = now(),
>>>    {ok, Results} = riakc_pb_socket:search(Pid, Bucket, Query),
>>>    MuSec = timer:now_diff(now(), T0),
>>>    io:format(user, "~b results, ~f sec~n", [length(Results),
>>> MuSec/1000000])
>>>  end.
>>
>>
>> Just an FYI, you should checkout `timer:tc`.
>>>
>>>
>>> The bucket I'm querying currently has ~300k keys total (each 16
>>> bytes). (The whole cluster has maybe 1.5M entries in about a dozen
>>> buckets. It's running 1.0.2, 4 nodes on 4 8-core c1.xlarge EC2
>>> instances.)
>>>
>>> For single-term queries that return 10k+ keys, I'm routinely seeing
>>> times above 6 seconds to run the above function. Intermittently,
>>> however, I'll see the same queries take only 2 seconds:
>>>
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 6.094149 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 1.938894 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 1.981492 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:flower">>).
>>> 12574 results, 6.120589 sec
>>> ok
>>>
>>> > TS(Pid,Bucket,<<"full_text:red">>).
>>> 13461 results, 6.572473 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:red">>).
>>> 13461 results, 6.626049 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:red">>).
>>> 13461 results, 2.155847 sec
>>> ok
>>>
>>> Queries with fewer results are still slow, but not quite as slow as 6
>>> seconds:
>>>
>>> > TS(Pid,Bucket,<<"full_text:ring">>).
>>> 6446 results, 2.831806 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:ring">>).
>>> 6446 results, 3.037162 sec
>>> ok
>>> > TS(Pid,Bucket,<<"full_text:ring">>).
>>> 6447 results, 0.780944 sec
>>> ok
>>>
>>> And queries with no matches only take a few milliseconds:
>>>
>>> > TS(Pid,Bucket,<<"full_text:blorf">>).
>>> 0 results, 0.003269 sec
>>> ok
>>>
>>> During the slow queries, none of the 4 machines seems to be fully
>>> taxing even one cpu, or doing almost any disk i/o.
>>
>>
>> What does intra/inter network look like?
>>
>>>
>>>
>>> As far as I can tell from looking at the riak_kv/riak_search source,
>>> my query should only be hitting the index and streaming back the keys,
>>> not trying to read every document from disk or sort by score. Is that
>>> correct?
>>
>>
>> It will not read the documents at all but it will sort on score.  Currently
>> there is no way to disable sorting.
>>
>>>
>>>
>>> Assuming that's the case, I can't imagine why it takes so long to look
>>> up 10k keys from the index for a single term, or why the times seem to
>>> be bimodal (which seems like a big clue). Any pointers welcome!
>>
>>
>> Where is your client sitting in regards to your cluster?  Is it in the local
>> network?  Could you try attaching to one of your riak nodes, running the
>> query there and compare results?
>>
>> e.g.
>>
>> riak attach
>>
>>> search:search(Bucket, Query).
>>
>> -Ryan
>>




More information about the riak-users mailing list