slow mapred_search key lookups for single terms

Michael Radford mrad at blorf.com
Mon Apr 2 11:46:10 EDT 2012


Hi Ryan,

This is getting interesting: the same queries when executed using
local clients from 'riak attach' are taking only 100-250 ms.

However, I just tried the same test I was running remotely on Saturday
on one of the machines in the Riak cluster, using the protobufs client
to connect to 127.0.0.1, and it's still taking 6-7 seconds per query.
(Still with an occasional dip down to 2-3 seconds).
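
To see how often the dip actually happens, one option is to repeat the
call and look at the sorted timings. This is only a rough sketch for the
Erlang shell: Time and Repeat are names made up for illustration, and the
commented example assumes Pid and Bucket are already bound as in the
session quoted below.

```erlang
%% Time one call of a zero-arity fun; returns elapsed wall-clock seconds.
Time = fun(F) ->
           {MuSec, _Result} = timer:tc(F),
           MuSec / 1000000
       end.

%% Run F N times and return the elapsed times, sorted ascending.
%% A bimodal pattern shows up as two clusters of values in the list.
Repeat = fun(N, F) ->
             lists:sort([Time(F) || _ <- lists:seq(1, N)])
         end.

%% Example (assumes Pid and Bucket are bound):
%% Repeat(20, fun() ->
%%                riakc_pb_socket:search(Pid, Bucket, <<"full_text:flower">>)
%%            end).
```

Returning the whole sorted list rather than an average keeps the two
clusters visible; a mean would blur them together.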

These machines are on Amazon EC2, so I have little control over the
network layout. But that includes the 4 Riak boxes, so if their
inter-node communication was suffering from similar issues I would
have expected to see it affecting other map-reduce queries. (We're
running lots of multi-key lookups via map-reduce that return thousands
of objects in a few ms, much larger than the keys returned from these
searches.)

And again, there is a huge difference between the same query using the
local client (200 ms), and using the protobufs client from the exact
same Riak machine connecting to localhost (6-7 sec but occasionally
2-3 sec).
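
For anyone wanting to reproduce the comparison, something like the
following could be used to time both paths the same way. It is a sketch,
not exactly what I ran: Timed is a made-up helper, port 8087 is assumed
to be the protobufs listener, and Bucket and Query are assumed to be
bound already. The two commented calls may need to be run in different
shells (the second only works inside 'riak attach').

```erlang
%% Hypothetical helper: run a zero-arity fun once and return
%% {ElapsedSeconds, Result}.
Timed = fun(F) ->
            {MuSec, Result} = timer:tc(F),
            {MuSec / 1000000, Result}
        end.

%% Path 1: protobufs client over localhost (port 8087 is an assumption).
%% {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
%% {PbSec, _} = Timed(fun() -> riakc_pb_socket:search(Pid, Bucket, Query) end).

%% Path 2: local client, from inside 'riak attach'.
%% {LocalSec, _} = Timed(fun() -> search:search(Bucket, Query) end).
```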

Mike

On Mon, Apr 2, 2012 at 7:44 AM, Ryan Zezeski <rzezeski at basho.com> wrote:
> Hi Michael, you'll find my responses inline...
>
> On Sat, Mar 31, 2012 at 5:04 PM, Michael Radford <mrad at blorf.com> wrote:
>>
>> I'm seeing very slow performance from Riak search even when querying
>> single terms, and I'd appreciate any advice on how to get insight into
>> where the time is going.
>>
>> Right now, I'm using this function to time queries with the Erlang pb
>> client:
>>
>> TS =
>>  fun (Pid, Bucket, Query) ->
>>    T0 = now(),
>>    {ok, Results} = riakc_pb_socket:search(Pid, Bucket, Query),
>>    MuSec = timer:now_diff(now(), T0),
>>    io:format(user, "~b results, ~f sec~n",
>>              [length(Results), MuSec/1000000])
>>  end.
>
>
> Just an FYI, you should check out `timer:tc`.
>>
>>
>> The bucket I'm querying currently has ~300k keys total (each 16
>> bytes). (The whole cluster has maybe 1.5M entries in about a dozen
>> buckets. It's running 1.0.2, 4 nodes on 4 8-core c1.xlarge EC2
>> instances.)
>>
>> For single-term queries that return 10k+ keys, I'm routinely seeing
>> times above 6 seconds to run the above function. Intermittently,
>> however, I'll see the same queries take only 2 seconds:
>>
>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> 12574 results, 6.094149 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> 12574 results, 1.938894 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> 12574 results, 1.981492 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:flower">>).
>> 12574 results, 6.120589 sec
>> ok
>>
>> > TS(Pid,Bucket,<<"full_text:red">>).
>> 13461 results, 6.572473 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:red">>).
>> 13461 results, 6.626049 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:red">>).
>> 13461 results, 2.155847 sec
>> ok
>>
>> Queries with fewer results are still slow, but not quite as slow as 6
>> seconds:
>>
>> > TS(Pid,Bucket,<<"full_text:ring">>).
>> 6446 results, 2.831806 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:ring">>).
>> 6446 results, 3.037162 sec
>> ok
>> > TS(Pid,Bucket,<<"full_text:ring">>).
>> 6447 results, 0.780944 sec
>> ok
>>
>> And queries with no matches only take a few milliseconds:
>>
>> > TS(Pid,Bucket,<<"full_text:blorf">>).
>> 0 results, 0.003269 sec
>> ok
>>
>> During the slow queries, none of the 4 machines seems to be fully
>> taxing even one CPU or doing much disk I/O.
>
>
> What does intra/inter network look like?
>
>>
>>
>> As far as I can tell from looking at the riak_kv/riak_search source,
>> my query should only be hitting the index and streaming back the keys,
>> not trying to read every document from disk or sort by score. Is that
>> correct?
>
>
> It will not read the documents at all but it will sort on score.  Currently
> there is no way to disable sorting.
>
>>
>>
>> Assuming that's the case, I can't imagine why it takes so long to look
>> up 10k keys from the index for a single term, or why the times seem to
>> be bimodal (which seems like a big clue). Any pointers welcome!
>
>
> Where is your client sitting in relation to your cluster?  Is it in the local
> network?  Could you try attaching to one of your riak nodes, running the
> query there and compare results?
>
> e.g.
>
> riak attach
>
> > search:search(Bucket, Query).
>
> -Ryan
>



