Issues with high node load and very slow response

Jordan West jwest at basho.com
Mon Jul 28 12:46:11 EDT 2014


Chaim,

From the output you provided, I think the handoff issue is a symptom of the
CPU usage (the check that is performed is failing to complete before timing
out). Could you run `riak-debug` on the nodes (or at least the ones with
high CPU usage) and make the archives available to me for download? If
you'd prefer to keep that off-list, please email me directly.
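
As an aid to reading the `rpc:multicall` output quoted below, here is a
minimal sketch, assuming it is pasted into a `riak attach` shell on any
node. The variable names are illustrative only; the call is the same one
used earlier in the thread.

```erlang
%% rpc:multicall/5 returns {Results, BadNodes}.
%% Results  : one return value per node that answered (not tagged by node).
%% BadNodes : nodes that did not return a result within the timeout,
%%            or that could not be reached at all.
Nodes = [node() | nodes()],
{Results, BadNodes} = rpc:multicall(Nodes, yz_solr, cores, [], 5000),
io:format("answered: ~p of ~p~nno answer within 5000 ms: ~p~n",
          [length(Results), length(Nodes), BadNodes]).
```

A non-empty second list only says that those nodes did not answer in time;
it does not by itself say why.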

Jordan


On Mon, Jul 28, 2014 at 8:59 AM, Chaim Solomon <chaim at itcentralstation.com>
wrote:

>
> On Mon, Jul 28, 2014 at 6:50 PM, Jordan West <jwest at basho.com> wrote:
>
>> Unfortunately, there is more than one case that can cause this message to
>> occur. Can you try one more command (from any node in the cluster):
>>
>> ```
>> rpc:multicall([node() | nodes()], yz_solr, cores, [], 5000).
>> ```
>>
>>
> (riak@10.128.138.25)1> rpc:multicall([node() | nodes()], yz_solr, cores,
> [], 5000).
> {[{ok,[<<"content-items">>,<<"linkmeta">>,
>        <<"products_search_results">>,
>        <<"single_product_search_results">>,<<"sites">>]},
>   {ok,[<<"content-items">>,<<"linkmeta">>,
>        <<"products_search_results">>,
>        <<"single_product_search_results">>,<<"sites">>]},
>   {ok,[<<"content-items">>,<<"linkmeta">>,
>        <<"products_search_results">>,
>        <<"single_product_search_results">>,<<"sites">>]},
>   {ok,[<<"content-items">>,<<"linkmeta">>,
>        <<"products_search_results">>,
>        <<"single_product_search_results">>,<<"sites">>]}],
>  ['riak@10.128.179.86','riak@10.128.220.135',
>   'riak@10.128.137.185']}
>
>
>
>> I'm working on a patch to improve the logging here (so running these
>> commands won't be necessary) but that may take a bit to get reviewed.
>>
>>
>>> Only one node (10.128.220.135) is giving me this:
>>> 2014-07-28 10:31:31.618 [error] <0.847.706>@yz_kv:index:206 failed to
>>> index object {<<"linkmeta">>,<<"http://scn.sap.com/thread/3262664">>}
>>> with error {"Failed to index docs",{error,req_timedout}} because
>>> [{yz_solr,index,3,[{file,"src/yz_solr.erl"},{line,192}]},{yz_kv,index,7,[{file,"src/yz_kv.erl"},{line,258}]},{yz_kv,index,3,[{file,"src/yz_kv.erl"},{line,193}]},{riak_kv_vnode,actual_put,6,[{file,"src/riak_kv_vnode.erl"},{line,1416}]},{riak_kv_vnode,perform_put,3,[{file,"src/riak_kv_vnode.erl"},{line,1404}]},{riak_kv_vnode,do_put,7,[{file,"src/riak_kv_vnode.erl"},{line,1199}]},{riak_kv_vnode,handle_command,3,[{file,"src/riak_kv_vnode.erl"},{line,485}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,345}]}]
>>>
>>>
>> My suspicion is the command above will shed some light on the error here.
>>
>> Also, so we don't leave behind the main issue of CPU usage, is this node
>> one of the two having problems? Which is the other?
>>
>
> Here is the top of the `top` output for the nodes:
> 10.128.137.185
> 20170 riak      20   0 16.543g 1.113g   3780 S 193.2 56.9  11991:45
> 19745 riak      20   0 2390844 241996   3352 S   6.7 11.8 503:56.44
>
> 10.128.138.25
>   628 riak      20   0 2841444 499964   2000 S 123.7 24.4   5779:46
>  1074 riak      20   0 14.265g 1.014g    928 S   1.0 51.8 254:03.19
>
> 10.128.220.135
>  3153 riak      20   0 16.479g 978376      0 S 100.2 47.7   6357:07
>  2729 riak      20   0 2650512 481620   2828 S  11.0 23.5   1398:58
>
> Chaim Solomon
>
>