Issues with high node load and very slow response

Chaim Solomon chaim at itcentralstation.com
Mon Jul 28 01:54:16 EDT 2014


Hi,


Responses inline

On Mon, Jul 28, 2014 at 5:30 AM, Jordan West <jwest at basho.com> wrote:

> Unfortunately, there are several conditions that could cause this. To rule
> out the obvious ones, can you verify that `riak@10.128.137.185` has search
> enabled? If search is enabled, we can try a few different things to debug
> further.
>
>
All nodes have search=on in the riak.conf and I see the Java process.

> Attaching to 'riak@10.128.138.25' (via "riak attach") and using the shell,
> if you could provide the output of:
>
> ```
> rpc:multicall([node() | nodes()], app_helper, get_env, [yokozuna, enabled,
> false], 5000).
> ```
>

Eshell V5.10.3  (abort with ^G)
(riak@10.128.138.25)1> rpc:multicall([node() | nodes()], app_helper, get_env, [yokozuna, enabled, false], 5000).
{[true,true,true,true,true,true,true],[]}

> As well as:
>
> ```
> [riak_core_node_watcher:services(N) || N <- [node() | nodes()]].
> ```
>

(riak@10.128.138.25)2> [riak_core_node_watcher:services(N) || N <- [node() | nodes()]].
[[yokozuna,riak_pipe,riak_kv],
 [yokozuna,riak_pipe,riak_kv],
 [yokozuna,riak_pipe,riak_kv],
 [yokozuna,riak_pipe,riak_kv],
 [yokozuna,riak_pipe,riak_kv],
 [yokozuna,riak_pipe,riak_kv],
 [yokozuna,riak_pipe,riak_kv]]

> And finally:
>
> ```
> yz_kv:is_metadata_consistent('riak@10.127.137.185').
> ```
>

(riak@10.128.138.25)3> yz_kv:is_metadata_consistent('riak@10.127.137.185').
** exception exit: {{nodedown,'riak@10.127.137.185'},
                    {gen_server,call,
                        [{riak_core_metadata_hashtree,
                             'riak@10.127.137.185'},
                         {prefix_hash,{core,bucket_types}},
                         1000]}}
     in function  gen_server:call/3 (gen_server.erl, line 188)
     in call from yz_kv:is_metadata_consistent/1 (src/yz_kv.erl, line 433)

> Based on the output of those we can debug further.
>
> Additionally, it's entirely possible that when we get rid of that error it
> will not resolve the primary issue (CPU usage/request slowness). However,
> the check that is the source of that log message can block in the request
> path, so something funky could certainly be going on.
>


Chaim Solomon