Random timeouts on Riak

Russell Brown russell.brown at me.com
Mon Dec 29 10:04:06 EST 2014


Alrighty.

On 29 Dec 2014, at 12:29, Russell Brown <russell.brown at me.com> wrote:

> Hi Jason,
> 
> I opened https://github.com/basho/riak_kv/issues/1069. Feel free to add any information to it that you think is pertinent.
> 
> On 29 Dec 2014, at 12:26, Jason Ryan <jason.ryan at trustev.com> wrote:
> 
>> No - those settings were set during the setup of the cluster.
>> 
>> >> Any way you can get each of the N values from their backends separately so I can see their content lists?
>> 
>> The N value is 3 - or is it a particular command you want us to run on each node?
> 
> I will try to get this for you. It will probably involve attaching to the nodes in question and running some Erlang in the console.

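If you can attach to the Erlang console of one of the nodes and run

%% Replace with the bucket and key of one of the affected objects.
BKey = {<<"your_bucket">>, <<"your_key">>}.
%% Build the preference list (N = 3) of {Index, Node} pairs for a bucket/key.
Preflist = fun(BK) ->
  {ok, Ring} = riak_core_ring_manager:get_my_ring(),
  DocIdx = riak_core_util:chash_key(BK),
  UpNodes = riak_core_node_watcher:nodes(riak_kv),
  AnnPreflist = riak_core_apl:get_apl_ann(DocIdx, 3, Ring, UpNodes),
  [IndexNode || {IndexNode, _Type} <- AnnPreflist]
end.
PL = Preflist(BKey).
ReqId = <<"my_req_id">>.
%% Clear any old messages from the shell's mailbox before the request.
flush().
riak_kv_vnode:get(PL, BKey, ReqId).
%% Print the replies sent back by the vnodes.
flush().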

Be aware this will print the Erlang terms for the 3 Riak objects stored on disk to the console. Please be careful to remove any sensitive information before replying with the output. Feel free to reply directly to my email if you prefer.
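
If the flush() output is awkward to copy or to redact, one option is to collect the replies into a variable instead of printing them. This is only a minimal sketch in plain Erlang: Collect is a hypothetical helper and not part of Riak, and the one second timeout is an arbitrary assumption. It relies only on the fact that the replies arrive as messages to the shell process, which is why flush() shows them.

```erlang
%% Hypothetical helper, not part of Riak: drains the shell's mailbox into a list.
%% It calls itself through its first argument so that it also works on Erlang
%% releases without named funs.
Collect = fun(Self, Acc) ->
  receive
    Msg -> Self(Self, [Msg | Acc])
  after 1000 ->
    %% No message for one second: assume all replies have arrived.
    lists:reverse(Acc)
  end
end.
%% Run this in place of the final flush().
Replies = Collect(Collect, []).
length(Replies).
```

Replies then holds whatever messages were waiting, in arrival order, so they can be inspected or trimmed before being pasted into a reply. Any unrelated message that reaches the shell in the meantime would be collected as well.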

It would be a huge help to me in recreating the bug, since it would give me exactly the objects that lead to this error.

Cheers

Russell



> 
> Cheers
> 
> Russell
> 
>> 
>> Thanks,
>> Jason
>> 
>> 
>> On 29 December 2014 at 12:19, Russell Brown <russell.brown at me.com> wrote:
>> 
>> On 29 Dec 2014, at 12:09, Jason Ryan <jason.ryan at trustev.com> wrote:
>> 
>>> All types/buckets we use are set to allow_mult: false, last_write_wins: true
>> 
>> Did you change to this setting after these keys were written?
>> 
>> Looks like a bug in Riak, so I’m going to open a ticket: hd([]) should never be called in reconcile. But any further help we can get from you would be appreciated. Any way you can get each of the N values from their backends separately so I can see their content lists?
>> 
>>> 
>>> 
>>> On 29 December 2014 at 12:08, Sargun Dhillon <sargun at sargun.me> wrote:
>>> The bucket (type) that you're working with -- what are your
>>> allow_mult, and last_write_wins settings?
>>> 
>>> On Mon, Dec 29, 2014 at 4:05 AM, Jason Ryan <jason.ryan at trustev.com> wrote:
>>> > It seems to move between 4 keys in particular; these keys are actually empty
>>> > at the moment (i.e. an empty JSON document).
>>> >
>>> > CPU utilization is close to zero.
>>> >
>>> > Can't see anything in particular, bar the error message I just posted
>>> > before.
>>> >
>>> > Jason
>>> >
>>> >
>>> > On 29 December 2014 at 11:58, Ciprian Manea <ciprian at basho.com> wrote:
>>> >>
>>> >> Hi Jason,
>>> >>
>>> >> Are these random timeouts happening for only one key, or is this common
>>> >> for more?
>>> >>
>>> >> What is the CPU utilisation in the cluster when you're experiencing these
>>> >> timeouts?
>>> >>
>>> >> Can you spot anything peculiar in your server's $ dmesg outputs? Any I/O
>>> >> errors there?
>>> >>
>>> >>
>>> >> Regards,
>>> >> Ciprian
>>> >>
>>> >> On Mon, Dec 29, 2014 at 1:55 PM, Sargun Dhillon <sargun at sargun.me> wrote:
>>> >>>
>>> >>> Several things:
>>> >>> 1) I recommend you have a 5-node cluster:
>>> >>> http://basho.com/why-your-riak-cluster-should-have-at-least-five-nodes/
>>> >>> 2) What version of Riak are you using?
>>> >>> 3) What backend(s) are you using?
>>> >>> 4) What's the size of your keyspace?
>>> >>> 5) Are you actively rewriting keys, or writing keys to the cluster?
>>> >>> 6) Do you know how much I/O the cluster is currently doing?
>>> >>>
>>> >>> On Mon, Dec 29, 2014 at 2:51 AM, Jason Ryan <jason.ryan at trustev.com>
>>> >>> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > We are getting random timeouts from our application (>60 seconds) when we
>>> >>> > try to retrieve a key from our Riak cluster (4 nodes with a load balancer
>>> >>> > in front of them). Our application just uses the standard REST API to
>>> >>> > query Riak.
>>> >>> >
>>> >>> > We are pretty new to Riak, so we would like to understand how best to
>>> >>> > debug this issue. Are there any good pointers on what to start with? This
>>> >>> > is our production cluster.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Jason
>>> >>> >
>>> >>> >
>>> >>> > This message is for the named person's use only. If you received this
>>> >>> > message in error, please immediately delete it and all copies and
>>> >>> > notify the
>>> >>> > sender. You must not, directly or indirectly, use, disclose,
>>> >>> > distribute,
>>> >>> > print, or copy any part of this message if you are not the intended
>>> >>> > recipient. Any views expressed in this message are those of the
>>> >>> > individual
>>> >>> > sender and not Trustev Ltd. Trustev is registered in Ireland No. 516425
>>> >>> > and
>>> >>> > trades from 2100 Cork Airport Business Park, Cork, Ireland.
>>> >>> >
>>> >>> >
>>> >>> > _______________________________________________
>>> >>> > riak-users mailing list
>>> >>> > riak-users at lists.basho.com
>>> >>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
