Sudden frequent node crashes

Allen Landsidel landsidel.allen at gmail.com
Thu Oct 8 16:05:02 EDT 2015


Understood, I'll give it a shot with Erlang R16.

With the FreeBSD ports system there are often pre-build patches to make 
things more BSD friendly.  About a month ago I was doing some 
maintenance and decided to update riak from 1.4.12 to 2.1.1.  There was 
not a port of 2.1.1 available at the time, so I made a duplicate of the 
existing port and then fixed up the existing patches.

The patches don't modify any of the erl files so the line numbers 
shouldn't have changed as a result of that.

Riak 2.1.1 was added as a port about two weeks ago, so I'll try building
from that first.  It looks like it's set up to use a specific Erlang port
called 'erlang-riak', which is also new and is the custom Erlang from Basho.

I'll give the new port a try and see if that takes care of things.  I should
have looked to see if there was an 'official' port before posting. :)

We don't store any mission critical data in it (yet) so just clearing it 
out and starting fresh isn't too big of a deal.
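
Before clearing the data out, it may be worth checking whether the file
the vnode status manager chokes on really contains bytes that are not
valid UTF-8.  The sketch below is only an illustration, not anything from
Riak itself: it assumes the 4 in {4,file_io_server,invalid_unicode} is a
line number (the {Line, Module, Term} shape that Erlang's file:consult/1
documents for read errors), and the function names are made up for this
example.  The path of the file to inspect is left to the caller.

```python
def first_invalid_utf8_line(data: bytes):
    """Return the 1-based line number holding the first byte sequence that
    is not valid UTF-8, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first undecodable byte;
        # counting the newlines before it gives the line number.
        return data.count(b"\n", 0, err.start) + 1


def check_file(path: str) -> str:
    """Hypothetical helper: summarize the UTF-8 validity of one file."""
    with open(path, "rb") as fh:
        line = first_invalid_utf8_line(fh.read())
    if line is None:
        return "valid UTF-8"
    return f"invalid UTF-8 on line {line}"
```

If check_file() reports line 4 for one of the files under the vnode's
data directory, that would at least be consistent with the crash report;
it would not by itself prove which Erlang call failed.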

On 10/8/2015 15:40, Russell Brown wrote:
> Hi Allen,
> Riak only supports erlang r16 at this time. Probably best to use the erlang that riak ships with, or build our basho OTP fork, instructions here: http://docs.basho.com/riak/latest/ops/building/installing/erlang/#Installing-on-FreeBSD-Solaris
>
> We’re working on erlang 17 and beyond support.
>
> Also, can you confirm what install you did of riak (source? checkout of a tag on GitHub?) Those line numbers in your crash don’t match up for me.
>
> Cheers
>
> Russell
>
>> On 8 Oct 2015, at 20:35, Allen Landsidel <landsidel.allen at gmail.com> wrote:
>>
>> Oops sorry, forgot about that.
>>
>> I'm running Erlang 17 from ports; erlang-runtime17-17.5.6.3
>>
>>
>>
>> On 10/8/2015 15:28, Russell Brown wrote:
>>> Hi Allen,
>>> What version of erlang are you running, please?
>>>
>>> Cheers
>>>
>>> Russell
>>>
>>>> On 8 Oct 2015, at 19:58, Allen Landsidel <landsidel.allen at gmail.com> wrote:
>>>>
>>>> Background:
>>>> Riak 2.1.1
>>>> FreeBSD 9.1.
>>>> Servers are all virtualized on VMWare ESX 5.5.  Each node is given ~300G of storage, 4G of RAM, and 2 core SMP.
>>>> Storage is via FC SAN.
>>>> Access to the cluster from clients is strictly over the HTTP interface and is funneled through haproxy.
>>>> The cluster was five nodes using leveldb with a ring size of 16.
>>>>
>>>> The cluster is set up this way so that we could start small and add nodes as needed, with a small memory footprint, rather than preallocating a ton of memory and disk space that we may never use.  The cluster has been running fine for the past month.
>>>>
>>>> Today the cluster has begun experiencing strange repeated failures.  No changes in any of the clients have been made, so the problem seems as though it must be in the data coming in.
>>>>
>>>> I started a fresh two-node cluster, and shortly after data began flowing into it, nodes began crashing.
>>>>
>>>> The first report in the crash log is below.  I'm far from an expert at understanding these errors, but "invalid_unicode" seems telling.
>>>>
>>>> If more from the crash log is needed, I can provide it.  Once this first error comes in, the node goes offline and errors just continue to roll in to the log file, in what looks like an attempt to automatically restart the node.
>>>>
>>>> --------------------------------------------
>>>> ** Generic server <0.2136.0> terminating
>>>> ** Last message in was {vnodeid,10000}
>>>> ** When Server state == {state,"/var/db/riak/kv_vnode/0",0,<0.2135.0>,2}
>>>> ** Reason for termination ==
>>>> ** {{badmatch,{error,{4,file_io_server,invalid_unicode}}},[{riak_kv_vnode_status_mgr,handle_call,3,[{file,"src/riak_kv_vnode_status_mgr.erl"},{line,178}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
>>>> 2015-10-08 18:36:18 =CRASH REPORT====
>>>>   crasher:
>>>>     initial call: riak_kv_vnode_status_mgr:init/1
>>>>     pid: <0.2136.0>
>>>>     registered_name: []
>>>>     exception exit: {{{badmatch,{error,{4,file_io_server,invalid_unicode}}},[{riak_kv_vnode_status_mgr,handle_call,3,[{file,"src/riak_kv_vnode_status_mgr.erl"},{line,178}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},[{gen_server,terminate,7,[{file,"gen_server.erl"},{line,804}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
>>>>     ancestors: [<0.2135.0>,riak_core_vnode_sup,riak_core_sup,<0.169.0>]
>>>>     messages: []
>>>>     links: [<0.2135.0>]
>>>>     dictionary: []
>>>>     trap_exit: false
>>>>     status: running
>>>>     heap_size: 610
>>>>     stack_size: 27
>>>>     reductions: 569
>>>>   neighbours:
>>>> --------------------------------------------



