Riak crashed and crashed again when recovering

Sean Cribbs sean at basho.com
Wed May 5 08:05:31 EDT 2010


Germain,

It looks like you're filling up the dets tables. Dets has a 2GB limit per file. Riak uses multiple files, one per vnode, but any single vnode's file can still hit that limit, which is what the no_more_space_on_file errors in your logs point to. Have you tried the innostore backend? If you continue to use dets, try increasing the number of partitions, which will produce more, smaller files.
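
As a rough illustration of the partition arithmetic, here is a minimal Python sketch. It assumes data spreads evenly across vnodes and ignores dets' own file overhead. The average document size (100 KB) and the replication factor (n_val = 3) are hypothetical values picked for illustration, not measurements from your cluster; only the 480,000 document count and the 2GB limit come from this thread.

```python
# Back-of-the-envelope estimate of how much data lands in each dets file.
# Assumption: objects are spread perfectly evenly across all partitions.

DETS_LIMIT_BYTES = 2 * 1024**3  # 2GB per dets file


def bytes_per_vnode(num_docs, avg_doc_bytes, n_val, num_partitions):
    """Average payload stored in one vnode's file.

    Every document is stored n_val times, and the replicas are
    divided among num_partitions files.
    """
    return num_docs * avg_doc_bytes * n_val / num_partitions


def fits_in_dets(num_docs, avg_doc_bytes, n_val, num_partitions):
    """True if the average per-vnode payload stays under the dets limit."""
    size = bytes_per_vnode(num_docs, avg_doc_bytes, n_val, num_partitions)
    return size < DETS_LIMIT_BYTES


if __name__ == "__main__":
    num_docs = 480_000       # from the original report
    avg_doc_bytes = 100_000  # hypothetical average HTML page size
    n_val = 3                # hypothetical replication factor

    for partitions in (64, 128, 256):
        size = bytes_per_vnode(num_docs, avg_doc_bytes, n_val, partitions)
        verdict = "ok" if size < DETS_LIMIT_BYTES else "over the limit"
        print(f"{partitions:4d} partitions: {size / 1024**2:8.1f} MB per file ({verdict})")
```

Under these assumed numbers, 64 partitions puts the average file just over the limit, while 128 or 256 partitions leave comfortable headroom. Real distributions are uneven, so treat the result as a lower bound on the largest file.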

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 5, 2010, at 7:55 AM, Germain Maurice wrote:

> I got more information when I launched "riak console" on both nodes, as you can see below.
> I hope this will be useful to you, and then to me :)
> 
> 10.0.0.40:$ riak console
> [...]
> =INFO REPORT==== 5-May-2010::13:04:48 ===
> Starting handoff of partition 479555224749202520035584085735030365824602865664 to 'riak@10.0.0.41'
> dets: file "/reiser/riak/dets/639406966332270026714112114313373821099470487552" not properly closed, repairing ...
> 
> =INFO REPORT==== 5-May-2010::13:05:48 ===
> Dropping partition 479555224749202520035584085735030365824602865664
> 
> =INFO REPORT==== 5-May-2010::13:08:47 ===
>    alarm_handler: {set,{system_memory_high_watermark,[]}}
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
> "dets:open_file failed"
> 
> =ERROR REPORT==== 5-May-2010::13:19:32 ===
> ** Generic server riak_kv_vnode_master terminating
> ** Last message in was {'$gen_cast',
>                           {start_vnode,
>                               639406966332270026714112114313373821099470487552}}
> ** When Server state == {state,12307,[]}
> ** Reason for termination ==
> ** {{badmatch,
>        {error,
>            {{badmatch,
>                 {error,
>                     {no_more_space_on_file,
>                         "/reiser/riak/dets/639406966332270026714112114313373821099470487552.TMP"}}},
>             [{riak_kv_vnode,init,1},
>              {gen_fsm,init_it,6},
>              {proc_lib,init_p_do_apply,3}]}}},
>    [{riak_kv_vnode_master,get_vnode,2},
>     {riak_kv_vnode_master,handle_cast,2},
>     {gen_server,handle_msg,5},
>     {proc_lib,init_p_do_apply,3}]}
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
> Spidermonkey VM host stopping (<0.117.0>)
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
> Spidermonkey VM host stopping (<0.115.0>)
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
> Spidermonkey VM host stopping (<0.119.0>)
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
> Spidermonkey VM host stopping (<0.112.0>)
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
> Spidermonkey VM host stopping (<0.114.0>)
> Erlang has closed
> 
> =INFO REPORT==== 5-May-2010::13:19:32 ===
>    alarm_handler: {clear,system_memory_high_watermark}
> /usr/lib/riak/lib/os_mon-2.2.5/priv/bin/memsup: Erlang has closed.
> 
> 
> 
> ==========================================================
> 
> 10.0.0.41:$ riak console
> [...]
> =INFO REPORT==== 5-May-2010::13:03:41 ===
> Spidermonkey VM host starting (<0.120.0>)
> Eshell V5.7.5  (abort with ^G)
> (riak@10.0.0.41)1> dets: file "/reiser/riak/dets/616571003248974668617179538802181898917346541568" not properly closed, repairing ...
> 
> =INFO REPORT==== 5-May-2010::13:04:48 ===
> Receiving handoff data for partition 479555224749202520035584085735030365824602865664
> 
> =ERROR REPORT==== 5-May-2010::13:05:48 ===
> ** Generic server <0.189.0> terminating
> ** Last message in was {tcp,#Port<0.2936>,
>                            [0|<<84,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>]}
> ** When Server state == {state,#Port<0.2936>,undefined,undefined,0}
> ** Reason for termination ==
> ** {timeout,{gen_server2,call,
>                         [riak_kv_vnode_master,
>                          {get_vnode,479555224749202520035584085735030365824602865664},
>                          60000]}}
> 
> =INFO REPORT==== 5-May-2010::13:06:41 ===
>    alarm_handler: {set,{system_memory_high_watermark,[]}}
> 
> =ERROR REPORT==== 5-May-2010::13:07:41 ===
> webmachine error: path="/riak/blog_content_temp/1714432724f7f975610be47146fec6c7e74bf4bbccdbeea5208ac6e3540e6f4b"
> [{webmachine_decision_core,'-decision/1-lc$^1/1-1-',
>     [{error,
>          {error,
>              {case_clause,{error,timeout}},
>              [{riak_kv_wm_raw,content_types_provided,2},
>               {webmachine_resource,resource_call,3},
>               {webmachine_resource,do,3},
>               {webmachine_decision_core,resource_call,1},
>               {webmachine_decision_core,decision,1},
>               {webmachine_decision_core,handle_request,2},
>               {webmachine_mochiweb,loop,1},
>               {mochiweb_http,headers,5}]}}]},
> {webmachine_decision_core,decision,1},
> {webmachine_decision_core,handle_request,2},
> {webmachine_mochiweb,loop,1},
> {mochiweb_http,headers,5},
> {proc_lib,init_p_do_apply,3}]
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> "dets:open_file failed"
> 
> =ERROR REPORT==== 5-May-2010::13:16:54 ===
> ** Generic server riak_kv_vnode_master terminating
> ** Last message in was {'$gen_cast',
>                           {start_vnode,
>                               616571003248974668617179538802181898917346541568}}
> ** When Server state == {state,12307,[]}
> ** Reason for termination ==
> ** {{badmatch,
>        {error,
>            {{badmatch,
>                 {error,
>                     {no_more_space_on_file,
>                         "/reiser/riak/dets/616571003248974668617179538802181898917346541568.TMP"}}},
>             [{riak_kv_vnode,init,1},
>              {gen_fsm,init_it,6},
>              {proc_lib,init_p_do_apply,3}]}}},
>    [{riak_kv_vnode_master,get_vnode,2},
>     {riak_kv_vnode_master,handle_cast,2},
>     {gen_server,handle_msg,5},
>     {proc_lib,init_p_do_apply,3}]}
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.113.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.115.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.114.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.117.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.119.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.118.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.116.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
> Spidermonkey VM host stopping (<0.120.0>)
> 
> =INFO REPORT==== 5-May-2010::13:16:54 ===
>    alarm_handler: {clear,system_memory_high_watermark}
> /usr/lib/riak/lib/os_mon-2.2.5/priv/bin/memsup: Erlang has closed.
>                                                                   Erlang has closed
> 
> 
> 
> 
> On 05/05/10 11:35, Germain Maurice wrote:
>> Hi all,
>> I am testing Riak for my document base and ran into a problem while migrating documents from my previous
>> system to Riak.
>> I have two nodes and one bucket to start with.
>> The bucket holds more than 480,000 documents, all of them HTML pages.
>> 
>> Below you'll find all the files and information collected after a node was restarted.
>> After a while, Riak crashed again on both nodes I restarted ... :(
>> 
>> [...]
> 
> 
> -- 
> Germain Maurice
> System/Network Administrator
> Tel : +33.(0)1.42.43.54.33
> 
> http://www.linkfluence.net
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



