Riak Cluster Crash down on heavy load Benchmarking

Mark Phillips mark at basho.com
Tue Jun 19 13:51:44 EDT 2012


Hi Amol,

It looks like you're out of RAM. One of the offending entries from your log:

2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188>
terminated with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}

Is there any chance you could add a machine or two to the cluster
and run the test again?
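
As a rough back-of-the-envelope check on the RAM theory, the sketch below
multiplies the per-partition eleveldb settings from your app.config by the
number of vnodes a node would host. The ring size of 64 and the even split
across 2 nodes are assumptions on my part (they are not stated in your mail),
and it treats the worst case as every vnode filling its write buffer and cache
at the same time. The function name is only for illustration.

```python
# Rough worst-case memory estimate for the eleveldb settings quoted below.
# ASSUMPTIONS (not stated in the thread): a ring of 64 partitions, split
# evenly across the 2 nodes, with every vnode filling its write buffer
# and block cache at the same time.
RING_SIZE = 64                   # assumed ring size
NODES = 2                        # from the mail: "Nodes: 2"
WRITE_BUFFER_BYTES = 524288000   # write_buffer_size from app.config
CACHE_BYTES = 10485760           # cache_size from app.config
GIB = 1024 ** 3


def worst_case_bytes(ring_size, nodes, write_buffer, cache):
    """Bytes one node could need if each of its vnodes uses its full buffers."""
    vnodes_per_node = ring_size // nodes
    return vnodes_per_node * (write_buffer + cache)


total = worst_case_bytes(RING_SIZE, NODES, WRITE_BUFFER_BYTES, CACHE_BYTES)
print(f"vnodes per node: {RING_SIZE // NODES}")
print(f"worst case per node: {total / GIB:.2f} GiB")
```

Under those assumptions the worst case comes to roughly 16 GiB per node, far
above the 4GB of RAM on each machine, and a 32-bit process cannot address more
than 4 GiB in any case.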

Mark




On Mon, Jun 18, 2012 at 11:34 PM, Amol Rajoba <amolrajoba at gmail.com> wrote:

> Hi Guys,
>
> Can anybody help here?
>
> I also found that, under continuous put operations, beam.smp is taking
> 2.9G of RAM. What could be the reason for this?
>
> Thanks.
> Amol Rajoba
>
>
> On Sat, Jun 16, 2012 at 1:13 PM, Amol Rajoba <amolrajoba at gmail.com> wrote:
>
>> Hi Guys,
>> I am evaluating Riak as a key-value store where my requirement is to
>> store a huge data set (larger than RAM), so Riak was set up with LevelDB
>> as the backend.
>>
>> Clients connect using the Protocol Buffers API, with
>> {pb_backlog, 100000} set in app.config.
>>
>> Benchmarking involved 25 agents doing put/store on a single node for 100M
>> records.
>> It runs well up to 3M records, but then the whole cluster crashes, taking
>> all nodes down.
>>
>> Following are the system and Riak configurations, along with the error
>> and crash logs.
>>
>> Please help me find what I am missing. I need to test Riak and use it in
>> production as soon as possible.
>>
>> Nodes: 2  (I know cluster of 5 is best but this is just test setup)
>> OS: Ubuntu 12.04 32bit
>> CPU: Core i3
>> RAM: 4GB
>> HDD: 500GB
>>
>> app.config [changes only]
>>
>> %% eLevelDB Config
>>  {eleveldb, [
>>              {data_root, "/data/riak/leveldb"},
>>              {block_size, 262144}, %% 256k
>>              {cache_size, 10485760}, %% 10MB - default cache size 8MB per-partition
>>              {write_buffer_size, 524288000}, %% 500MB in bytes
>>              {write_buffer_size_min, 524288000}, %% 500MB in bytes
>>              {write_buffer_size_max, 524288000}, %% 500MB in bytes
>>              {max_open_files, 100} %% Maximum number of files open at once per partition - Default: 20 - Minimum: 20
>>             ]},
>>
>>
>> vm.args [changes only]
>> ## Enable kernel poll and a few async threads
>> +K true
>> +A 128
>>
>>
>> Bucket "riaktest" properties:
>>
>> {"props":{"allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"riaktest","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"one","rw":"one","small_vclock":50,"w":"one","young_vclock":20}}
>>
>> relatime set in /etc/fstab on all drives
>>
>> OS open files limit sysctl fs.file-max set to 800000
>>
>>
>> Following are the error.log, crash.log and console.log files:
>>
>> error.log
>> ---------------
>> 2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188>
>> terminated with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process
>> <0.20970.188> with 0 neighbours crashed with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor
>> riak_kv_pb_socket_sup had child undefined started with
>> {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> in context child_terminated
>> 2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188>
>> terminated with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process
>> <0.20974.188> with 0 neighbours crashed with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>>
>> crash.log
>> --------------
>> 2012-06-15 19:09:31 =ERROR REPORT====
>> ** Generic server <0.20970.188> terminating
>> ** Last message in was
>> {tcp,#Port<0.6076011>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
>> ** When Server state == {state,#Port<0.6076011>,{riak_client,'riak at 10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
>> ** Reason for termination ==
>> **
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32 =CRASH REPORT====
>>   crasher:
>>     initial call: gen:init_it/6
>>     pid: <0.20970.188>
>>     registered_name: []
>>     exception exit:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>>       in function  gen_server2:terminate/6
>>       in call from proc_lib:init_p_do_apply/3
>>     ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
>>     messages: []
>>     links: [#Port<0.6076023>,<0.284.0>,#Port<0.6076011>]
>>     dictionary: []
>>     trap_exit: false
>>     status: running
>>     heap_size: 987
>>     stack_size: 24
>>     reductions: 974
>>   neighbours:
>> 2012-06-15 19:09:32 =SUPERVISOR REPORT====
>>      Supervisor: {local,riak_kv_pb_socket_sup}
>>      Context:    child_terminated
>>      Reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>>      Offender:
>> [{pid,<0.20970.188>},{name,undefined},{mfargs,{riak_kv_pb_socket,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>>
>> 2012-06-15 19:09:32 =ERROR REPORT====
>> ** Generic server <0.20974.188> terminating
>> ** Last message in was
>> {tcp,#Port<0.6076015>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
>> ** When Server state == {state,#Port<0.6076015>,{riak_client,'riak at 10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
>> ** Reason for termination ==
>> **
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:33 =CRASH REPORT====
>>   crasher:
>>     initial call: gen:init_it/6
>>     pid: <0.20974.188>
>>     registered_name: []
>>     exception exit:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>>       in function  gen_server2:terminate/6
>>       in call from proc_lib:init_p_do_apply/3
>>     ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
>>     messages: []
>>     links: [#Port<0.6076029>,<0.284.0>,#Port<0.6076015>]
>>     dictionary: []
>>     trap_exit: false
>>     status: running
>>     heap_size: 987
>>     stack_size: 24
>>     reductions: 910
>>   neighbours:
>>
>>
>>
>> console.log
>> --------------------
>> 2012-06-15 17:50:48.811 [info] <0.7.0> Application lager started on node '
>> riak at 10.90.15.198'
>> 2012-06-15 17:50:48.970 [info] <0.7.0> Application public_key started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.003 [info] <0.7.0> Application ssl started on node '
>> riak at 10.90.15.198'
>> 2012-06-15 17:50:49.037 [info] <0.7.0> Application riak_core started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.060 [info] <0.7.0> Application riak_control started
>> on node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.061 [info] <0.7.0> Application basho_metrics started
>> on node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.063 [info] <0.7.0> Application cluster_info started
>> on node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.072 [info] <0.7.0> Application merge_index started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.083 [info] <0.180.0>@riak_core:wait_for_service:416
>> Waiting for service riak_pipe to start (0 seconds)
>> 2012-06-15 17:50:49.110 [info]
>> <0.249.0>@riak_core:wait_for_application:396 Waiting for application
>> riak_pipe to start (0 seconds).
>> 2012-06-15 17:50:49.111 [info] <0.7.0> Application riak_pipe started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.145 [info] <0.7.0> Application inets started on node '
>> riak at 10.90.15.198'
>> 2012-06-15 17:50:49.151 [info] <0.7.0> Application mochiweb started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.169 [info] <0.7.0> Application erlang_js started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.176 [info] <0.7.0> Application luke started on node '
>> riak at 10.90.15.198'
>> 2012-06-15 17:50:49.197 [info] <0.283.0>@riak_core:wait_for_service:416
>> Waiting for service riak_kv to start (0 seconds)
>> 2012-06-15 17:50:49.212 [info]
>> <0.249.0>@riak_core:wait_for_application:390 Wait complete for application
>> riak_pipe (0 seconds)
>> 2012-06-15 17:50:49.285 [info] <0.180.0>@riak_core:wait_for_service:410
>> Wait complete for service riak_pipe (0 seconds)
>> 2012-06-15 17:50:49.291 [info] <0.367.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.367.0>)
>> 2012-06-15 17:50:49.296 [info] <0.368.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.368.0>)
>> 2012-06-15 17:50:49.302 [info] <0.369.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.369.0>)
>> 2012-06-15 17:50:49.307 [info] <0.370.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.370.0>)
>> 2012-06-15 17:50:49.311 [info] <0.371.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.371.0>)
>> 2012-06-15 17:50:49.316 [info] <0.372.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.372.0>)
>> 2012-06-15 17:50:49.320 [info] <0.373.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.373.0>)
>> 2012-06-15 17:50:49.324 [info] <0.374.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map)
>> host starting (<0.374.0>)
>> 2012-06-15 17:50:49.333 [info] <0.376.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool:
>> riak_kv_js_reduce) host starting (<0.376.0>)
>> 2012-06-15 17:50:49.341 [info] <0.377.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool:
>> riak_kv_js_reduce) host starting (<0.377.0>)
>> 2012-06-15 17:50:49.348 [info] <0.378.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool:
>> riak_kv_js_reduce) host starting (<0.378.0>)
>> 2012-06-15 17:50:49.354 [info] <0.379.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool:
>> riak_kv_js_reduce) host starting (<0.379.0>)
>> 2012-06-15 17:50:49.360 [info] <0.380.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool:
>> riak_kv_js_reduce) host starting (<0.380.0>)
>> 2012-06-15 17:50:49.366 [info] <0.381.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool:
>> riak_kv_js_reduce) host starting (<0.381.0>)
>> 2012-06-15 17:50:49.371 [info] <0.383.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook)
>> host starting (<0.383.0>)
>> 2012-06-15 17:50:49.375 [info] <0.384.0>@riak_kv_js_vm:init:76
>> Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook)
>> host starting (<0.384.0>)
>> 2012-06-15 17:50:49.395 [info] <0.7.0> Application bitcask started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.567 [info]
>> <0.463.0>@riak_core:wait_for_application:396 Waiting for application
>> riak_kv to start (0 seconds).
>> 2012-06-15 17:50:49.571 [info] <0.7.0> Application riak_kv started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.573 [info] <0.7.0> Application riak_search started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.573 [info] <0.7.0> Application basho_stats started on
>> node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.584 [info] <0.7.0> Application runtime_tools started
>> on node 'riak at 10.90.15.198'
>> 2012-06-15 17:50:49.669 [info]
>> <0.463.0>@riak_core:wait_for_application:390 Wait complete for application
>> riak_kv (0 seconds)
>> 2012-06-15 17:50:54.871 [info] <0.283.0>@riak_core:wait_for_service:410
>> Wait complete for service riak_kv (4 seconds)
>> 2012-06-15 18:26:48.764 [info] <0.42.0> alarm_handler:
>> {set,{system_memory_high_watermark,[]}}
>> 2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188>
>> terminated with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process
>> <0.20970.188> with 0 neighbours crashed with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor
>> riak_kv_pb_socket_sup had child undefined started with
>> {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> in context child_terminated
>> 2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188>
>> terminated with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>> 2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process
>> <0.20974.188> with 0 neighbours crashed with reason:
>> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>>
>>
>> Thanks In Advance,
>> Amol Rajoba
>>
>
>