Riak cluster crashes under heavy-load benchmarking

Amol Rajoba amolrajoba at gmail.com
Tue Jun 19 02:34:26 EDT 2012


Hi Guys,

Can anybody help here?

I also found that, under continuous put operations, beam.smp is using 2.9 GB
of RAM. What could be the reason for this?

Thanks.
Amol Rajoba
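
A rough check on the memory question above, offered as a sketch rather than a diagnosis. The eleveldb settings quoted below apply per partition, so the configured write buffers alone can add up to far more than a 32-bit beam.smp can address. The ring size is not stated in this thread; the calculation assumes Riak's default of 64 partitions spread evenly over the 2 nodes, and that each partition could eventually fill one write buffer plus its cache.

```python
# Values copied from the quoted app.config.
WRITE_BUFFER_BYTES = 524288000   # write_buffer_size, per partition
CACHE_BYTES = 10485760           # cache_size, per partition

# Assumptions, not stated in the thread.
RING_SIZE = 64                   # assumed: Riak's default ring size
NODES = 2                        # from the "Nodes: 2" line
ADDRESS_SPACE_BYTES = 2**32      # upper bound for any 32-bit process


def worst_case_bytes(ring_size, nodes, write_buffer, cache):
    """Memory one node could need if every local partition fills
    one write buffer and its block cache."""
    partitions_per_node = ring_size // nodes
    return partitions_per_node * (write_buffer + cache)


total = worst_case_bytes(RING_SIZE, NODES, WRITE_BUFFER_BYTES, CACHE_BYTES)

print(f"partitions per node:  {RING_SIZE // NODES}")
print(f"worst-case LevelDB:   {total / 2**30:.1f} GiB")
print(f"32-bit address space: {ADDRESS_SPACE_BYTES / 2**30:.1f} GiB")
```

If those assumptions hold, the worst case is several times what a 32-bit process can address, which would be consistent with beam.smp sitting near 3 GB and with the zlib mem_error in the logs.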


On Sat, Jun 16, 2012 at 1:13 PM, Amol Rajoba <amolrajoba at gmail.com> wrote:

> Hi Guys,
> I am evaluating Riak as a key-value store. My requirement is to store a
> huge data set (larger than RAM), so Riak was set up with LevelDB as the
> backend.
>
> Clients connect using the Protocol Buffers API, with
> {pb_backlog, 100000} set in app.config.
>
> The benchmark involved 25 agents doing put/store operations against a
> single node for 100M records.
> It runs well until 3M records, but then the whole cluster crashes, taking
> all nodes down.
>
> The system and Riak configurations follow, along with the error and crash
> logs.
>
> Please help me find what I am missing. I need to test Riak and use it in
> production as soon as possible.
>
> Nodes: 2  (I know a cluster of 5 is best, but this is just a test setup)
> OS: Ubuntu 12.04 32bit
> CPU: Core i3
> RAM: 4GB
> HDD: 500GB
>
> app.config [changes only]
>
> %% eLevelDB Config
>  {eleveldb, [
>              {data_root, "/data/riak/leveldb"},
>              {block_size, 262144}, %% 256k
>              {cache_size, 10485760}, %% 10MB - default cache size 8MB per-partition
>              {write_buffer_size, 524288000}, %% 500MB in bytes
>              {write_buffer_size_min, 524288000}, %% 500MB in bytes
>              {write_buffer_size_max, 524288000}, %% 500MB in bytes
>              {max_open_files, 100} %% Maximum number of files open at once per partition - Default: 20 - Minimum: 20
>             ]},
>
>
> vm.args [changes only]
> ## Enable kernel poll and a few async threads
> +K true
> +A 128
>
>
> Bucket "riaktest" properties:
>
> {"props":{"allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"riaktest","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"one","rw":"one","small_vclock":50,"w":"one","young_vclock":20}}
>
> relatime set in /etc/fstab on all drives
>
> OS open files limit: sysctl fs.file-max set to 800000
>
>
> Following are the error.log, crash.log and console.log files:
>
> error.log
> ---------------
> 2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process <0.20970.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup had child undefined started with {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]} in context child_terminated
> 2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process <0.20974.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>
> crash.log
> --------------
> 2012-06-15 19:09:31 =ERROR REPORT====
> ** Generic server <0.20970.188> terminating
> ** Last message in was
> {tcp,#Port<0.6076011>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
> ** When Server state == {state,#Port<0.6076011>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
> ** Reason for termination ==
> **
> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32 =CRASH REPORT====
>   crasher:
>     initial call: gen:init_it/6
>     pid: <0.20970.188>
>     registered_name: []
>     exception exit:
> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>       in function  gen_server2:terminate/6
>       in call from proc_lib:init_p_do_apply/3
>     ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
>     messages: []
>     links: [#Port<0.6076023>,<0.284.0>,#Port<0.6076011>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 987
>     stack_size: 24
>     reductions: 974
>   neighbours:
> 2012-06-15 19:09:32 =SUPERVISOR REPORT====
>      Supervisor: {local,riak_kv_pb_socket_sup}
>      Context:    child_terminated
>      Reason:
> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>      Offender:
> [{pid,<0.20970.188>},{name,undefined},{mfargs,{riak_kv_pb_socket,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>
> 2012-06-15 19:09:32 =ERROR REPORT====
> ** Generic server <0.20974.188> terminating
> ** Last message in was
> {tcp,#Port<0.6076015>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
> ** When Server state == {state,#Port<0.6076015>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
> ** Reason for termination ==
> **
> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:33 =CRASH REPORT====
>   crasher:
>     initial call: gen:init_it/6
>     pid: <0.20974.188>
>     registered_name: []
>     exception exit:
> {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>       in function  gen_server2:terminate/6
>       in call from proc_lib:init_p_do_apply/3
>     ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
>     messages: []
>     links: [#Port<0.6076029>,<0.284.0>,#Port<0.6076015>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 987
>     stack_size: 24
>     reductions: 910
>   neighbours:
>
>
>
> console.log
> --------------------
> 2012-06-15 17:50:48.811 [info] <0.7.0> Application lager started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:48.970 [info] <0.7.0> Application public_key started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.003 [info] <0.7.0> Application ssl started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.037 [info] <0.7.0> Application riak_core started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.060 [info] <0.7.0> Application riak_control started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.061 [info] <0.7.0> Application basho_metrics started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.063 [info] <0.7.0> Application cluster_info started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.072 [info] <0.7.0> Application merge_index started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.083 [info] <0.180.0>@riak_core:wait_for_service:416 Waiting for service riak_pipe to start (0 seconds)
> 2012-06-15 17:50:49.110 [info] <0.249.0>@riak_core:wait_for_application:396 Waiting for application riak_pipe to start (0 seconds).
> 2012-06-15 17:50:49.111 [info] <0.7.0> Application riak_pipe started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.145 [info] <0.7.0> Application inets started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.151 [info] <0.7.0> Application mochiweb started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.169 [info] <0.7.0> Application erlang_js started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.176 [info] <0.7.0> Application luke started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.197 [info] <0.283.0>@riak_core:wait_for_service:416 Waiting for service riak_kv to start (0 seconds)
> 2012-06-15 17:50:49.212 [info] <0.249.0>@riak_core:wait_for_application:390 Wait complete for application riak_pipe (0 seconds)
> 2012-06-15 17:50:49.285 [info] <0.180.0>@riak_core:wait_for_service:410 Wait complete for service riak_pipe (0 seconds)
> 2012-06-15 17:50:49.291 [info] <0.367.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.367.0>)
> 2012-06-15 17:50:49.296 [info] <0.368.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.368.0>)
> 2012-06-15 17:50:49.302 [info] <0.369.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.369.0>)
> 2012-06-15 17:50:49.307 [info] <0.370.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.370.0>)
> 2012-06-15 17:50:49.311 [info] <0.371.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.371.0>)
> 2012-06-15 17:50:49.316 [info] <0.372.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.372.0>)
> 2012-06-15 17:50:49.320 [info] <0.373.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.373.0>)
> 2012-06-15 17:50:49.324 [info] <0.374.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.374.0>)
> 2012-06-15 17:50:49.333 [info] <0.376.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.376.0>)
> 2012-06-15 17:50:49.341 [info] <0.377.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.377.0>)
> 2012-06-15 17:50:49.348 [info] <0.378.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.378.0>)
> 2012-06-15 17:50:49.354 [info] <0.379.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.379.0>)
> 2012-06-15 17:50:49.360 [info] <0.380.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.380.0>)
> 2012-06-15 17:50:49.366 [info] <0.381.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.381.0>)
> 2012-06-15 17:50:49.371 [info] <0.383.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting (<0.383.0>)
> 2012-06-15 17:50:49.375 [info] <0.384.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting (<0.384.0>)
> 2012-06-15 17:50:49.395 [info] <0.7.0> Application bitcask started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.567 [info] <0.463.0>@riak_core:wait_for_application:396 Waiting for application riak_kv to start (0 seconds).
> 2012-06-15 17:50:49.571 [info] <0.7.0> Application riak_kv started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.573 [info] <0.7.0> Application riak_search started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.573 [info] <0.7.0> Application basho_stats started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.584 [info] <0.7.0> Application runtime_tools started on node 'riak@10.90.15.198'
> 2012-06-15 17:50:49.669 [info] <0.463.0>@riak_core:wait_for_application:390 Wait complete for application riak_kv (0 seconds)
> 2012-06-15 17:50:54.871 [info] <0.283.0>@riak_core:wait_for_service:410 Wait complete for service riak_kv (4 seconds)
> 2012-06-15 18:26:48.764 [info] <0.42.0> alarm_handler: {set,{system_memory_high_watermark,[]}}
> 2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process <0.20970.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup had child undefined started with {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]} in context child_terminated
> 2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process <0.20974.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>
>
> Thanks In Advance,
> Amol Rajoba
>