Riak cluster crashes under heavy-load benchmarking

Amol Rajoba amolrajoba at gmail.com
Fri Jun 15 11:35:04 EDT 2012


Hi Guys,
I am evaluating Riak as a key-value store. My requirement is to store a
huge data set (larger than RAM), so Riak was set up with LevelDB as the backend.

The benchmark involved 25 agents doing put/store against a single node for
100M records.
It runs well up to about 3M records, but then the complete cluster crashes
and all nodes go down.
Below are the system and Riak configurations, followed by the error and
crash logs.

Please help me find what I am missing. I need to test Riak and use it in
production as soon as possible.

Nodes: 2  (I know a cluster of 5 is recommended, but this is just a test setup)
OS: Ubuntu 12.04 32bit
CPU: Core i3
RAM: 4GB
HDD: 500GB

app.config [changes only]

%% eLevelDB Config
 {eleveldb, [
             {data_root, "/data/riak/leveldb"},
             {block_size, 262144}, %% 256k
             {cache_size, 104857600}, %% 100MB - default cache size 8MB per-partition
             {write_buffer_size, 524288000}, %% 500MB in bytes
             {write_buffer_size_min, 524288000}, %% 500MB in bytes
             {write_buffer_size_max, 524288000}, %% 500MB in bytes
             {max_open_files, 100} %% Maximum number of files open at once per partition - Default: 20 - Minimum: 20
            ]},
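
A rough sanity check on the eLevelDB settings above, as a minimal Python sketch. It assumes the default ring size of 64 partitions (the ring size is not stated in this post), an even split of partitions across the 2 nodes, and that write_buffer_size, like cache_size, applies per partition. Under those assumptions the worst-case LevelDB memory per node comes out far above the 4GB of RAM, which would be consistent with the memory alarm and mem_error entries in the logs.

```python
MIB = 1024 * 1024


def leveldb_memory_estimate(ring_size, nodes, write_buffer_bytes, cache_bytes):
    """Rough worst-case LevelDB memory per node, in bytes."""
    # Partitions (vnodes) hosted by each node if the ring is split evenly.
    partitions_per_node = ring_size // nodes
    # Assumption: each partition has its own write buffer and block cache.
    return partitions_per_node * (write_buffer_bytes + cache_bytes)


estimate = leveldb_memory_estimate(
    ring_size=64,                  # assumption: Riak default, not from the post
    nodes=2,                       # from the post
    write_buffer_bytes=524288000,  # write_buffer_size in app.config
    cache_bytes=104857600,         # cache_size in app.config
)
ram_bytes = 4 * 1024 * MIB         # 4GB RAM per node, from the post

print(f"estimated LevelDB memory per node: {estimate // MIB} MiB")
print(f"physical RAM per node:             {ram_bytes // MIB} MiB")
print("over budget" if estimate > ram_bytes else "within budget")
```

This is only an upper bound: the buffers are not necessarily all full at the same time, and the Erlang VM itself needs memory on top of this.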


vm.args [changes only]
## Enable kernel poll and a few async threads
+K true
+A 128


Bucket "riaktest" properties:

{"props":{"allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"riaktest","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"one","rw":"one","small_vclock":50,"w":"one","young_vclock":20}}
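
For reference, the symbolic quorum values in these bucket properties can be turned into replica counts. This is a minimal sketch, assuming the usual Riak meaning of "one", "quorum" (a majority of n_val) and "all"; the helper name resolve_quorum is hypothetical.

```python
def resolve_quorum(value, n_val):
    """Translate a symbolic quorum setting into a number of replicas."""
    if value == "one":
        return 1
    if value == "quorum":
        return n_val // 2 + 1  # majority of the replicas
    if value == "all":
        return n_val
    return int(value)          # already numeric, e.g. pr/pw = 0


# Values copied from the "riaktest" bucket properties above.
props = {"n_val": 3, "r": "one", "w": "one", "dw": "quorum",
         "rw": "one", "pr": 0, "pw": 0}

resolved = {name: resolve_quorum(props[name], props["n_val"])
            for name in ("r", "w", "dw", "rw", "pr", "pw")}
print(resolved)
```

With n_val of 3 every put is written to three replicas, so on a 2-node cluster each node stores more than one copy of some keys.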

relatime set in /etc/fstab on all drives

OS open files limit sysctl fs.file-max set to 800000
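
Note that fs.file-max is the system-wide limit; the per-process limit (ulimit -n) of the user running Riak also applies. A minimal sketch to compare a rough LevelDB file-descriptor need against that per-process limit, assuming 32 partitions per node (64-partition ring split over 2 nodes, which is an assumption and not stated in this post):

```python
import resource


def leveldb_fd_need(partitions_per_node, max_open_files):
    """Rough number of file descriptors LevelDB may hold open on one node."""
    return partitions_per_node * max_open_files


need = leveldb_fd_need(
    partitions_per_node=32,  # assumption: 64-partition ring over 2 nodes
    max_open_files=100,      # max_open_files in app.config
)

# Per-process limit of the *current* user; run this as the Riak user.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"LevelDB may need about {need} descriptors")
print(f"per-process limit: soft={soft} hard={hard}")
if soft != resource.RLIM_INFINITY and soft < need:
    print("soft limit is below the estimated need")
```

The estimate ignores sockets and other files Riak opens, so the real requirement is somewhat higher.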


Following are the error.log, crash.log and console.log files.

error.log
---------------
2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process <0.20970.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup had child undefined started with {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]} in context child_terminated
2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process <0.20974.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}

crash.log
--------------
2012-06-15 19:09:31 =ERROR REPORT====
** Generic server <0.20970.188> terminating
** Last message in was
{tcp,#Port<0.6076011>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
** When Server state == {state,#Port<0.6076011>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
** Reason for termination ==
** {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32 =CRASH REPORT====
  crasher:
    initial call: gen:init_it/6
    pid: <0.20970.188>
    registered_name: []
    exception exit: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server2:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
    messages: []
    links: [#Port<0.6076023>,<0.284.0>,#Port<0.6076011>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 974
  neighbours:
2012-06-15 19:09:32 =SUPERVISOR REPORT====
     Supervisor: {local,riak_kv_pb_socket_sup}
     Context:    child_terminated
     Reason:     {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
     Offender:   [{pid,<0.20970.188>},{name,undefined},{mfargs,{riak_kv_pb_socket,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]

2012-06-15 19:09:32 =ERROR REPORT====
** Generic server <0.20974.188> terminating
** Last message in was
{tcp,#Port<0.6076015>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
** When Server state == {state,#Port<0.6076015>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
** Reason for termination ==
** {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:33 =CRASH REPORT====
  crasher:
    initial call: gen:init_it/6
    pid: <0.20974.188>
    registered_name: []
    exception exit: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server2:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
    messages: []
    links: [#Port<0.6076029>,<0.284.0>,#Port<0.6076015>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 910
  neighbours:
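
The "Last message in" binary in the first error report above can be read as a protocol buffers message. The following is a minimal Python sketch that walks the wire format (varint and length-delimited fields only) over the bytes copied from that report. The field names in the comments are an assumption based on my understanding of Riak's put request layout (message code 11); they are not taken from the logs.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf):
    """Return {field_number: value} for varint and length-delimited fields."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:          # length-delimited (bytes / sub-message)
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


# Bytes following the leading message code 11 in the first error report.
payload = bytes([
    10, 6, 117, 114, 108, 99, 97, 116,                       # field 1
    18, 39, 50, 53, 48, 50, 97, 98, 102, 49, 55, 97, 100, 102, 100, 48, 98, 55, 102,
    48, 57, 48, 52, 99, 48, 99, 98, 101, 52, 48, 100, 100, 100, 55, 49, 55, 50, 48,
    51, 51, 57,                                              # field 2
    34, 122,                                                 # field 4, 122 bytes
    10, 109, 34,
    50, 53, 48, 50, 97, 98, 102, 49, 55, 97, 100, 102, 100, 48, 98, 55, 102,
    48, 57, 48, 52, 99, 48, 99, 98, 101, 52, 48, 100, 100, 100, 55, 49, 55, 50, 48,
    51, 51, 57,
    58, 58, 32, 99, 97, 116, 101, 103, 111, 114, 121, 32, 49, 44,
    32, 107, 101, 121, 119, 111, 114, 100, 32, 49, 44,
    32, 99, 97, 116, 101, 103, 111, 114, 121, 32, 50, 44,
    32, 99, 97, 116, 101, 103, 111, 114, 121, 32, 51, 44,
    32, 107, 101, 121, 119, 111, 114, 100, 50, 44,
    32, 107, 101, 121, 119, 111, 114, 100, 51, 34,
    18, 9, 116, 101, 120, 116, 47, 106, 115, 111, 110,
    40, 2, 48, 2, 56, 1,                                     # fields 5, 6, 7
])

request = decode_fields(payload)
content = decode_fields(request[4])   # assumed: nested content sub-message

print("bucket:      ", request[1].decode())   # assumed field 1 = bucket
print("key:         ", request[2].decode())   # assumed field 2 = key
print("value:       ", content[1].decode())   # assumed field 1 = value
print("content type:", content[2].decode())   # assumed field 2 = content_type
print("w, dw, return_body:", request[5], request[6], request[7])
```

Decoded this way, the request is a small put of about 100 bytes of text to a bucket named urlcat, so the message itself is not large.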



console.log
--------------------
2012-06-15 17:50:48.811 [info] <0.7.0> Application lager started on node 'riak@10.90.15.198'
2012-06-15 17:50:48.970 [info] <0.7.0> Application public_key started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.003 [info] <0.7.0> Application ssl started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.037 [info] <0.7.0> Application riak_core started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.060 [info] <0.7.0> Application riak_control started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.061 [info] <0.7.0> Application basho_metrics started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.063 [info] <0.7.0> Application cluster_info started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.072 [info] <0.7.0> Application merge_index started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.083 [info] <0.180.0>@riak_core:wait_for_service:416 Waiting for service riak_pipe to start (0 seconds)
2012-06-15 17:50:49.110 [info] <0.249.0>@riak_core:wait_for_application:396 Waiting for application riak_pipe to start (0 seconds).
2012-06-15 17:50:49.111 [info] <0.7.0> Application riak_pipe started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.145 [info] <0.7.0> Application inets started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.151 [info] <0.7.0> Application mochiweb started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.169 [info] <0.7.0> Application erlang_js started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.176 [info] <0.7.0> Application luke started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.197 [info] <0.283.0>@riak_core:wait_for_service:416 Waiting for service riak_kv to start (0 seconds)
2012-06-15 17:50:49.212 [info] <0.249.0>@riak_core:wait_for_application:390 Wait complete for application riak_pipe (0 seconds)
2012-06-15 17:50:49.285 [info] <0.180.0>@riak_core:wait_for_service:410 Wait complete for service riak_pipe (0 seconds)
2012-06-15 17:50:49.291 [info] <0.367.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.367.0>)
2012-06-15 17:50:49.296 [info] <0.368.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.368.0>)
2012-06-15 17:50:49.302 [info] <0.369.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.369.0>)
2012-06-15 17:50:49.307 [info] <0.370.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.370.0>)
2012-06-15 17:50:49.311 [info] <0.371.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.371.0>)
2012-06-15 17:50:49.316 [info] <0.372.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.372.0>)
2012-06-15 17:50:49.320 [info] <0.373.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.373.0>)
2012-06-15 17:50:49.324 [info] <0.374.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.374.0>)
2012-06-15 17:50:49.333 [info] <0.376.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.376.0>)
2012-06-15 17:50:49.341 [info] <0.377.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.377.0>)
2012-06-15 17:50:49.348 [info] <0.378.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.378.0>)
2012-06-15 17:50:49.354 [info] <0.379.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.379.0>)
2012-06-15 17:50:49.360 [info] <0.380.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.380.0>)
2012-06-15 17:50:49.366 [info] <0.381.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.381.0>)
2012-06-15 17:50:49.371 [info] <0.383.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting (<0.383.0>)
2012-06-15 17:50:49.375 [info] <0.384.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting (<0.384.0>)
2012-06-15 17:50:49.395 [info] <0.7.0> Application bitcask started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.567 [info] <0.463.0>@riak_core:wait_for_application:396 Waiting for application riak_kv to start (0 seconds).
2012-06-15 17:50:49.571 [info] <0.7.0> Application riak_kv started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.573 [info] <0.7.0> Application riak_search started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.573 [info] <0.7.0> Application basho_stats started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.584 [info] <0.7.0> Application runtime_tools started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.669 [info] <0.463.0>@riak_core:wait_for_application:390 Wait complete for application riak_kv (0 seconds)
2012-06-15 17:50:54.871 [info] <0.283.0>@riak_core:wait_for_service:410 Wait complete for service riak_kv (4 seconds)
2012-06-15 18:26:48.764 [info] <0.42.0> alarm_handler: {set,{system_memory_high_watermark,[]}}
2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process <0.20970.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup had child undefined started with {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]} in context child_terminated
2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process <0.20974.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}


Thanks In Advance,
Amol Rajoba