Riak 1.4.12 high I/O while almost idle

Timo Gatsonides tpjg at me.com
Fri May 22 09:52:27 EDT 2015


I have a 7 node Riak cluster. Most nodes are on 1.4.10 and I am currently migrating the nodes from RedHat to Ubuntu (and Ext4 -> ZFS). 

One of the nodes is running Ubuntu 14 with ZFS for the data and has Riak 1.4.12 installed. On this node Riak is continuously reading and writing a lot of data. Even hours after it is started. There are no running transfers (riak-admin transfers). The other nodes are not very busy, almost no CPU, no I/O. For this one node see the output of iotop, uname -a, riak-admin status and some recent logging below.

There is nothing obvious in the Riak logging, except that there is an occasional reading/writing large object (once a minute or so, objects of around 5Mb-10Mb).

Is there a way to see what Riak is doing, what it is reading/writing?

Thanks,
Timo

Total DISK READ :      81.99 M/s | Total DISK WRITE :      33.33 M/s
Actual DISK READ:      39.59 M/s | Actual DISK WRITE:     241.72 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                    
 8723 be/4 riak       23.82 M/s   12.05 M/s  0.00 % 80.54 % beam.smp -K true -A 64~ -smp enable -- console
 8725 be/4 riak        4.80 M/s  814.60 B/s  0.00 % 72.67 % beam.smp -K true -A 64~ -smp enable -- console
 8657 be/4 riak        3.83 M/s  814.60 B/s  0.00 % 68.98 % beam.smp -K true -A 64~ -smp enable -- console
  698 be/0 root        0.00 B/s    0.00 B/s  0.00 % 47.67 % [txg_sync]
 8724 be/4 riak       13.07 M/s    7.68 M/s  0.00 % 39.04 % beam.smp -K true -A 64~ -smp enable -- console
 8726 be/4 riak       50.91 K/s   13.59 M/s  0.00 % 36.77 % beam.smp -K true -A 64~ -smp enable -- console
 7917 be/4 root       51.11 K/s    0.00 B/s  0.00 %  4.26 % [kworker/u16:5]
  234 be/3 root        0.00 B/s 1629.19 B/s  0.00 %  0.01 % [jbd2/sda6-8]
 8302 be/4 riak        0.00 B/s  814.60 B/s  0.00 %  0.00 % run_erl -daemon /tmp/r~ /usr/sbin/riak console
 9942 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u16:6]
10319 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u16:0]
10320 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u16:1]
 8401 be/4 riak        0.00 B/s  814.60 B/s  0.00 %  0.00 % beam.smp -K true -A 

#uname -a
Linux tx4 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

sys_system_version : <<"Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:64] [kernel-poll:true]">>
sys_system_architecture : <<"x86_64-unknown-linux-gnu">>
sys_threads_enabled : true
sys_thread_pool_size : 64
sys_wordsize : 8
ring_num_partitions : 64
ring_creation_size : 64
storage_backend : riak_kv_multi_backend
erlydtl_version : <<"0.7.0">>
riak_control_version : <<"1.4.12-0-g964c5db">>
cluster_info_version : <<"1.2.4">>
riak_search_version : <<"1.4.12-0-g7fe0e00">>
merge_index_version : <<"1.3.2-0-gcb38ee7">>
riak_kv_version : <<"1.4.12-0-gc6bbd66">>

1-minute stats for ...
-------------------------------------------
riak_kv_stat_ts : 1432301431
vnode_gets : 748
vnode_gets_total : 10051
vnode_puts : 182
vnode_puts_total : 3690
vnode_index_refreshes : 0
vnode_index_refreshes_total : 0
vnode_index_reads : 10
vnode_index_reads_total : 100
vnode_index_writes : 182
vnode_index_writes_total : 3686
vnode_index_writes_postings : 2
vnode_index_writes_postings_total : 139
vnode_index_deletes : 0
vnode_index_deletes_total : 0
vnode_index_deletes_postings : 0
vnode_index_deletes_postings_total : 13
node_gets : 0
node_gets_total : 0
node_get_fsm_siblings_mean : 0
node_get_fsm_siblings_median : 0
node_get_fsm_siblings_95 : 0
node_get_fsm_siblings_99 : 0
node_get_fsm_siblings_100 : 0
node_get_fsm_objsize_mean : 0
node_get_fsm_objsize_median : 0
node_get_fsm_objsize_95 : 0
node_get_fsm_objsize_99 : 0
node_get_fsm_objsize_100 : 0
node_get_fsm_time_mean : 0
node_get_fsm_time_median : 0
node_get_fsm_time_95 : 0
node_get_fsm_time_99 : 0
node_get_fsm_time_100 : 0
node_puts : 64
node_puts_total : 1196
node_put_fsm_time_mean : 691827
node_put_fsm_time_median : 2168
node_put_fsm_time_95 : 4415096
node_put_fsm_time_99 : 8226455
node_put_fsm_time_100 : 12045311
read_repairs : 0
read_repairs_total : 0
coord_redirs_total : 0

==> /var/log/riak/erlang.log.1 <==
08:41:19.235 [info] monitor long_gc <0.7.0> [{name,application_controller},{initial_call,{application_controller,start,1}},{almost_current_function,{gen_server,loop,6}},{message_queue_len,0}] [{timeout,293},{old_heap_block_size,0},{heap_block_size,6765},{mbuf_size,0},{stack_size,23},{old_heap_size,0},{heap_size,2142}]
08:41:19.437 [warning] Writing very large object (19730102 bytes) to <<...>>
==> /var/log/riak/console.log <==
2015-05-22 08:41:19.235 [info] <0.96.0>@riak_core_sysmon_handler:handle_event:92 monitor long_gc <0.7.0> [{name,application_controller},{initial_call,{application_controller,start,1}},{almost_current_function,{gen_server,loop,6}},{message_queue_len,0}] [{timeout,293},{old_heap_block_size,0},{heap_block_size,6765},{mbuf_size,0},{stack_size,23},{old_heap_size,0},{heap_size,2142}]
2015-05-22 08:41:19.437 [warning] <0.710.0>@riak_kv_vnode:encode_and_put_no_sib_check:1900 Writing very large object (19730102 bytes) to <<...>>
==> /var/log/riak/erlang.log.1 <==
08:41:23.417 [info] monitor long_gc <0.436.0> [{name,proxy_riak_kv_vnode_68507889249886074290797726533575766546371837952},{initial_call,{riak_core_vnode_proxy,init,1}},{almost_current_function,{riak_core_vnode_proxy,loop,2}},{message_queue_len,0}] [{timeout,2093},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,5},{old_heap_size,0},{heap_size,481}]
==> /var/log/riak/console.log <==
2015-05-22 08:41:23.417 [info] <0.96.0>@riak_core_sysmon_handler:handle_event:92 monitor long_gc <0.436.0> [{name,proxy_riak_kv_vnode_68507889249886074290797726533575766546371837952},{initial_call,{riak_core_vnode_proxy,init,1}},{almost_current_function,{riak_core_vnode_proxy,loop,2}},{message_queue_len,0}] [{timeout,2093},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,5},{old_heap_size,0},{heap_size,481}]
==> /var/log/riak/erlang.log.1 <==
08:41:25.319 [warning] Reading large object of size 12178287 from <<...>>
==> /var/log/riak/console.log <==
2015-05-22 08:41:25.319 [warning] <0.707.0>@riak_kv_vnode:do_get_object:1365 Reading large object of size 12178287 from <<..">>
==> /var/log/riak/erlang.log.1 <==
08:42:00.687 [info] monitor long_gc <0.707.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{eleveldb,get,3}},{message_queue_len,7}] [{timeout,485},{old_heap_block_size,0},{heap_block_size,2584},{mbuf_size,0},{stack_size,46},{old_heap_size,0},{heap_size,909}]
==> /var/log/riak/console.log <==
2015-05-22 08:42:00.687 [info] <0.96.0>@riak_core_sysmon_handler:handle_event:92 monitor long_gc <0.707.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{eleveldb,get,3}},{message_queue_len,7}] [{timeout,485},{old_heap_block_size,0},{heap_block_size,2584},{mbuf_size,0},{stack_size,46},{old_heap_size,0},{heap_size,909}]
==> /var/log/riak/erlang.log.1 <==
08:42:40.036 [info] monitor long_gc <0.712.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{eleveldb,write,3}},{message_queue_len,0}] [{timeout,759},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,73},{old_heap_size,0},{heap_size,1683}]
==> /var/log/riak/console.log <==
2015-05-22 08:42:40.036 [info] <0.96.0>@riak_core_sysmon_handler:handle_event:92 monitor long_gc <0.712.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{eleveldb,write,3}},{message_queue_len,0}] [{timeout,759},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,73},{old_heap_size,0},{heap_size,1683}]


Some of the non-standard /etc/riak/vm.args:

## Increase number of concurrent ports/sockets
-env ERL_MAX_PORTS 64000
## Tweak GC to run more often 
-env ERL_FULLSWEEP_AFTER 0
## Set the location of crash dumps
-env ERL_CRASH_DUMP /var/log/riak/erl_crash.dump
## Raise the ETS table limit
-env ERL_MAX_ETS_TABLES 256000
## Force the erlang VM to use SMP
-smp enable
## For nodes with many busy_dist_port events, Basho recommends
## raising the sender-side network distribution buffer size.
## 32MB may not be sufficient for some workloads and is a suggested
## starting point.
## The Erlang/OTP default is 1024 (1 megabyte).
## See: http://www.erlang.org/doc/man/erl.html#%2bzdbbl
+zdbbl 32768
## Raise the default erlang process limit 
+P 256000


Some of the non-standard /etc/riak/app.config:

%% -*- tab-width: 4;erlang-indent-level: 4;indent-tabs-mode: nil -*-
%% ex: ts=4 sw=4 et
[
 %% For firewall config limit ports
 { kernel, [
            {inet_dist_listen_min, 6000},
            {inet_dist_listen_max, 7999}
          ]},
 %% Riak Client APIs config
              %% riak_handoff_port is the TCP port that Riak uses for
              %% intra-cluster data handoff.
              {handoff_port, 8099 },
              {handoff_concurrency, 3},             
             
 %% Riak KV config
 {riak_kv, [
            %% Storage_backend specifies the Erlang module defining the storage
            %% mechanism that will be used on this node.

            {storage_backend, riak_kv_multi_backend},
            {multi_backend_default, <<"eleveldb">>},
            {multi_backend, [        
              {<<"eleveldb">>, riak_kv_eleveldb_backend, [
                {data_root, "/data/riak/leveldb"}
              ]},
              {<<"fs">>, riak_kv_eleveldb_backend, [
                {data_root, "/bigdata/riak/bigleveldb"}
              ]},
              {<<"cache">>, riak_kv_memory_backend, [
                {max_memory, 16}, %% 16Mb
				{ttl, 86400} %% 1 Day in seconds
              ]}
            ]},

            {anti_entropy, {off, []}},
            {vnode_vclocks, true},
            {legacy_keylisting, false},
            {listkeys_backpressure, true},
            {legacy_vnode_routing, false},
            {fsm_limit, 50000},
            {object_format, v1}
           ]},
 {riak_search, [
                %% To enable Search functionality set this 'true'.
                {enabled, false}
               ]},

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20150522/5665a971/attachment-0002.html>


More information about the riak-users mailing list