get and put operations are slow on Riak KV cluster

Abhinav Tripathi abhi.160382 at gmail.com
Mon Jun 13 03:00:33 EDT 2016


Hi,
I am seeing quite slow get and put operations on our Riak KV cluster. We
are new to Riak and have tried to follow the instructions in the
documentation. It would be great if you could take a look and let us know
what are we doing wrong.

We are using Riak as a distributed cache for our application. The 95th
percentile (and above) get operations are >10ms. Similarly, 99th and above
percentile for put operations is getting close to 10ms. Ideally, we want
the 99% gets to stay around 1ms and 99% puts to stay <5ms.

I have attached the configurations for all 3 nodes in our cluster. Also,
attached are recent console.log files. I have attached a recent one-minute
stats file as well from which the below values are.

*GETs*
    "node_get_fsm_time_100": 71429,
    "node_get_fsm_time_95": 16551,
    "node_get_fsm_time_99": 27513,
    "node_get_fsm_time_mean": 4393,
    "node_get_fsm_time_median": 989

*PUTs*
    "node_put_fsm_time_100": 10785,
    "node_put_fsm_time_95": 2679,
    "node_put_fsm_time_99": 7800,
    "node_put_fsm_time_mean": 1752,
    "node_put_fsm_time_median": 1608

I have recorded many one-minute stats for our cluster. Most of them for the
past couple of days show high values like these.

All three Riak nodes are 4-core, 16GB RAM, 128 GB SSD machines ...
basically m3.xlarge AWS instances.

Also, I can see many such lines in the console.log,

2016-06-13 00:03:32.545 [info] <0.23701.3269> Merged
{["/apps/riak/lib/bitcask/502391187832497878132516661246222288006726811648/75.bitcask.data","/apps/riak/lib/bitcask/502391187832497878132516661246222288006726811648/76.bitcask.data","/apps/riak/lib/bitcask/502391187832497878132516661246222288006726811648/77.bitcask.data","/apps/riak/lib/bitcask/502391187832497878132516661246222288006726811648/78.bitcask.data"],[]}
in 596.530484 seconds.

These merge operations are usually taking close to 600 seconds. Could that
be a problem as well?

Thanks,
Abhinav.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment-0002.html>
-------------- next part --------------
{
    "connected_nodes": [
        "riak at 30.0.232.244",
        "riak at 30.0.238.48"
    ],
    "consistent_get_objsize_100": 0,
    "consistent_get_objsize_95": 0,
    "consistent_get_objsize_99": 0,
    "consistent_get_objsize_mean": 0,
    "consistent_get_objsize_median": 0,
    "consistent_get_time_100": 0,
    "consistent_get_time_95": 0,
    "consistent_get_time_99": 0,
    "consistent_get_time_mean": 0,
    "consistent_get_time_median": 0,
    "consistent_gets": 0,
    "consistent_gets_total": 0,
    "consistent_put_objsize_100": 0,
    "consistent_put_objsize_95": 0,
    "consistent_put_objsize_99": 0,
    "consistent_put_objsize_mean": 0,
    "consistent_put_objsize_median": 0,
    "consistent_put_time_100": 0,
    "consistent_put_time_95": 0,
    "consistent_put_time_99": 0,
    "consistent_put_time_mean": 0,
    "consistent_put_time_median": 0,
    "consistent_puts": 0,
    "consistent_puts_total": 0,
    "converge_delay_last": 0,
    "converge_delay_max": 0,
    "converge_delay_mean": 0,
    "converge_delay_min": 0,
    "coord_redirs_total": 6185211,
    "counter_actor_counts_100": 0,
    "counter_actor_counts_95": 0,
    "counter_actor_counts_99": 0,
    "counter_actor_counts_mean": 0,
    "counter_actor_counts_median": 0,
    "cpu_avg1": 205,
    "cpu_avg15": 274,
    "cpu_avg5": 294,
    "cpu_nprocs": 353,
    "dropped_vnode_requests_total": 0,
    "executing_mappers": 0,
    "gossip_received": 7,
    "handoff_timeouts": 0,
    "ignored_gossip_total": 0,
    "index_fsm_active": 0,
    "index_fsm_create": 0,
    "index_fsm_create_error": 0,
    "late_put_fsm_coordinator_ack": 0,
    "leveldb_read_block_error": "undefined",
    "list_fsm_active": 0,
    "list_fsm_create": 0,
    "list_fsm_create_error": 0,
    "list_fsm_create_error_total": 0,
    "list_fsm_create_total": 0,
    "map_actor_counts_100": 0,
    "map_actor_counts_95": 0,
    "map_actor_counts_99": 0,
    "map_actor_counts_mean": 0,
    "map_actor_counts_median": 0,
    "mem_allocated": 15607599104,
    "mem_total": 15800115200,
    "memory_atom": 662409,
    "memory_atom_used": 656469,
    "memory_binary": 17152336,
    "memory_code": 15317897,
    "memory_ets": 7150904,
    "memory_processes": 188217880,
    "memory_processes_used": 187865168,
    "memory_system": 54139384,
    "memory_total": 242357264,
    "node_get_fsm_active": 0,
    "node_get_fsm_active_60s": 1048,
    "node_get_fsm_counter_objsize_100": 0,
    "node_get_fsm_counter_objsize_95": 0,
    "node_get_fsm_counter_objsize_99": 0,
    "node_get_fsm_counter_objsize_mean": 0,
    "node_get_fsm_counter_objsize_median": 0,
    "node_get_fsm_counter_siblings_100": 0,
    "node_get_fsm_counter_siblings_95": 0,
    "node_get_fsm_counter_siblings_99": 0,
    "node_get_fsm_counter_siblings_mean": 0,
    "node_get_fsm_counter_siblings_median": 0,
    "node_get_fsm_counter_time_100": 0,
    "node_get_fsm_counter_time_95": 0,
    "node_get_fsm_counter_time_99": 0,
    "node_get_fsm_counter_time_mean": 0,
    "node_get_fsm_counter_time_median": 0,
    "node_get_fsm_errors": 0,
    "node_get_fsm_errors_total": 0,
    "node_get_fsm_in_rate": 0,
    "node_get_fsm_map_objsize_100": 0,
    "node_get_fsm_map_objsize_95": 0,
    "node_get_fsm_map_objsize_99": 0,
    "node_get_fsm_map_objsize_mean": 0,
    "node_get_fsm_map_objsize_median": 0,
    "node_get_fsm_map_siblings_100": 0,
    "node_get_fsm_map_siblings_95": 0,
    "node_get_fsm_map_siblings_99": 0,
    "node_get_fsm_map_siblings_mean": 0,
    "node_get_fsm_map_siblings_median": 0,
    "node_get_fsm_map_time_100": 0,
    "node_get_fsm_map_time_95": 0,
    "node_get_fsm_map_time_99": 0,
    "node_get_fsm_map_time_mean": 0,
    "node_get_fsm_map_time_median": 0,
    "node_get_fsm_objsize_100": 214,
    "node_get_fsm_objsize_95": 193,
    "node_get_fsm_objsize_99": 214,
    "node_get_fsm_objsize_mean": 176,
    "node_get_fsm_objsize_median": 176,
    "node_get_fsm_out_rate": 0,
    "node_get_fsm_rejected": 0,
    "node_get_fsm_rejected_60s": 0,
    "node_get_fsm_rejected_total": 0,
    "node_get_fsm_set_objsize_100": 0,
    "node_get_fsm_set_objsize_95": 0,
    "node_get_fsm_set_objsize_99": 0,
    "node_get_fsm_set_objsize_mean": 0,
    "node_get_fsm_set_objsize_median": 0,
    "node_get_fsm_set_siblings_100": 0,
    "node_get_fsm_set_siblings_95": 0,
    "node_get_fsm_set_siblings_99": 0,
    "node_get_fsm_set_siblings_mean": 0,
    "node_get_fsm_set_siblings_median": 0,
    "node_get_fsm_set_time_100": 0,
    "node_get_fsm_set_time_95": 0,
    "node_get_fsm_set_time_99": 0,
    "node_get_fsm_set_time_mean": 0,
    "node_get_fsm_set_time_median": 0,
    "node_get_fsm_siblings_100": 1,
    "node_get_fsm_siblings_95": 1,
    "node_get_fsm_siblings_99": 1,
    "node_get_fsm_siblings_mean": 1,
    "node_get_fsm_siblings_median": 1,
    "node_get_fsm_time_100": 71429,
    "node_get_fsm_time_95": 16551,
    "node_get_fsm_time_99": 27513,
    "node_get_fsm_time_mean": 4393,
    "node_get_fsm_time_median": 989,
    "node_gets": 1027,
    "node_gets_counter": 0,
    "node_gets_counter_total": 0,
    "node_gets_map": 0,
    "node_gets_map_total": 0,
    "node_gets_set": 0,
    "node_gets_set_total": 0,
    "node_gets_total": 2649075,
    "node_put_fsm_active": 0,
    "node_put_fsm_active_60s": 5893,
    "node_put_fsm_counter_time_100": 0,
    "node_put_fsm_counter_time_95": 0,
    "node_put_fsm_counter_time_99": 0,
    "node_put_fsm_counter_time_mean": 0,
    "node_put_fsm_counter_time_median": 0,
    "node_put_fsm_in_rate": 123,
    "node_put_fsm_map_time_100": 0,
    "node_put_fsm_map_time_95": 0,
    "node_put_fsm_map_time_99": 0,
    "node_put_fsm_map_time_mean": 0,
    "node_put_fsm_map_time_median": 0,
    "node_put_fsm_out_rate": 124,
    "node_put_fsm_rejected": 0,
    "node_put_fsm_rejected_60s": 0,
    "node_put_fsm_rejected_total": 0,
    "node_put_fsm_set_time_100": 0,
    "node_put_fsm_set_time_95": 0,
    "node_put_fsm_set_time_99": 0,
    "node_put_fsm_set_time_mean": 0,
    "node_put_fsm_set_time_median": 0,
    "node_put_fsm_time_100": 10785,
    "node_put_fsm_time_95": 2679,
    "node_put_fsm_time_99": 7800,
    "node_put_fsm_time_mean": 1752,
    "node_put_fsm_time_median": 1608,
    "node_puts": 5131,
    "node_puts_counter": 0,
    "node_puts_counter_total": 0,
    "node_puts_map": 0,
    "node_puts_map_total": 0,
    "node_puts_set": 0,
    "node_puts_set_total": 0,
    "node_puts_total": 36492684,
    "nodename": "riak at 30.0.235.230",
    "object_counter_merge": 0,
    "object_counter_merge_time_100": 0,
    "object_counter_merge_time_95": 0,
    "object_counter_merge_time_99": 0,
    "object_counter_merge_time_mean": 0,
    "object_counter_merge_time_median": 0,
    "object_counter_merge_total": 0,
    "object_map_merge": 0,
    "object_map_merge_time_100": 0,
    "object_map_merge_time_95": 0,
    "object_map_merge_time_99": 0,
    "object_map_merge_time_mean": 0,
    "object_map_merge_time_median": 0,
    "object_map_merge_total": 0,
    "object_merge": 1432,
    "object_merge_time_100": 140,
    "object_merge_time_95": 33,
    "object_merge_time_99": 54,
    "object_merge_time_mean": 19,
    "object_merge_time_median": 19,
    "object_merge_total": 1352769,
    "object_set_merge": 0,
    "object_set_merge_time_100": 0,
    "object_set_merge_time_95": 0,
    "object_set_merge_time_99": 0,
    "object_set_merge_time_mean": 0,
    "object_set_merge_time_median": 0,
    "object_set_merge_total": 0,
    "pbc_active": 16,
    "pbc_connects": 0,
    "pbc_connects_total": 1112,
    "pipeline_active": 0,
    "pipeline_create_count": 0,
    "pipeline_create_error_count": 0,
    "pipeline_create_error_one": 0,
    "pipeline_create_one": 0,
    "postcommit_fail": 0,
    "precommit_fail": 0,
    "read_repairs": 67,
    "read_repairs_counter": 0,
    "read_repairs_counter_total": 0,
    "read_repairs_fallback_notfound_count": 4890,
    "read_repairs_fallback_notfound_one": 0,
    "read_repairs_fallback_outofdate_count": 2,
    "read_repairs_fallback_outofdate_one": 0,
    "read_repairs_map": 0,
    "read_repairs_map_total": 0,
    "read_repairs_primary_notfound_count": 5370,
    "read_repairs_primary_notfound_one": 0,
    "read_repairs_primary_outofdate_count": 119812,
    "read_repairs_primary_outofdate_one": 201,
    "read_repairs_set": 0,
    "read_repairs_set_total": 0,
    "read_repairs_total": 51935,
    "rebalance_delay_last": 0,
    "rebalance_delay_max": 0,
    "rebalance_delay_mean": 0,
    "rebalance_delay_min": 0,
    "rejected_handoffs": 3,
    "riak_kv_vnodeq_max": 2,
    "riak_kv_vnodeq_mean": 0.09523809523809523,
    "riak_kv_vnodeq_median": 0,
    "riak_kv_vnodeq_min": 0,
    "riak_kv_vnodeq_total": 2,
    "riak_kv_vnodes_running": 21,
    "riak_pipe_vnodeq_max": 0,
    "riak_pipe_vnodeq_mean": 0.0,
    "riak_pipe_vnodeq_median": 0,
    "riak_pipe_vnodeq_min": 0,
    "riak_pipe_vnodeq_total": 0,
    "riak_pipe_vnodes_running": 21,
    "ring_creation_size": 64,
    "ring_members": [
        "riak at 30.0.232.244",
        "riak at 30.0.235.230",
        "riak at 30.0.238.48"
    ],
    "ring_num_partitions": 64,
    "ring_ownership": "[{'riak at 30.0.235.230',21},{'riak at 30.0.232.244',22},{'riak at 30.0.238.48',21}]",
    "rings_reconciled": 0,
    "rings_reconciled_total": 65,
    "set_actor_counts_100": 0,
    "set_actor_counts_95": 0,
    "set_actor_counts_99": 0,
    "set_actor_counts_mean": 0,
    "set_actor_counts_median": 0,
    "skipped_read_repairs": 0,
    "skipped_read_repairs_total": 0,
    "storage_backend": "riak_kv_bitcask_backend",
    "sys_driver_version": "2.2",
    "sys_global_heaps_size": "deprecated",
    "sys_heap_type": "private",
    "sys_logical_processors": 4,
    "sys_monitor_count": 845,
    "sys_otp_release": "R16B02_basho10",
    "sys_port_count": 399,
    "sys_process_count": 1540,
    "sys_smp_support": true,
    "sys_system_architecture": "x86_64-unknown-linux-gnu",
    "sys_system_version": "Erlang R16B02_basho10 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]",
    "sys_thread_pool_size": 64,
    "sys_threads_enabled": true,
    "sys_wordsize": 8,
    "vnode_counter_update": 0,
    "vnode_counter_update_time_100": 0,
    "vnode_counter_update_time_95": 0,
    "vnode_counter_update_time_99": 0,
    "vnode_counter_update_time_mean": 0,
    "vnode_counter_update_time_median": 0,
    "vnode_counter_update_total": 0,
    "vnode_get_fsm_time_100": 124086,
    "vnode_get_fsm_time_95": 13404,
    "vnode_get_fsm_time_99": 25332,
    "vnode_get_fsm_time_mean": 2739,
    "vnode_get_fsm_time_median": 167,
    "vnode_gets": 1986,
    "vnode_gets_total": 9025418,
    "vnode_index_deletes": 0,
    "vnode_index_deletes_postings": 0,
    "vnode_index_deletes_postings_total": 0,
    "vnode_index_deletes_total": 0,
    "vnode_index_reads": 0,
    "vnode_index_reads_total": 0,
    "vnode_index_refreshes": 0,
    "vnode_index_refreshes_total": 0,
    "vnode_index_writes": 0,
    "vnode_index_writes_postings": 0,
    "vnode_index_writes_postings_total": 0,
    "vnode_index_writes_total": 0,
    "vnode_map_update": 0,
    "vnode_map_update_time_100": 0,
    "vnode_map_update_time_95": 0,
    "vnode_map_update_time_99": 0,
    "vnode_map_update_time_mean": 0,
    "vnode_map_update_time_median": 0,
    "vnode_map_update_total": 0,
    "vnode_put_fsm_time_100": 163113,
    "vnode_put_fsm_time_95": 866,
    "vnode_put_fsm_time_99": 13305,
    "vnode_put_fsm_time_mean": 679,
    "vnode_put_fsm_time_median": 477,
    "vnode_puts": 15019,
    "vnode_puts_total": 104405368,
    "vnode_set_update": 0,
    "vnode_set_update_time_100": 0,
    "vnode_set_update_time_95": 0,
    "vnode_set_update_time_99": 0,
    "vnode_set_update_time_mean": 0,
    "vnode_set_update_time_median": 0,
    "vnode_set_update_total": 0,
    "write_once_merge": 0,
    "write_once_put_objsize_100": 0,
    "write_once_put_objsize_95": 0,
    "write_once_put_objsize_99": 0,
    "write_once_put_objsize_mean": 0,
    "write_once_put_objsize_median": 0,
    "write_once_put_time_100": 0,
    "write_once_put_time_95": 0,
    "write_once_put_time_99": 0,
    "write_once_put_time_mean": 0,
    "write_once_put_time_median": 0,
    "write_once_puts": 0,
    "write_once_puts_total": 0,
    "disk": [
        {
            "id": "/",
            "size": 25798684,
            "used": 7
        },
        {
            "id": "/dev/shm",
            "size": 7714900,
            "used": 1
        },
        {
            "id": "/home",
            "size": 10321208,
            "used": 5
        },
        {
            "id": "/tmp",
            "size": 10321208,
            "used": 2
        },
        {
            "id": "/var",
            "size": 10321208,
            "used": 5
        },
        {
            "id": "/var/log",
            "size": 10321208,
            "used": 3
        },
        {
            "id": "/opt",
            "size": 10321208,
            "used": 5
        },
        {
            "id": "/usr",
            "size": 10321208,
            "used": 14
        },
        {
            "id": "/apps",
            "size": 113529504,
            "used": 41
        }
    ],
    "riak_auth_mods_version": "2.1.0-0-g31b8b30",
    "erlydtl_version": "0.7.0",
    "riak_control_version": "2.1.2-0-gab3f924",
    "cluster_info_version": "2.0.3-0-g76c73fc",
    "yokozuna_version": "2.1.2-0-g3520d11",
    "ibrowse_version": "4.0.2",
    "riak_search_version": "2.1.1-0-gffe2113",
    "merge_index_version": "2.0.1-0-g0c8f77c",
    "riak_kv_version": "2.1.2-0-gf969bba",
    "riak_api_version": "2.1.2-0-gd8d510f",
    "riak_pb_version": "2.1.0.2-0-g620bc70",
    "protobuffs_version": "0.8.1p5-0-gf88fc3c",
    "riak_dt_version": "2.1.1-0-ga2986bc",
    "sidejob_version": "2.0.0-0-gc5aabba",
    "riak_pipe_version": "2.1.1-0-gb1ac2cf",
    "riak_core_version": "2.1.5-0-gb02ab53",
    "exometer_core_version": "1.0.0-basho2-0-gb47a5d6",
    "poolboy_version": "0.8.1p3-0-g8bb45fb",
    "pbkdf2_version": "2.0.0-0-g7076584",
    "eleveldb_version": "2.0.17-0-g973fc92",
    "clique_version": "0.3.2-0-ge332c8f",
    "bitcask_version": "1.7.2",
    "basho_stats_version": "1.0.3",
    "webmachine_version": "1.10.8-0-g7677c24",
    "mochiweb_version": "2.9.0",
    "inets_version": "5.9.6",
    "xmerl_version": "1.3.4",
    "erlang_js_version": "1.3.0-0-g07467d8",
    "runtime_tools_version": "1.8.12",
    "os_mon_version": "2.2.13",
    "riak_sysmon_version": "2.0.0",
    "ssl_version": "5.3.1",
    "public_key_version": "0.20",
    "crypto_version": "3.1",
    "asn1_version": "2.0.3",
    "sasl_version": "2.3.3",
    "lager_version": "2.1.1",
    "goldrush_version": "0.1.7",
    "compiler_version": "4.9.3",
    "syntax_tools_version": "1.6.11",
    "stdlib_version": "1.19.3",
    "kernel_version": "2.16.3"
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riak1.conf
Type: application/octet-stream
Size: 13740 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment.conf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riak1.console.log
Type: text/x-log
Size: 516630 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riak2.conf
Type: application/octet-stream
Size: 13738 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment-0001.conf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riak2.console.log
Type: text/x-log
Size: 611764 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment-0001.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riak3.conf
Type: application/octet-stream
Size: 13738 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment-0002.conf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riak3.console.log
Type: text/x-log
Size: 591169 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160613/7ea622ad/attachment-0002.log>


More information about the riak-users mailing list