Riak performance issue with many connected objects (11 million +)

Lei Gu legu at e-dialog.com
Fri Aug 31 13:49:07 EDT 2012


Hi Jared,
Thanks for getting back to us so quickly. Please find my answers below.

In terms of n_val, can I switch to 1 now and restart Riak?

I am building a tree. The tree is 6 levels deep, and the leaf nodes (7th level) are a different object type. The objects are connected through
bi-directional links, and we are using secondary indexes on all 11 million objects.
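
A minimal sketch of that structure, for illustration only: the fanout of 2, the key names, and the in-memory dictionary are all assumptions, and this is not Riak client code. It only shows how each parent/child relationship is stored as a pair of links.

```python
from collections import defaultdict

def build_tree(levels, fanout):
    """Return (links, nodes_per_level) for a tree with `levels` levels.

    `links` maps each key to the set of keys it links to. Every
    parent -> child link is mirrored by a child -> parent link.
    """
    links = defaultdict(set)
    current = ["L1-0"]              # hypothetical key for the root
    nodes_per_level = [1]
    for level in range(2, levels + 1):
        nxt = []
        for parent in current:
            for _ in range(fanout):
                child = f"L{level}-{len(nxt)}"
                links[parent].add(child)   # downward link
                links[child].add(parent)   # upward link
                nxt.append(child)
        nodes_per_level.append(len(nxt))
        current = nxt
    return links, nodes_per_level

# 6 tree levels plus the 7th (leaf) level; fanout of 2 is hypothetical.
links, per_level = build_tree(levels=7, fanout=2)
total_objects = sum(per_level)
total_link_entries = sum(len(targets) for targets in links.values())
print(per_level, total_objects, total_link_entries)
```

Because every relationship is recorded in both directions, the number of stored link entries is twice the number of parent/child edges.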

The system is extremely slow (minutes) when running a list-all-buckets query, and a query by key never returned; the CPU was pegged at 100% the entire time.
Thanks.
-- Lei

From: Jared Morrow <jared at basho.com>
Date: Friday, August 31, 2012 1:35 PM
To: Lei Gu <legu at e-dialog.com>
Cc: "riak-users at lists.basho.com" <riak-users at lists.basho.com>
Subject: Re: Riak performance issue with many connected objects (11 million +)

Lei,

One issue that stands out is that you are running a single-node cluster. With an n_val of 3, you will be storing 3 copies of every piece of data on that one node. Also, if you are doing a read with an r of 3, you are waiting for the data to be read three times from a single node. If possible, consider making a larger cluster (even with smaller machines) to get a better idea of how Riak operates. We recommend a minimum of five physical nodes for an n_val of 3. If you lower your n_val, you can possibly lower your physical node count and still get good performance, at the cost of reduced availability.
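
As a rough cross-check of the n_val point, the totals in the riak-admin status output quoted further down in this thread can be compared directly. The sketch assumes that the node_* counters count requests coordinated by the node and the vnode_* counters count the per-replica operations those requests fan out to; the helper name is made up for illustration.

```python
# Totals copied from the riak-admin status output in the original message.
stats = {
    "vnode_gets_total": 17412882,
    "node_gets_total": 5804294,
    "vnode_puts_total": 11608449,
    "node_puts_total": 3869483,
}

def ops_per_request(vnode_total, node_total):
    """Vnode-level operations performed per node-level request."""
    return vnode_total / node_total

get_ratio = ops_per_request(stats["vnode_gets_total"], stats["node_gets_total"])
put_ratio = ops_per_request(stats["vnode_puts_total"], stats["node_puts_total"])
print(get_ratio, put_ratio)  # prints 3.0 3.0
```

Both ratios come out to exactly 3, which is consistent with every request being served three times by the same machine.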

Given all of the above, it is still unusual that your reads *never* return. There might be another issue there regardless of cluster size. When you say "connected" objects, are you referring to links and link-walking? If so, how deep are you going with your links? Some context here would be helpful.

Thanks,
Jared



On Fri, Aug 31, 2012 at 9:54 AM, Lei Gu <legu at e-dialog.com> wrote:
I started performance testing Riak by loading 11 million connected objects into Riak. Loading has not been an issue, although it did take a long time.
Now when I try to access an object, Riak never returns, and it is using 100% of CPU and 74% of memory.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8952 riak      20   0 23.4g  11g 1708 S 100.6 74.1   1520:49 beam.smp

This raises a major concern on our side. I am using LevelDB for secondary index support and I haven't turned on free text search yet. I am running CentOS 6 with 16 GB of memory and an 8-core Intel i7.
Any help is highly appreciated.
Thanks.
-- Lei


Riak status:
[root@lgu-linux performance]# cat riak-status.txt
Attempting to restart script through sudo -u riak
1-minute stats for 'riak@127.0.0.1'
-------------------------------------------
vnode_gets : 13728
vnode_puts : 9153
vnode_index_reads : 0
vnode_index_writes : 9153
vnode_index_writes_postings : 4575
vnode_index_deletes : 234
vnode_index_deletes_postings : 0
read_repairs : 0
vnode_gets_total : 17412882
vnode_puts_total : 11608449
vnode_index_reads_total : 0
vnode_index_writes_total : 11608449
vnode_index_writes_postings_total : 5798670
vnode_index_deletes_total : 234
vnode_index_deletes_postings_total : 402
node_gets : 0
node_gets_total : 5804294
node_get_fsm_time_mean : 0
node_get_fsm_time_median : 0
node_get_fsm_time_95 : 0
node_get_fsm_time_99 : 0
node_get_fsm_time_100 : 0
node_puts : 0
node_puts_total : 3869483
node_put_fsm_time_mean : 0
node_put_fsm_time_median : 0
node_put_fsm_time_95 : 0
node_put_fsm_time_99 : 0
node_put_fsm_time_100 : 0
node_get_fsm_siblings_mean : 0
node_get_fsm_siblings_median : 0
node_get_fsm_siblings_95 : 0
node_get_fsm_siblings_99 : 0
node_get_fsm_siblings_100 : 0
node_get_fsm_objsize_mean : 0
node_get_fsm_objsize_median : 0
node_get_fsm_objsize_95 : 0
node_get_fsm_objsize_99 : 0
node_get_fsm_objsize_100 : 0
read_repairs_total : 0
coord_redirs_total : 0
precommit_fail : 0
postcommit_fail : 0
cpu_nprocs : 655
cpu_avg1 : 8
cpu_avg5 : 3
cpu_avg15 : 5
mem_total : 16699068416
mem_allocated : 5336322048
disk : [{"/",51606140,18},
        {"/dev/shm",8153840,1},
        {"/boot",495844,9},
        {"/home",891064276,1}]
nodename : 'riak@127.0.0.1'
connected_nodes : []
sys_driver_version : <<"1.5">>
sys_global_heaps_size : 0
sys_heap_type : private
sys_logical_processors : 8
sys_otp_release : <<"R14B04">>
sys_process_count : 1559
sys_smp_support : true
sys_system_version : <<"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:64] [hipe] [kernel-poll:true]">>
sys_system_architecture : <<"x86_64-unknown-linux-gnu">>
sys_threads_enabled : true
sys_thread_pool_size : 64
sys_wordsize : 8
ring_members : ['riak@127.0.0.1']
ring_num_partitions : 64
ring_ownership : <<"[{'riak@127.0.0.1',64}]">>
ring_creation_size : 64
storage_backend : riak_kv_eleveldb_backend
pbc_connects_total : 10
pbc_connects : 1
pbc_active : 0
ssl_version : <<"4.1.6">>
public_key_version : <<"0.13">>
runtime_tools_version : <<"1.8.6">>
basho_stats_version : <<"1.0.2">>
riak_search_version : <<"1.1.2">>
riak_kv_version : <<"1.1.4">>
bitcask_version : <<"1.5.1">>
luke_version : <<"0.2.5">>
erlang_js_version : <<"1.0.2">>
mochiweb_version : <<"1.5.1">>
inets_version : <<"5.7.1">>
riak_pipe_version : <<"1.1.2">>
merge_index_version : <<"1.1.0">>
cluster_info_version : <<"1.2.1">>
basho_metrics_version : <<"1.0.0">>
riak_control_version : <<"0.1.0">>
riak_core_version : <<"1.1.2">>
lager_version : <<"1.0.0">>
riak_sysmon_version : <<"1.1.2">>
webmachine_version : <<"1.9.1">>
crypto_version : <<"2.0.4">>
os_mon_version : <<"2.2.7">>
sasl_version : <<"2.1.10">>
stdlib_version : <<"1.17.5">>
kernel_version : <<"2.14.5">>
executing_mappers : 0
memory_total : 29539192
memory_processes : 9988968
memory_processes_used : 9958144
memory_system : 19550224
memory_atom : 1050169
memory_atom_used : 1036476
memory_binary : 421104
memory_code : 9330790
memory_ets : 4598912


Riak config file:

%% -*- mode: erlang;erlang-indent-level: 4;indent-tabs-mode: nil -*-
%% ex: ft=erlang ts=4 sw=4 et
[
 %% Riak Core config
 {riak_core, [
              %% Default location of ringstate
              {ring_state_dir, "/var/lib/riak/ring"},

              %% http is a list of IP addresses and TCP ports that the Riak
              %% HTTP interface will bind.
              {http, [ {"0.0.0.0", 8098 } ]},

              %% https is a list of IP addresses and TCP ports that the Riak
              %% HTTPS interface will bind.
              %{https, [{ "127.0.0.1", 8098 }]},

              %% Default cert and key locations for https can be overridden
              %% with the ssl config variable, for example:
              %{ssl, [
              %       {certfile, "/etc/riak/cert.pem"},
              %       {keyfile, "/etc/riak/key.pem"}
              %      ]},

              %% riak_handoff_port is the TCP port that Riak uses for
              %% intra-cluster data handoff.
              {handoff_port, 8099 },

              %% To encrypt riak_core intra-cluster data handoff traffic,
              %% uncomment the following line and edit its path to an
              %% appropriate certfile and keyfile.  (This example uses a
              %% single file with both items concatenated together.)
              %{handoff_ssl_options, [{certfile, "/tmp/erlserver.pem"}]},

              %% Disable legacy vnode routing for 1.1.x and above clusters
              %% (use vnode proxies).  Only set true on mixed clusters.
              {legacy_vnode_routing, false},

              %% Platform-specific installation paths (substituted by rebar)
              {platform_bin_dir, "/usr/sbin"},
              {platform_data_dir, "/var/lib/riak"},
              {platform_etc_dir, "/etc/riak"},
              {platform_lib_dir, "/usr/lib64/riak"},
              {platform_log_dir, "/var/log/riak"}
             ]},

 %% Riak KV config
 {riak_kv, [
            %% Storage_backend specifies the Erlang module defining the storage
            %% mechanism that will be used on this node.
            {storage_backend, riak_kv_eleveldb_backend},

            %% pb_ip is the IP address that the Riak Protocol Buffers interface
            %% will bind to.  If this is undefined, the interface will not run.
            {pb_ip,   "0.0.0.0" },

            %% pb_port is the TCP port that the Riak Protocol Buffers interface
            %% will bind to
            {pb_port, 8087 },

            %% pb_backlog is the maximum length to which the queue of pending
            %% connections may grow. If set, it must be an integer >= 0.
            %% By default the value is 5. If you anticipate a huge number of
            %% connections being initialised *simultaneously*, set this number
            %% higher.
            %% {pb_backlog, 64},

            %% raw_name is the first part of all URLS used by the Riak raw HTTP
            %% interface.  See riak_web.erl and raw_http_resource.erl for
            %% details.
            %{raw_name, "riak"},

            %% mapred_name is URL used to submit map/reduce requests to Riak.
            {mapred_name, "mapred"},

            %% mapred_system indicates which version of the MapReduce
            %% system should be used: 'pipe' means riak_pipe will
            %% power MapReduce queries, while 'legacy' means that luke
            %% will be used
            {mapred_system, pipe},

            %% mapred_2i_pipe indicates whether secondary-index
            %% MapReduce inputs are queued in parallel via their own
            %% pipe ('true'), or serially via a helper process
            %% ('false' or undefined).  Set to 'false' or leave
            %% undefined during a rolling upgrade from 1.0.
            {mapred_2i_pipe, true},

            %% directory used to store a transient queue for pending
            %% map tasks
            %% Only valid when mapred_system == legacy
            %% {mapred_queue_dir, "/var/lib/riak/mr_queue" },

            %% Each of the following entries control how many Javascript
            %% virtual machines are available for executing map, reduce,
            %% pre- and post-commit hook functions.
            {map_js_vm_count, 8 },
            {reduce_js_vm_count, 6 },
            {hook_js_vm_count, 2 },

            %% Number of items the mapper will fetch in one request.
            %% Larger values can impact read/write performance for
            %% non-MapReduce requests.
            %% Only valid when mapred_system == legacy
            %% {mapper_batch_size, 5},

            %% js_max_vm_mem is the maximum amount of memory, in megabytes,
            %% allocated to the Javascript VMs. If unset, the default is
            %% 8MB.
            {js_max_vm_mem, 8},

            %% js_thread_stack is the maximum amount of thread stack, in megabytes,
            %% allocated to the Javascript VMs. If unset, the default is 16MB.
            %% NOTE: This is not the same as the C thread stack.
            {js_thread_stack, 16},

            %% Number of objects held in the MapReduce cache. These will be
            %% ejected when the cache runs out of room or the bucket/key
            %% pair for that entry changes
            %% Only valid when mapred_system == legacy
            %% {map_cache_size, 10000},

            %% js_source_dir should point to a directory containing Javascript
            %% source files which will be loaded by Riak when it initializes
            %% Javascript VMs.
            %{js_source_dir, "/tmp/js_source"},

            %% http_url_encoding determines how Riak treats URL encoded
            %% buckets, keys, and links over the REST API. When set to 'on'
            %% Riak always decodes encoded values sent as URLs and Headers.
            %% Otherwise, Riak defaults to compatibility mode where links
            %% are decoded, but buckets and keys are not. The compatibility
            %% mode will be removed in a future release.
            {http_url_encoding, on},

            %% riak_kv_stat enables the use of the "riak-admin status" command to
            %% retrieve information about the Riak node for performance and debugging needs
            {riak_kv_stat, true},

            %% When using riak_kv_stat, use the legacy routines for tracking
            {legacy_stats, true},

            %% Switch to vnode-based vclocks rather than client ids.  This
            %% significantly reduces the number of vclock entries.
            %% Only set true if *all* nodes in the cluster are upgraded to 1.0
            {vnode_vclocks, true},

            %% This option enables compatibility of bucket and key listing
            %% with 0.14 and earlier versions. Once a rolling upgrade to
            %% a version > 0.14 is completed for a cluster, this should be
            %% set to false for improved performance for bucket and key
            %% listing operations.
            {legacy_keylisting, false},

            %% This option toggles compatibility of keylisting with 1.0
            %% and earlier versions.  Once a rolling upgrade to a version
            %% > 1.0 is completed for a cluster, this should be set to
            %% true for better control of memory usage during key listing
            %% operations
            {listkeys_backpressure, true}
           ]},

 %% Riak Search Config
 {riak_search, [
                %% To enable Search functionality set this 'true'.
                {enabled, true}
               ]},

 %% Merge Index Config
 {merge_index, [
                %% The root dir to store search merge_index data
                {data_root, "/var/lib/riak/merge_index"},

                %% The root dir to store secondary index merge_index data
                {data_root_2i, "/var/lib/riak/merge_index_2i"},

                %% Size, in bytes, of the in-memory buffer.  When this
                %% threshold has been reached the data is transformed
                %% into a segment file which resides on disk.
                {buffer_rollover_size, 1048576},

                %% Over time the segment files need to be compacted.
                %% This is the maximum number of segments that will be
                %% compacted at once.  A lower value will lead to
                %% quicker but more frequent compactions.
                {max_compact_segments, 20}
               ]},

 %% Bitcask Config
 {bitcask, [
             {data_root, "/var/lib/riak/bitcask"}
           ]},

 %% eLevelDB Config
 {eleveldb, [
             {data_root, "/var/lib/riak/leveldb"}
            ]},

 %% Lager Config
 {lager, [
            %% What handlers to install with what arguments
            %% The defaults for the logfiles are to rotate the files when
            %% they reach 10Mb or at midnight, whichever comes first, and keep
            %% the last 5 rotations. See the lager README for a description of
            %% the time rotation format:
            %% https://github.com/basho/lager/blob/master/README.org
            %%
            %% If you wish to disable rotation, you can either set the size to 0
            %% and the rotation time to "", or instead specify a 2-tuple that only
            %% consists of {Logfile, Level}.
            {handlers, [
                {lager_console_backend, info},
                {lager_file_backend, [
                    {"/var/log/riak/error.log", error, 10485760, "$D0", 5},
                    {"/var/log/riak/console.log", info, 10485760, "$D0", 5}
                ]}
            ]},

            %% Whether to write a crash log, and where.
            %% Commented/omitted/undefined means no crash logger.
            {crash_log, "/var/log/riak/crash.log"},

            %% Maximum size in bytes of events in the crash log - defaults to 65536
            {crash_log_msg_size, 65536},

            %% Maximum size of the crash log in bytes, before it's rotated, set
            %% to 0 to disable rotation - default is 0
            {crash_log_size, 10485760},

            %% What time to rotate the crash log - default is no time
            %% rotation. See the lager README for a description of this format:
            %% https://github.com/basho/lager/blob/master/README.org
            {crash_log_date, "$D0"},

            %% Number of rotated crash logs to keep, 0 means keep only the
            %% current one - default is 0
            {crash_log_count, 5},

            %% Whether to redirect error_logger messages into lager - defaults to true
            {error_logger_redirect, true}
        ]},

 %% riak_sysmon config
 {riak_sysmon, [
         %% To disable forwarding events of a particular type, use a
         %% limit of 0.
         {process_limit, 30},
         {port_limit, 2},

         %% Finding reasonable limits for a given workload is a matter
         %% of experimentation.
         {gc_ms_limit, 100},
         {heap_word_limit, 40111000},

         %% Configure the following items to 'false' to disable logging
         %% of that event type.
         {busy_port, true},
         {busy_dist_port, true}
        ]},

 %% SASL config
 {sasl, [
         {sasl_error_logger, false}
        ]},

 %% riak_control config
 {riak_control, [
                %% Set to false to disable the admin panel.
                {enabled, false},

                %% Authentication style used for access to the admin
                %% panel. Valid styles are 'userlist' <TODO>.
                {auth, userlist},

                %% If auth is set to 'userlist' then this is the
                %% list of usernames and passwords for access to the
                %% admin panel.
                {userlist, [{"user", "pass"}
                           ]},

                %% The admin panel is broken up into multiple
                %% components, each of which is enabled or disabled
                %% by one of these settings.
                {admin, true}
                ]}
].

