Riak behavior

Kirill A. Korinskiy catap+riak@catap.ru
Mon Nov 30 16:24:59 EST 2009


Hi

I have a test Riak cluster of 10 nodes. For the storage backend I use
riak_fs_backend. Here is a typical config:

    {cluster_name, "default"}.
    {ring_state_dir, "/data/riak/priv/ringstate"}.
    {ring_creation_size, 100}.
    {gossip_interval, 60000}.
    {doorbell_port, 9000}.
    {storage_backend, riak_fs_backend}.
    {riak_fs_backend_root, "/data/riak-data"}.
    %{riak_cookie, riak_jskit_cookie}.
    {riak_heart_command, "(cd /data/riak; ./start-restart.sh /data/riak/config/riak.erlenv)"}.
    {riak_nodename, "riak"}.
    %{riak_hostname, "127.0.0.1"}.
    {riak_web_ip, "127.0.0.1"}.
    {riak_web_port, 9980}.
    {jiak_name, "jiak"}.

In a series of tests and attempts to use Riak, some aspects of Riak's
behavior have remained unclear to me. Maybe you can comment on them.

1) When riak_fs_backend is in use, the data is not written atomically
to the file system: a file is written directly to its final location,
which can leave partially written data behind if the system crashes.
Consequently, after a restart the on-disk data may be inconsistent.
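
What I would expect is something like the usual write-to-temp-then-rename
pattern, sketched below (a hypothetical helper, not the actual
riak_fs_backend code):

    %% Hypothetical helper, not the real riak_fs_backend code: write the
    %% value to a temporary file and then rename it into place. On POSIX
    %% file systems a rename within one file system is atomic, so a crash
    %% leaves either the old file or the complete new one, never a torn
    %% write.
    -module(atomic_write).
    -export([put_atomic/2]).

    put_atomic(Path, Bin) when is_binary(Bin) ->
        TmpPath = Path ++ ".tmp",
        ok = file:write_file(TmpPath, Bin),
        ok = file:rename(TmpPath, Path).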

2) Under active use, Riak very quickly becomes blocked on IO. Adding
+A 32 to erl's command-line options makes it better. Have you tried
riak_fs_backend in a high-load setting? Do you have any additional
recommendations?
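
For reference, by +A 32 I mean starting the node with a larger async
thread pool, along the lines of (an illustrative command line only; in
practice the flag goes into the start script):

    erl +A 32 -pa ebin -name riak@127.0.0.1 -setcookie riak_cookie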

3) riak_vnode_sidekick changes its state from not_home to home. Under
what conditions does that happen?

4) If I understand the idea correctly, before inserting data into the
cluster you need to check that no such data already exists, or update
it if it does. But for Riak to return {error, notfound}, it waits for
that response from N-R+1 nodes. With N=3 and R=1, if one of the nodes
goes offline, the get results in {error, timeout}. Is that the
expected behavior?

A simple test case:

[catap@satellite] cat config/riak-A.erlenv
{cluster_name, "default"}.
{ring_state_dir, "priv/riak-A/ringstate/"}.
{ring_creation_size, 16}.
{gossip_interval, 60000}.
{doorbell_port, 9000}.
{storage_backend, riak_ets_backend}.
{riak_cookie, riak_demo_cookie}.
{riak_heart_command, "(cd /home/catap/src/riak; ./start-restart.sh /home/catap/src/riak/config/riak-A.erlenv)"}.
{riak_nodename, "riak-A"}.
{riak_hostname, "127.0.0.1"}.
{riak_web_ip, "127.0.0.1"}.
{riak_web_port, 8098}.
{jiak_name, "jiak"}.
{default_bucket_props, [{n_val,2},
                        {allow_mult,false},
                        {linkfun,{modfun, jiak_object, mapreduce_linkfun}},
                        {chash_keyfun, {riak_util, chash_std_keyfun}},
                        {old_vclock, 86400},
                        {young_vclock, 21600},
                        {big_vclock, 50},
                        {small_vclock, 10}]}.
[catap@satellite] cat config/riak-B.erlenv
{cluster_name, "default"}.
{ring_state_dir, "priv/riak-B/ringstate/"}.
{ring_creation_size, 16}.
{gossip_interval, 60000}.
{doorbell_port, 9000}.
{storage_backend, riak_ets_backend}.
{riak_cookie, riak_demo_cookie}.
{riak_heart_command, "(cd /home/catap/src/riak; ./start-restart.sh /home/catap/src/riak/config/riak-B.erlenv)"}.
{riak_nodename, "riak-B"}.
{riak_hostname, "127.0.0.1"}.
{riak_web_ip, "127.0.0.1"}.
{riak_web_port, 8098}.
{jiak_name, "jiak"}.
{default_bucket_props, [{n_val,2},
                        {allow_mult,false},
                        {linkfun,{modfun, jiak_object, mapreduce_linkfun}},
                        {chash_keyfun, {riak_util, chash_std_keyfun}},
                        {old_vclock, 86400},
                        {young_vclock, 21600},
                        {big_vclock, 50},
                        {small_vclock, 10}]}.
[catap@satellite] ./start-fresh.sh config/riak-A.erlenv
[catap@satellite] ./start-join.sh config/riak-B.erlenv riak-A@127.0.0.1
[catap@satellite] erl -pa ebin -name riak-test@127.0.0.1 -setcookie riak_cookie
Erlang R13B02 (erts-5.7.3) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.7.3  (abort with ^G)
(riak-test@127.0.0.1)1> {ok, A} = riak:client_connect('riak-A@127.0.0.1').
{ok,{riak_client,'riak-A@127.0.0.1',<<1,235,57,194>>}}
(riak-test@127.0.0.1)2> A:get(<<"Table">>, <<"Key">>, 1).
{error,notfound}
(riak-test@127.0.0.1)3> rpc:call('riak-B@127.0.0.1', init, stop, []).
ok
(riak-test@127.0.0.1)4> A:get(<<"Table">>, <<"Key">>, 1).
{error,timeout}
(riak-test@127.0.0.1)6>
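
For completeness, the check-then-write pattern I have in mind looks
roughly like this (only a sketch against the riak_client/riak_object API
as I understand it; bucket, key and values are made up):

    %% Sketch: read first, then create or update. A is the riak_client
    %% instance from client_connect/1 in the transcript above.
    case A:get(<<"Table">>, <<"Key">>, 1) of
        {error, notfound} ->
            %% nothing stored yet -> create and store a new object
            New = riak_object:new(<<"Table">>, <<"Key">>, <<"v1">>),
            A:put(New, 1);
        {ok, Old} ->
            %% object exists -> update its value and write it back
            Updated = riak_object:update_value(Old, <<"v2">>),
            A:put(Updated, 1)
    end.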


5) I ran a simple experiment on the 10 nodes, using the fs backend. I
put an object into the cluster and looked in /data/riak-data on all
nodes to see that the data appeared on node1, node2 and node6. I saved
a copy of the data file that was created to my home directory. Next, I
killed node6 and updated the object. I looked at the nodes and didn't
see the data on any nodes other than node1 and node2. Next, I compared
the changed data file on node1/node2 with the data file in my home
directory. Diff said the files are different (as expected). Then I
rejoined node6 and compared its data shortly thereafter. Diff found no
differences between the node6 data and the data file in my home
directory. Then I issued a get command to obtain the object, and
indeed node6 then got an updated file, different from the one I had
saved earlier.

Now, the questions:
 5.1) Where does the data get saved when one of the "ideal" nodes is
 not available?
 5.2) According to this experiment, the data gets updated only when it
 is accessed directly via the API; the data folders are not
 synchronized automatically when a previously down "ideal" node comes
 back up. Is that correct? (A sketch of forcing this convergence by
 reading every key follows below.)
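
If 5.2 is correct, the only way I see to make a rejoined node converge
is to touch every key through the normal client API, roughly like this
(a workaround sketch only; it assumes the client's list_keys/1 call and
would be expensive on a large bucket):

    %% Sketch: re-read every key in the bucket through the client A,
    %% which (per the experiment above) updates the on-disk copies.
    {ok, Keys} = A:list_keys(<<"Table">>),
    [A:get(<<"Table">>, K, 2) || K <- Keys].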

6) riak_vnode_sidekick contains two things that can produce very high
load on the nodes at unexpected intervals: gen_server2:call(VNode,
list, 60000) and make_merk(VNode, Idx, ObjList). How do you control
this load? If this code runs against a large number of files, you can
lose a node for a few hours or days. Even a few tens of thousands of
objects (files on the file system, or objects in the ets table?) will
put an unfair load on the system at some point.

-- 
wbr, Kirill

