Riak behavior

Kirill A. Korinskiy catap+riak at catap.ru
Wed Dec 2 06:10:00 EST 2009


Hi Justin,

thank you for your response.

At Tue, 1 Dec 2009 12:45:19 -0500,
Justin Sheehy <justin at basho.com> wrote:
> 
> > I have a test Riak cluster of 10 nodes. For storage backend I use
> > riak_fs_backend.
> 
> I would start by suggesting a different backend.  Of the open source
> backends currently available, riak_dets_backend is the best choice for
> many people.

If a system crash occurs while riak_dets_backend is writing, it could
destroy the dets tables, right? I chose riak_fs_backend only to keep
the data safe across a system crash.

> The riak_fs_backend is really intended as the first demo code of
> writing a backend, and is also useful for certain testing scenarios,
> but is not really the right thing for production use.  We should
> better document this fact about riak_fs_backend, and I will make
> sure that we do so.

When would you recommend using the other open-source backends
(riak_osmos_backend/riak_gb_trees_backend)?

> On to each of your issues...
> 
> > 1) The data when riak_fs_backend is in use is not written atomically
> > to the file system. That is, a file is written directly to the right
> > place, which could lead to a partial data written during the system
> > crash.  Respectively, after system restart, the file system will
> > appear inconsistent.
> 
> Yes, this is a large part of what I meant above about the fs backend.
> The other big downside of fs_backend is that it is quite slow when it
> contains a large amount of data.  I recommend switching backends.
> 

JFYI: I implemented atomic writes: http://bitbucket.org/catap/riak/changeset/18d9d98b054b/
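For anyone following along, the write-to-temp-then-rename pattern behind that changeset can be sketched like this (a minimal Python illustration of the technique, not the Erlang code from the changeset itself; `atomic_write` is just my name for it):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write `data` to `path` atomically: write to a temporary file in
    the same directory, flush and fsync it, then rename it over the
    target. rename() is atomic on POSIX filesystems, so after a crash
    the file is either the old version or the new one, never partial."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # force the data to disk
        os.rename(tmp, path)          # atomically replace the target
    except Exception:
        os.unlink(tmp)                # clean up the temp file on failure
        raise
```

The temp file must live in the same directory as the target, because rename() is only atomic within one filesystem.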


> > 2) With the active use of Riak very quickly starts to be blocked by
> > IO. If you add +A 32 to the erl's command line options, it gets
> > better. Have you tried riak_fs_backend in high load setting? Do you
> > have any additional recommendations?
> 
> We don't use the fs backend in any high load settings, but the dets
> backend is currently in use at some customer locations that see a
> reasonable amount of load and it performs satisfactorily for them.
> 

How much data do those customers store per node with the dets
backend? 100M key-value pairs? 25 GB of data?

> None of these requests timed out, even when 2/3 of the nodes were shut down..
> 

Yes, I tried my test case against rev 1e7fdd78e996 and everything is OK. Thanks!

> > 5) I started a simple experiment on the 10 nodes, using the fs
> > backend.
> 
> I am not sure what was going on in your experiment, but we have
> regularly verified that this variety of hinted handoff will in fact
> store a replica of the updated document on another node and will also
> (after some delay following the ideal node rejoining) transfer that
> document to the ideal node.
> 

What is a typical delay? Maybe my wait was too short?

> >  5.1) Where the data is getting saved when one of the "ideal" nodes is
> >  not available?
> 
> Typically, on the next vnode on a unique node along the ring after the
> ideal ones.  In the case where less than N physical nodes are present,
> it will not be a unique node.
> 

What do you mean by "on the next vnode on a unique node along the ring
after the ideal ones"?
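My current understanding of that phrase, sketched so you can correct me (a hypothetical Python sketch, not Riak's actual implementation; `preference_list`, the ring layout, and `up_nodes` are all my own illustrative names):

```python
import hashlib
from bisect import bisect_right

def preference_list(key, ring, n, up_nodes):
    """ring: list of (position, node) sorted by position, where each
    physical node owns several vnode positions. Walk clockwise from the
    key's hash, taking vnodes on distinct reachable nodes first."""
    positions = [p for p, _ in ring]
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)
    start = bisect_right(positions, h) % len(ring)
    chosen, seen = [], set()
    # First pass: prefer vnodes on distinct live physical nodes.
    for i in range(len(ring)):
        pos, node = ring[(start + i) % len(ring)]
        if node in up_nodes and node not in seen:
            chosen.append((pos, node))
            seen.add(node)
        if len(chosen) == n:
            return chosen
    # Fewer than n distinct live nodes: keep walking, allowing the same
    # physical node to appear again on a different vnode.
    for i in range(len(ring)):
        pos, node = ring[(start + i) % len(ring)]
        if node in up_nodes and (pos, node) not in chosen:
            chosen.append((pos, node))
            if len(chosen) == n:
                break
    return chosen
```

Is that roughly the rule, i.e. the fallback is simply the next clockwise vnode whose physical node is up and not already holding a replica?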


> >  5.2) According to the experiments, the data gets updated only when it
> >  is accessed via an API, directly. The data folders are not
> >  synchronized automatically when the "ideal" node being down becomes
> >  up again.
> 
> I'm not sure what to say without being involved in that experiment,
> but that we observe that data updated when an ideal node is
> unreachable is correctly stored on a fallback node, and later
> transferred.  This is accomplished via the vnode-to-vnode merkle
> exchange a little while after the ideal node has become visible again.
> 

How much time is required to transfer the data to the ideal node?
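And to check that I understand the mechanism: my simplified, flat picture of the hash exchange is roughly the following (real Riak builds a tree of hashes so whole equal subtrees can be skipped without comparing every key; `digest` and `keys_to_transfer` are just my own names):

```python
import hashlib

def digest(store):
    """Hash every value in a key/value store."""
    return {k: hashlib.sha1(v.encode()).hexdigest() for k, v in store.items()}

def keys_to_transfer(fallback, ideal):
    """Keys held by the fallback vnode whose hash is missing from, or
    differs from, the ideal vnode's copy -- only these are transferred."""
    f, i = digest(fallback), digest(ideal)
    return sorted(k for k in f if i.get(k) != f[k])
```

So only the keys with differing hashes actually move across the wire, right?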

> Also, if you use nearly any backend besides fs_backend, the
> performance will tend to be quite different.
> 

Do you have any recommendations for choosing a backend?

-- 
wbr, Kirill



