Riak behavior

Justin Sheehy justin at basho.com
Wed Dec 2 17:21:20 EST 2009


Hi Kirill,

On Wed, Dec 2, 2009 at 6:10 AM, Kirill A. Korinskiy <catap+riak at catap.ru> wrote:

> If the system crashes while writing to riak_dets_backend, it could
> destroy the dets tables, right? I chose riak_fs_backend only to keep
> the data safe in the event of a system crash.

In our experience, dets crash repair works reasonably well.  And of
course you'd have to ruin all of your replicas to actually lose data.

> When would you want to use the other open source backends
> (riak_osmos_backend/riak_gb_trees_backend)?

The osmos backend is a reasonable general-purpose persistent backend.
You should be able to use it happily instead of fs or dets.  You will
have to build the 3rd party osmos package yourself before it will
work, but osmos is a very easy build out of the box.
(http://code.google.com/p/osmos/)

The gb_trees backend is like ets_backend in that it is in-memory only.
You can use it in roughly the same situations; the performance
characteristics of the two differ a bit, but they are largely
interchangeable.

We are also working on determining which of our various other backends
(based on inno, localmemcache, and boost's persistent hashes) to
prepare for open source release.

> JFYI: I implemented an atomic write: http://bitbucket.org/catap/riak/changeset/18d9d98b054b/

Yes: I will integrate either this change or something very like it in
the near future.
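
For readers who have not seen the technique: a common way to make a
file write atomic is to write to a temporary file in the same
directory and then rename it over the target. The sketch below shows
that general pattern in Python. It is not the code from the changeset
above, and the function name atomic_write is made up for this example.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so a reader sees the old or the new content,
    never a partially written file. (Hypothetical helper for illustration.)"""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before the rename, the old file is left untouched
and only a stray temporary file remains.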

> Yes, I tried my test case on rev 1e7fdd78e996 and all is OK. Thanks!

Excellent, I am glad to hear it.

>> I am not sure what was going on in your experiment, but we have
>> regularly verified that this variety of hinted handoff will in fact
>> store a replica of the updated document on another node and will also
>> (after some delay following the ideal node rejoining) transfer that
>> document to the ideal node.
>>
>
> What is a typical delay? Maybe my wait was too short?

The timeouts are intentionally long (and started independently) so
that in the case of a large network partition the nodes are not likely
to all do handoff at once.  A minute would not be surprising.
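
To see why long, independently started timers spread the work out,
here is a toy model in Python. It assumes each node runs a fixed
60-second periodic check that began at that node's own start time;
this is only an assumption for illustration and may not match how
Riak's timers are actually implemented. The name next_check is
hypothetical.

```python
def next_check(start_offset, interval, heal_time):
    """Return the time of the first periodic check at or after heal_time.

    start_offset: when this node's timer was started (seconds)
    interval:     length of the timer period (seconds)
    heal_time:    when the ideal node rejoined (seconds)
    """
    if heal_time <= start_offset:
        return start_offset
    elapsed = heal_time - start_offset
    ticks = -(-elapsed // interval)  # ceiling division on integers
    return start_offset + ticks * interval


# Five nodes whose timers were started at different moments.
offsets = [0, 7, 23, 41, 58]
handoff_times = [next_check(o, 60, 300) for o in offsets]
```

Under these assumptions the five nodes notice the rejoin at different
moments within the following minute rather than all at time 300.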

>> Typically, on the next vnode on a unique node along the ring after the
>> ideal ones.  In the case where fewer than N physical nodes are present,
>> it will not be a unique node.
>>
>
> What do you mean by "on the next vnode on a unique node along the ring
> after the ideal ones"?

Sorry -- by "vnode" I mean the process on a Riak node that owns a
given ring partition.  So, the node that would have held the
additional replica if your N value was 1 higher will get the first
handoff.
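
The rule above can be sketched in a few lines of Python. This is a
simplified model, not Riak's implementation: the ring is a plain list
where the index is the partition and the value is the owning node, the
starting partition is passed in directly instead of being derived from
a key hash, and the function names are made up for this example.

```python
def preference_list(start, ring, n):
    """Walk the ring clockwise from partition `start`.

    Returns (ideal, rest): the first n (partition, node) pairs are the
    ideal vnodes, the remainder are candidates for hinted handoff.
    """
    size = len(ring)
    ordered = [((start + i) % size, ring[(start + i) % size])
               for i in range(size)]
    return ordered[:n], ordered[n:]


def first_fallback(start, ring, n):
    """Pick the vnode that receives the first handoff.

    Prefer the next vnode after the ideal ones that lives on a node not
    already among the ideal nodes. If there are fewer than n physical
    nodes, no such vnode exists and the next vnode is used anyway.
    """
    ideal, rest = preference_list(start, ring, n)
    ideal_nodes = {node for _, node in ideal}
    for partition, node in rest:
        if node not in ideal_nodes:
            return (partition, node)
    return rest[0] if rest else None


# Eight partitions spread over four physical nodes, N = 3.
ring = ["a", "b", "c", "d"] * 2
fallback = first_fallback(0, ring, 3)
# The same vnode would be the fourth ideal replica if N were 4.
fourth_replica = preference_list(0, ring, 4)[0][-1]
```

With only two physical nodes (for example ring = ["a", "b"] * 4 and
N = 3) every remaining vnode sits on a node that already holds a
replica, so the fallback cannot be on a unique node.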

I hope that this clarifies.  If not I can write a fuller explanation soon.

Best,

-Justin



