Clarifying "Read-before-Write"

Andres Jaan Tack andres.jaan.tack at eesti.ee
Sat Nov 26 08:19:40 EST 2011


Thanks! That explanation is perfect. I guess I should have taken a look at
some of the other clients as examples in the first place.

Now I have something to fix for Riak-Cpp. :)

--
Andres

2011/11/26 Russell Brown <russelldb at basho.com>

>
> On 26 Nov 2011, at 01:14, Andres Jaan Tack wrote:
>
> So I was just reading and thinking about this, and I don't understand the
> advice offered under "Read-before-Write" at
> http://wiki.basho.com/Client-Implementation-Guide.html.
>
>> "Riak will return an encoded vector clock<http://wiki.basho.com/Vector-Clocks.html>
>> with every "fetch" or "read" request that does not result in a "not
>> found" response. In addition to the Client ID, this vector clock tells Riak
>> how to resolve concurrent writes, essentially representing the "last seen"
>> version of the object to which the client made modifications. In order to
>> prevent sibling explosion<http://wiki.basho.com/Vector-Clocks.html#Sibling-explosion>,
>> clients should always have a vector clock before sending a write, and send
>> the vector clock as part of the write request. Therefore, it is essential
>> that keys are fetched before being written (except in the case where Riak
>> selects the key or there is *a priori* knowledge that the key is new).
>> Client libraries that make this automatic will reduce operational issues by
>> limiting sibling explosion. Clients may also choose to perform automatic Sibling
>> Resolution<http://wiki.basho.com/Client-Implementation-Guide.html#Sibling-Resolution>
>> on read."
>
>
> I'm having trouble understanding the advice. I get that if I'm aware of
> all the siblings, I can resolve them (optionally) with that vector clock.
> What I don't understand here: If an application PUTs to an object out of
> the blue, not having read it first, should the client library
> read-before-write?
>
>
> Yes it should.
>
> This seems like a great way to blow away siblings by accident.
>
>
> But it should never do that. If siblings are encountered, it should *do*
> something.
>
> Or is the point rather to avoid sibling explosion for applications that
> don't care about losing information?
>
>
> A well-behaved client library will not blindly PUT a value "over the top"
> of siblings, but will push the problem to the library user (hopefully in
> some helpful way, like automatically applying some domain-specific
> resolution logic).
>
> So, in the case of the Java client, when you store (or fetch, for that
> matter) you must provide an implementation of the ConflictResolver<T>
> interface to the client. This is then executed to resolve any siblings
> on the pre-store fetch. If you don't provide a conflict resolver, the Java
> client uses one that throws a runtime exception when it encounters siblings
> on fetch, exactly so that you don't do as you describe and blow away
> potentially meaningful sibling values.
>
> Maybe the wording on the wiki should make this clearer. Perhaps it should
> read:
>
> "Clients [that automatically fetch before store] _must_ choose to either
> perform automatic Sibling Resolution *or* abort the write and notify the
> caller of the presence of siblings"
>
> It is a thorny issue; please let me know if I've answered your question
> adequately.
>
> Cheers
>
> Russell
>
>
> --
> Andres
>
>
>
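
The behaviour Russell describes (fetch before store, then either resolve
siblings or abort and tell the caller) can be sketched roughly as follows.
This is only an illustration in Python, not the API of any Riak client:
ToyStore, SiblingsFound and safe_store are hypothetical names, and the
"vector clock" is simplified to a single integer, whereas a real vector
clock tracks a counter per actor.

```python
class SiblingsFound(Exception):
    """Raised when a fetch returns siblings and no resolver was supplied."""

    def __init__(self, siblings):
        super().__init__(f"{len(siblings)} siblings need resolving")
        self.siblings = siblings


class ToyStore:
    """In-memory stand-in for a bucket (hypothetical, not a real client)."""

    def __init__(self):
        # key -> (vclock, [values]); the vclock is a plain counter here,
        # which is an assumption made to keep the sketch short.
        self._data = {}

    def fetch(self, key):
        # (None, []) plays the role of a "not found" response.
        return self._data.get(key, (None, []))

    def put(self, key, value, vclock=None):
        current_vclock, siblings = self._data.get(key, (None, []))
        if current_vclock is None:
            self._data[key] = (1, [value])
        elif vclock == current_vclock:
            # The writer saw the latest version: replace what it saw.
            self._data[key] = (current_vclock + 1, [value])
        else:
            # Missing or stale vclock: keep old values, add a sibling.
            self._data[key] = (current_vclock + 1, siblings + [value])


def safe_store(bucket, key, mutate, resolver=None):
    """Fetch, resolve siblings (or abort), then put with the fetched vclock."""
    vclock, siblings = bucket.fetch(key)
    if len(siblings) > 1:
        if resolver is None:
            # Abort the write rather than overwrite the siblings.
            raise SiblingsFound(siblings)
        current = resolver(siblings)
    else:
        current = siblings[0] if siblings else None
    new_value = mutate(current)
    bucket.put(key, new_value, vclock=vclock)
    return new_value


# Usage: two blind writes create siblings; a set union resolves them.
bucket = ToyStore()
bucket.put("cart", {"apple"})
bucket.put("cart", {"pear"})  # no vclock supplied, so a sibling is kept
merged = safe_store(
    bucket,
    "cart",
    lambda cart: cart | {"kiwi"},
    resolver=lambda sibs: set().union(*sibs),
)
```

Calling safe_store without a resolver while siblings exist raises
SiblingsFound and leaves the stored siblings untouched, which mirrors the
"abort the write and notify the caller" option in the proposed wording.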