update document

Bryan Fink bryan at basho.com
Tue Nov 3 09:40:16 EST 2009

On Mon, Nov 2, 2009 at 11:58 AM, francisco treacy
<francisco.treacy at gmail.com> wrote:
> I am developing a Scala library for Riak while I learn more about this
> datastore.
> I have covered storing/fetching documents, so far so good. But when I
> try to 'update' a document I am noticing a behaviour I didn't expect:
> As an example, when I execute this Ruby code:
> client = JiakClient.new('localhost', 8098)
> b = {'key' => "key", 'bucket' => "test", :links => [], 'object' => {
> :my => "json2" }}
> c = {'key' => "key", 'bucket' => "test", :links => [], 'object' => {
> :my => "json3" }}
> client.store b
> client.store c
> r = client.fetch 'test', 'key'
> puts r['object']['my']
> the output is always "json2", where I would normally expect "json3".
> However, immediately after I do:
> curl -X PUT http://localhost:8098/jiak/test/key -H "Content-Type:
> application/json" --data "{\"bucket\":\"test\", \"key\":\"key\",
> \"object\":{\"my\":\"json4\"}, \"links\":[]}"
> curl http://localhost:8098/test/key
> ...and the result is "json4", which seems fine. (If I execute the Ruby
> code again, I get "json2").
> So I guess my question is... what is going on here?  Why doesn't it
> store the object with "json3"?
> Looks like it can't cope with subsequent updates, but is that tied to
> the fact of having vclocks or something to do with the N/R/W values?

Hi, Francisco.  Indeed, this does have to do with vclocks.  Put
simply, because 'c' doesn't contain a vclock, Riak can't tell that it
*is* a subsequent update.

When Riak can't tell (via vclocks) that a write descends from the
value that's already in place, it stores the new value as a "sibling"
to the existing value, instead of overwriting it.  At read time, if
the 'allow_mult' bucket property is set to false, Riak will choose
one of these sibling values, instead of handing them all to you.  If
'allow_mult' is set to true, all of the sibling values are given to
the Jiak layer for merging.
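
To make the "descends from" check concrete, here is a minimal Ruby
sketch.  It is a simplified model, not Riak's actual vclock code: a
vclock is represented as a Hash of actor id => counter, and the names
descends?, resolve, client_a and client_b are made up for
illustration.  It also assumes that a write sent without a vclock ends
up with a fresh clock that shares no history with the stored one.

```ruby
# Simplified model: a vclock is a Hash of actor id => counter.
# Illustration only; this is not Riak's implementation.

# True when clock `a` has seen every event that clock `b` has seen.
def descends?(a, b)
  b.all? { |actor, count| a.fetch(actor, 0) >= count }
end

# Decide what happens to an incoming write in this model.
def resolve(stored_clock, incoming_clock)
  if descends?(incoming_clock, stored_clock)
    :overwrite # incoming write is a known successor of the stored value
  else
    :sibling   # ancestry cannot be shown, so both values are kept
  end
end

stored      = { 'client_a' => 1 }
fresh_write = { 'client_b' => 1 } # assumed clock of a write sent without a vclock
updated     = { 'client_a' => 2 } # a write built from the stored value

puts resolve(stored, fresh_write) # prints "sibling"
puts resolve(stored, updated)     # prints "overwrite"
```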

In either the allow_mult=false or the default-Jiak-merge case, an
attempt is made to choose the "latest" value by comparing the
last-modified-time of each value.  Unfortunately for your example
case, last-modified-time has only one-second resolution, so there's a
pretty good chance that the earlier value will be chosen if the writes
happened very close together (i.e. in the same second).
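
A rough Ruby sketch of why second resolution matters.  The Sibling
struct, its last_modified field and pick_latest are hypothetical
names, and the tie-breaking rule (keep the value seen first) is an
assumption of this sketch, not something taken from Riak's code.

```ruby
# Hypothetical sibling record; field names are illustrative only.
Sibling = Struct.new(:value, :last_modified)

# Pick the sibling with the greatest whole-second timestamp.
# A later sibling wins only if its truncated timestamp is strictly
# greater, so values written in the same second tie and the earlier
# one is kept (an assumption of this sketch).
def pick_latest(siblings)
  siblings.reduce do |best, s|
    s.last_modified.to_i > best.last_modified.to_i ? s : best
  end
end

t = Time.at(1_257_259_216)
same_second = [Sibling.new('json2', t), Sibling.new('json3', t + 0.2)]
later       = [Sibling.new('json2', t), Sibling.new('json4', t + 1.5)]

puts pick_latest(same_second).value # prints "json2"
puts pick_latest(later).value       # prints "json4"
```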

This is also the reason that your third PUT, in curl-command form,
"overwrote" the old value.  It was at least a second later, so the
timestamp made it obvious which value to choose.  If you were to pause
between the Ruby client.store calls, you wouldn't see the issue.

The real fix is to include a vclock in your second write, by building
it from the object that the first store returns.  What you really
want is:

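client = JiakClient.new('localhost', 8098)
b = {'key' => "key", 'bucket' => "test", :links => [], 'object' => {
:my => "json2" }}
# store returns the stored object, including its vclock
c = client.store b
# modify the returned object so the vclock travels with the update
c['object']['my'] = "json3"
client.store c
r = client.fetch 'test', 'key'
puts r['object']['my']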

That code should print out "json3" every time.  The value of 'c' will
include a vclock telling Riak that this value descends from the old
value, and should therefore replace it.  No confusion with
last-modified-time necessary.

The behavior of 'allow_mult' will change slightly in the next release
of Riak, making this simple case less surprising, but that won't help
you in more complex, live-system, distributed cases.  Only proper
vclock management can help you there.
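
One way to keep the vclock attached is to always fetch, modify and
store the same object.  Below is a hedged Ruby sketch of that pattern.
update_object is a hypothetical helper, and FakeClient is an in-memory
stand-in (not the real JiakClient) that only mimics the fetch/store
calls used above so the example runs without a Riak node.  It assumes
fetched objects carry a 'vclock' field, and it models a write with a
missing or stale vclock by leaving the existing value visible.

```ruby
# Hypothetical helper: fetch the current object (assumed to carry its
# vclock), let the caller change its fields, then store it back.
def update_object(client, bucket, key)
  obj = client.fetch(bucket, key)
  yield obj['object']
  client.store(obj)
end

# In-memory stand-in for illustration; NOT the real JiakClient.
class FakeClient
  def initialize
    @data = {}
    @counter = 0
  end

  def store(obj)
    id = [obj['bucket'], obj['key']]
    current = @data[id]
    # Model the surprising case: a write that does not carry the
    # current vclock leaves the existing value visible.
    return copy(current) if current && obj['vclock'] != current['vclock']

    @counter += 1
    @data[id] = copy(obj).merge('vclock' => "fake-#{@counter}")
    copy(@data[id])
  end

  def fetch(bucket, key)
    copy(@data.fetch([bucket, key]))
  end

  private

  # Deep copy so callers cannot mutate stored state by accident.
  def copy(obj)
    Marshal.load(Marshal.dump(obj))
  end
end

client = FakeClient.new
client.store('bucket' => 'test', 'key' => 'key', 'links' => [],
             'object' => { 'my' => 'json2' })
update_object(client, 'test', 'key') { |fields| fields['my'] = 'json3' }
puts client.fetch('test', 'key')['object']['my'] # prints "json3"
```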

