full-writes vs. merges

Paul Rogers riak at dingosky.com
Sat Oct 31 17:33:32 EDT 2009


On Sun, Oct 25, 2009 at 11:21 PM, Brian Hammond <brian at brianhammond.com> wrote:
 >> If I want to update the links associated with some stored value, I have to
 >> first read the value, update the links, and write back the entire value.
 >> I'm concerned that if my data has "large" values, this
 >> read-update-write technique will cause a fair amount of overhead.
 >>
 >> I am wondering if there's any merit to this concern in practice. I suppose
 >> the answer is dependent on one's definition of large, network configuration,
 >> etc. Anyway, perhaps you guys can keep it in the back of your head that
 >> perhaps CRUD operations on links might be useful.
 >>
 >> More generically, perhaps some form of "JSON object *merging*" would be in
 >> order.
 >
...
 >Hi, Brian.  I see exactly where you're coming from.  The main problem,
 >though, is the need to maintain vclocks so causality can be determined
 >in the case of conflicting data.  This means that you *have* to
 >perform a read so you know what vclock to use when you perform your
 >subsequent write.
 >
 >I can think of two ways to avoid the large-data overhead, though.
 >

I'm new to Riak and very non-authoritative, but I believe a third option would
be to use the schema facility on a bucket by restricting the read mask on
existing data. I've done this as an abstraction in my Ruby RiakRest gem in the
following manner, but obviously the Ruby layer is just managing the interaction
and this could be done using any of the Jiak language bindings.

What I've done is create a concept I call resource point-of-view (pov).
Multiple Ruby resource classes can actually front the same Jiak data. Each
resource class sets its view, which establishes the current schema on the
bucket. By creating two views, say Full and Links, you can interact with the
data in the Full view and with just the links in the Links view.

Here's the Ruby code:

#------------------------------------------------------------------------------
require 'riakrest'
include RiakRest

# Resource class with 10 fields :f0,...,:f9
class Full
   include JiakResource
   server       'http://localhost:8002/jiak'
   group        'fields'
   data_class   JiakDataHash.create (0...10).map {|n| "f#{n}".to_sym}
   auto_post    true
   auto_update  true
end

# copy of above resource class, but no read/write fields, i.e., only links
LinksOnly = JiakDataHash.create Full.schema
LinksOnly.readwrite []
Links = Full.copy(:data_class => LinksOnly)

# populate two Full resources with (meaningless) stuff
Full.pov
full1,full2 =
   ["full1","full2"].map {|o| Full.new(Full.schema.write_mask.inject({}) do |h,f|
                                         h[f]="#{o.upcase}-#{f.hash}"
                                         h
                                       end)}

# switch to Links pov and create a link to the full2 resource
Links.pov
links1 = Links.get(full1.jiak.key)
links1.link(full2,'link')

# back to the Full pov to make some data changes
Full.pov
full2.f1 = "new f1"

# get the linked resource and show it has the same change as above.
linked = full1.query(Full,'link')[0]
puts linked.f1 == full2.f1                        # => true

# clean up
full1.delete
full2.delete
#------------------------------------------------------------------------------

Using the Full pov, two resources, full1 and full2, are stored with meaningless
data in each of their 10 data fields. Then, using the Links pov, the Jiak
resource for full1 is retrieved as links1 with none of the data being pulled
over, since the read_mask is empty (see below), and a link is established to
the full2 resource. Since the write_mask is also empty, the full1 data sent
back to the Jiak server to store the link is also empty. So in the Links pov,
no data values are transferred, just bucket name, key, empty data, links, and
riak context.
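
To make the effect of the two masks concrete, here is a minimal sketch in plain
Ruby. It is only an in-memory model of the behavior described above, not
RiakRest or Jiak code: StoredObject, masked_read, and masked_write are
hypothetical names, and the merge rule (fields outside the write_mask keep
their stored values) is an assumption based on the exchange captured below.

```ruby
# Hypothetical in-memory stand-in for an object stored in a Jiak bucket.
StoredObject = Struct.new(:data, :links)

# Model of a read: only fields named in read_mask are returned to the client.
def masked_read(stored, read_mask)
  stored.data.select { |field, _value| read_mask.include?(field) }
end

# Model of a write: only fields named in write_mask are accepted from the
# client; every other stored field is kept as-is. Links are taken from the
# client regardless of the masks.
def masked_write(stored, incoming_data, incoming_links, write_mask)
  accepted = incoming_data.select { |field, _value| write_mask.include?(field) }
  StoredObject.new(stored.data.merge(accepted), incoming_links)
end

all_fields = (0...10).map { |n| "f#{n}" }
full1 = StoredObject.new(all_fields.map { |f| [f, "FULL1-#{f}"] }.to_h, [])

# Links pov: both masks are empty, so no field data travels in either direction.
links_view = masked_read(full1, [])
updated = masked_write(full1, links_view, [["fields", "full2-key", "link"]], [])

puts links_view.empty?            # no data fields were read
puts updated.data == full1.data   # stored data fields are untouched
puts updated.links.inspect        # only the link was added
```

In this model the Full pov is simply the case where both masks list all ten
fields, so the same two functions cover both views.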

Now, just to show the data is still usable in Full form, I switched back to the
Full pov and changed the full2 f1 data value. I used a query on full1 to get
the linked full2 resource (to show the link is there; I actually already have a
full2 resource) and show the changed value is the same on each resource.

To show the Links pov Jiak exchange does not include data fields, I captured
four Jiak JSON interactions (Rubyized):

Full pov full1
   {"object":
     {"f0":"FULL1-960744","f1":"FULL1-960968","f2":"FULL1-961192",
      "f3":"FULL1-961416","f4":"FULL1-961640","f5":"FULL1-961864",
      "f6":"FULL1-962088","f7":"FULL1-962312","f8":"FULL1-962536",
      "f9":"FULL1-962760"},
    "vclock":"a85hYGBgzGDKBVIsjFmf6zOYEhnzWBlCmRKP8GUBAA==",
    "lastmod":"Sat, 31 Oct 2009 18:47:49 GMT",
    "vtag":"5T5Ct9pW8kkVgpmAfkd16v",
    "bucket":"fields","key":"4ke3FMCAhPGaOiflNRVSWhIKHnD","links":[]}

Links pov links1. Note the object data is empty but the key, bucket, links,
and riak context are the same as above.
   {"object":{},
    "vclock":"a85hYGBgzGDKBVIsjFmf6zOYEhnzWBlCmRKP8GUBAA==",
    "lastmod":"Sat, 31 Oct 2009 18:47:49 GMT",
    "vtag":"5T5Ct9pW8kkVgpmAfkd16v",
    "bucket":"fields","key":"4ke3FMCAhPGaOiflNRVSWhIKHnD","links":[]}

Links pov links1 with the link.
   {"object":{},
    "vclock":"a85hYGBgymDKBVIszNOiOjKYEhnzWBl4mBOP8EGFGbM+10OFQ5mAwlkA",
    "lastmod":"Sat, 31 Oct 2009 18:50:52 GMT",
    "vtag":"6UbnN3jtsJAKW0mGwXiwBR",
    "bucket":"fields","key":"4ke3FMCAhPGaOiflNRVSWhIKHnD",
    "links":[["fields","Nk8cZl6bu6kCsC56ylIGYH9iY5","link"]]}

Full pov full1. Same as the first but now with the link included in the links.
   {"object":
     {"f0":"FULL1-960744","f1":"FULL1-960968","f2":"FULL1-961192",
      "f3":"FULL1-961416","f4":"FULL1-961640","f5":"FULL1-961864",
      "f6":"FULL1-962088","f7":"FULL1-962312","f8":"FULL1-962536",
      "f9":"FULL1-962760"},
    "vclock":"a85hYGBgymDKBVIszNOiOjKYEhnzWBl4mBOP8EGFGbM+10OFQ5mAwlkA",
    "lastmod":"Sat, 31 Oct 2009 18:50:52 GMT",
    "vtag":"6UbnN3jtsJAKW0mGwXiwBR",
    "bucket":"fields","key":"4ke3FMCAhPGaOiflNRVSWhIKHnD",
    "links":[["fields","Nk8cZl6bu6kCsC56ylIGYH9iY5","link"]]}

RiakRest abstracts much of the interaction with a Jiak server. The pov calls
above set the bucket schema on the Jiak server using the schema maintained by
the JiakResource classes. Setting auto_post and auto_update to true means the
resources are posted when created and updated whenever a data field or link
changes.

Again, none of this requires Ruby, just manipulation of the bucket
schemas, and in particular, the read_mask and write_mask.
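
As a rough illustration of what that schema manipulation might look like
without RiakRest, the sketch below only builds the two JSON bodies a client
could send to the bucket before working in each view. schema_payload is a
hypothetical helper, and the allowed_fields and required_fields key names are
my understanding of the Jiak bucket schema rather than something shown above,
so check them against your server before relying on them.

```ruby
require 'json'

# Hypothetical helper: build a bucket-schema body for a given pair of masks.
# Key names other than read_mask and write_mask are assumptions about the
# Jiak schema layout.
def schema_payload(allowed, read_mask, write_mask)
  { 'schema' => {
      'allowed_fields'  => allowed,
      'required_fields' => [],
      'read_mask'       => read_mask,
      'write_mask'      => write_mask } }
end

fields = (0...10).map { |n| "f#{n}" }

# Full view: every field is readable and writable.
full_schema = schema_payload(fields, fields, fields)

# Links view: same allowed fields, but nothing is read or written.
links_schema = schema_payload(fields, [], [])

puts JSON.generate(full_schema)
puts JSON.generate(links_schema)
```

Switching views then amounts to storing one of these two bodies as the bucket
schema before issuing the reads and writes for that view.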

Paul
