Number of replica for Luwak

Bryan Fink bryan at basho.com
Wed Feb 23 15:01:51 EST 2011


On Wed, Feb 23, 2011 at 10:40 AM, Les Mikesell <lesmikesell at gmail.com> wrote:
> On 2/23/2011 9:11 AM, Bryan Fink wrote:
>>
>> On Fri, Feb 18, 2011 at 8:54 PM, Les Mikesell<lesmikesell at gmail.com>
>>  wrote:
>>>
>>> What happens if there is a read of the object while it is in the process of
>>> being updated if the update is several different operations?
>>
>> Luwak streams work in an "all or nothing" fashion.  That is, no read
>> will see the result of any stream until that stream is flushed.  Luwak
>> blocks are immutable, so old file trees will still reference
>> completely valid old blocks while new ones are being written.  The
>> last action of flushing a stream is to point the file-metadata object
>> (in the luwak_tld bucket) at the head of the new tree.
>>
>> A flush will only occur when a stream closes, unless your program
>> explicitly calls luwak_put_stream:flush/1.
>
>
> Thanks!  A couple more somewhat related questions: is that atomic update
> nature hard to duplicate outside of luwak (say by a client that needs to
> keep several items in sync), and if the luwak blocks are immutable, how do
> you ever clean up the space used by data that has been deleted or modified
> and no longer referenced?

(Ryan Zezeski sent correct answers before I could finish this, but I'm
sending anyway, with hopefully extra information.)

Well, these two behaviors are partially related.

It's easy to duplicate this behavior: write the new versions of your
items, without removing the old versions, then when you're finished,
replace the object that says which version of those items is the
latest.  It's akin to the old filesystem trick of writing out a new
file, then using 'rename' to move it in place of the old one.  (In
reference to your followup email, yes, Luwak accomplishes this by
effectively (tree-wise) putting all the keys in one object, which is
updated last.)
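
A minimal sketch of that pattern, in Python, with a plain dict standing
in for a key/value bucket. The names write_group, read_group, and the
"name@version" key layout are made up for illustration; none of this is
Luwak's or Riak's actual API.

```python
import uuid


def write_group(store, pointer_key, items):
    """Write new versions of all items, then publish them with one final write.

    `store` is any dict-like object; here it is only a stand-in for a bucket.
    """
    version = uuid.uuid4().hex
    manifest = {}
    for name, value in items.items():
        key = f"{name}@{version}"  # fresh key, so old versions are never overwritten
        store[key] = value
        manifest[name] = key
    # The pointer object is written last. Until this line runs, readers
    # still follow the old manifest and see a complete, consistent old set.
    store[pointer_key] = manifest
    return manifest


def read_group(store, pointer_key):
    """Read the item set that the pointer object currently names."""
    manifest = store[pointer_key]
    return {name: store[key] for name, key in manifest.items()}
```

Note that the old versions are left in the store after the pointer
moves, which is exactly where the cleanup question comes from.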

But, you've hit on one of Luwak's major specializations: it was
originally designed for immutable data, and so it does nothing about
cleaning up unreferenced blocks.  At this point, it's a distributed
online garbage collection problem that we haven't written a solution
for yet.  If you can pause all updates to Luwak, and be sure that the
data is stable (i.e. no conflicts hidden by unreachable nodes), it's
relatively simple to mark-and-sweep the luwak_node bucket, based on
pointers from the luwak_tld bucket.  There's even some history (look
at the "ancestors" property of the file object) that might help out.
But, (as both you and Ryan figured out) doing this live involves much
more bookkeeping to keep track of not only blocks shared between
files, but also blocks that are not linked solely because a stream
hasn't finished flushing yet.
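
For the paused, stable case only, the mark and sweep could look roughly
like the Python sketch below. It uses two dicts as stand-ins for the
luwak_tld and luwak_node buckets, and the "root" and "children" fields
are an assumed layout for illustration, not Luwak's real object format.
It does not handle in-flight streams or concurrent writers at all.

```python
def mark(tld, nodes):
    """Return the keys of every node reachable from any top-level file object."""
    reachable = set()
    # Start from each file's current tree root (hypothetical "root" field).
    stack = [f["root"] for f in tld.values() if f.get("root") is not None]
    while stack:
        key = stack.pop()
        if key in reachable or key not in nodes:
            continue  # already visited, or a dangling pointer
        reachable.add(key)
        # Interior nodes list their children; leaf blocks have none.
        stack.extend(nodes[key].get("children", []))
    return reachable


def sweep(nodes, reachable):
    """Delete every node not marked reachable; return the deleted keys."""
    garbage = [key for key in nodes if key not in reachable]
    for key in garbage:
        del nodes[key]
    return garbage
```

Blocks shared between files survive because the mark phase walks every
file in the top-level bucket before anything is deleted.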

/me takes down a note to review Ryan's GC experiments

-Bryan




More information about the riak-users mailing list