how to think?

Bryan Fink bryan.fink at gmail.com
Thu Sep 3 23:29:53 EDT 2009


On Fri, Aug 28, 2009 at 3:27 AM, etnt<etnt at redhoterlang.com> wrote:
> Being used to Mnesia tables and transactions, I'm having trouble
> getting my head around how I should structure my DB if I were
> to migrate it to something like Riak.
>
> I guess other people must have been in the same situation, so it
> would be great if anyone could share some thoughts on this.

I'm doing the same translation on BeerRiot.  BR is currently running
on Mnesia: one table for beers, one for breweries, one for users...  I
haven't finished the translation yet, but so far I've kept the object
divisions the same: one Riak bucket for beers, one for breweries...

The biggest difference between Mnesia tables and Riak buckets I see
(ignoring transactions for the moment) is that Mnesia provides a
dedicated method for retrieving all objects in a table matching some
specification, while directly translating the same query to Riak would
require code that more explicitly examined each object in a bucket.
I've adapted not by direct translation, but instead by converting the
concepts described by many of my Mnesia match specs into "links" among
Riak objects.

For instance, finding "approved" Comments on a Beer involves asking
Mnesia for something like #comment{beer_id=BeerId, state=approved}.
In Riak, though, it works better to keep a list of Comment IDs in the
Beers they relate to, tagged with their state, and ask for a mapreduce
from {beer, BeerId} to {link, comment, approved, ...}.
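
A rough sketch of the link idea, in Python rather than Erlang just to
keep it short.  Everything here is made up for illustration: the
in-memory `store` dict, the (bucket, key, tag) link layout, and the
`walk_links` helper are assumptions, not Riak's API.

```python
# Toy in-memory "store": {(bucket, key): object}.  This stands in for
# Riak only for the sake of the example; it is not Riak's client API.
store = {
    ("beer", "b1"): {
        "name": "Example Stout",
        # Each link is (bucket, key, tag); the tag carries the comment state.
        "links": [
            ("comment", "c1", "approved"),
            ("comment", "c2", "pending"),
            ("comment", "c3", "approved"),
        ],
    },
    ("comment", "c1"): {"text": "Roasty."},
    ("comment", "c2"): {"text": "spam?"},
    ("comment", "c3"): {"text": "Great with dessert."},
}


def walk_links(store, start, bucket, tag):
    """Follow the links on the object at `start` that match bucket and tag."""
    obj = store[start]
    return [store[(b, k)] for (b, k, t) in obj["links"]
            if b == bucket and t == tag]


# Start at the beer and walk to its approved comments, instead of
# scanning every comment for a matching beer_id and state.
approved = walk_links(store, ("beer", "b1"), "comment", "approved")
```

The point is that the query starts from one known object and follows
its links, so no scan over the whole comment bucket is needed.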

The starting points can take some work.  For some objects it's obvious
- start at the object for the user making the request, and walk from
there to get objects that user has touched.  But, for other queries,
like "site home" (or "all beers" in BeerRiot's case), some additional
tracking system will be needed (I'll be using a combination of
well-known-named "index" objects and temporary ets tables on BR).

(Aside: actually, I was doing some of this linking stuff in Mnesia
already anyway.  It's quite slick to run a listcomp full of {Table,
Id} tuples through an mnesia:read.  Taking that intermediate step can
be a good way to experiment with existing Mnesia data.)

Now, about transactions -- my shortest answer: forget about
transactions, and start thinking about "merging changesets".

The approach I'm taking in BeerRiot is to "just write the
modification", without worrying about whether things have changed
between now and when I last fetched the object.  The reason I can do
this is Riak's vclocks: they allow Riak to compare two versions of an
object and determine whether one "descended" from the other.  That is,
whether one was a subsequent modification to the other.

If I store an object that descends from one already stored, Riak
simply drops the old version.  But, when Riak finds that I've stored
two different versions of an object, and that neither is a descendant
of the other, it keeps them both.  The next time I ask for that
object, Riak will hand me both versions.  At this point, I can decide
what the proper "merged" object should look like.
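
To make the "descends" comparison concrete, here is a minimal sketch
in Python.  It assumes a vclock can be modelled as a dict of
{actor: counter}; the `descends` and `store_version` names are
hypothetical, and ignoring a stale write is my own choice for the
example.  It is not Riak's implementation.

```python
def descends(a, b):
    """True if vclock `a` has seen every update that vclock `b` has."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def store_version(siblings, new):
    """Return the versions kept after storing `new`.

    `siblings` is a list of (vclock, value) pairs currently kept;
    `new` is one such pair.
    """
    new_clock, _ = new
    # Assumption for this sketch: if a kept version already descends
    # from the new one, the write is stale and is ignored.
    if any(descends(clock, new_clock) for clock, _ in siblings):
        return siblings
    # Drop every kept version that the new one descends from ...
    kept = [(c, v) for c, v in siblings if not descends(new_clock, c)]
    # ... and keep the rest alongside it as siblings.
    return kept + [new]


v1 = ({"a": 1}, "v1")
v1a = ({"a": 1, "x": 1}, "v1a")   # client x modified v1
v1b = ({"a": 1, "y": 1}, "v1b")   # client y modified v1 concurrently

kept = store_version([], v1)
kept = store_version(kept, v1a)   # v1a descends from v1, so v1 is dropped
kept = store_version(kept, v1b)   # neither descends from the other: keep both
```

After the last call `kept` holds both v1a and v1b, which is the
situation where the application has to merge.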

Merging is an interesting problem.  It can be simple (random choice,
latest timestamp, lower user-id preferred, etc.) or difficult (think
reimplementation of darcs), or anything in between - the application
gets to decide.  I've had success with storing a small bit of metadata
about what modifications were made to "this" version of the object, in
terms of set-field, add-list-element, and remove-list-element.  For
example, if I see that v1a set 'name' to 'foo' and v1b set 'size' to
'large', then v1merge should have both name=foo and size=large
(conflicting changes to the same field need a separate strategy).  For simple
data structures, and in cases where the Riak cluster is not
*seriously* degraded (lots of nodes down, extreme network partition,
etc.), this should be plenty to deal with the occasional concurrent
modification, in my use cases.
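
A small sketch of that changeset merge, again in Python.  The
(op, field, value) tuple format, the op names "set", "add" and
"remove", and the `merge_siblings` helper are assumptions made for
this example; raising on a conflict is only a placeholder for
whatever conflict strategy the application picks.

```python
import copy


def merge_siblings(siblings):
    """Merge sibling versions using the changes recorded on each one.

    `siblings` is a list of (obj, changes) pairs.  Each `changes` list
    describes what that version did to the common parent, as
    (op, field, value) tuples.
    """
    base_obj, base_changes = siblings[0]
    # Start from the first sibling, which already contains its own changes.
    merged = copy.deepcopy(base_obj)
    # Remember which fields were set, so conflicting sets can be detected.
    sets = {field: value for op, field, value in base_changes if op == "set"}
    for _, changes in siblings[1:]:
        for op, field, value in changes:
            if op == "set":
                if field in sets and sets[field] != value:
                    # Placeholder: a real application picks a strategy here.
                    raise ValueError("conflicting set on %r" % field)
                sets[field] = value
                merged[field] = value
            elif op == "add":
                items = merged.setdefault(field, [])
                if value not in items:
                    items.append(value)
            elif op == "remove":
                if value in merged.get(field, []):
                    merged[field].remove(value)
            else:
                raise ValueError("unknown op %r" % op)
    return merged


# The parent had name='old', size='small', tags=['ale'].
v1a = ({"name": "foo", "size": "small", "tags": ["ale"]},
       [("set", "name", "foo")])
v1b = ({"name": "old", "size": "large", "tags": ["ale", "hoppy"]},
       [("set", "size", "large"), ("add", "tags", "hoppy")])

v1merge = merge_siblings([v1a, v1b])
```

Here v1merge ends up with name=foo, size=large, and both tags, since
the two changesets touch different fields.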

By the way, all of this merge talk assumes that the bucket property
'allow_mult' has been set to 'true'.  If 'allow_mult' is 'false' (as
it is by default), a last-write-wins merge scheme is imposed before
the client is given the object.

To recap: my advice on "how to think" is to focus on the links among
objects, and to devise a system for merging changesets.  (Imagine a
big disclaimer here: this is only advice distilled from my one Mnesia
-> Riak translation.  I'm absolutely positive I haven't covered all
bases, and that there are many better solutions than what I've
proposed. ;)

-Bryan

(P.S. I know we plan on discussing some strategies we've used at Basho
in this thread as well ... now that we can all take a deep breath
after releasing 0.4.)




More information about the riak-users mailing list