Schema design - version history and time travel

Mark Phillips mark at basho.com
Tue Aug 28 02:11:29 EDT 2012


Hi Patrik,

Sorry for the late response here.

On Fri, Aug 17, 2012 at 9:37 AM, Patrik Sundberg
<patrik.sundberg at gmail.com> wrote:
> Hi,
>
> I'll simplify the case to something easier to follow. The typical question I
> have is: find piece of data X as of time Y. A piece of data X has a start
> time and an end time, which can be thought of as positive integers (that I
> could have 2i indices for). I'm trying to find the version of X whose start
> and end time integer range includes the integer Y.
>
> I don't see how to make the query with 2i, and doing it via Search seems
> wrong since I don't need the overhead of converting to text etc. Do I need to
> do a two-step procedure where I get all the possible X versions (think
> intervals) via a 2i query (I can organize that easily), then a map reduce on
> those results to find the right interval covering Y? The number of versions
> for the map reduce will typically be in the range of 10s to 1000s at the
> maximum, not more.
>

My initial thought (based on a quick reading of this email) is that a
2i range query that feeds the resulting keys to an M/R job [0] would do
the trick.
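
A minimal sketch of that two-step idea, in plain Python with in-memory
stand-ins rather than real Riak calls. Everything named here is an
assumption for illustration: the Version tuple, the sample keys, the
index_range helper (standing in for a 2i range query on a hypothetical
integer index of start times), and the choice of a half-open interval
[start, end) with infinity as the end of the latest version.

```python
from typing import List, NamedTuple, Optional

# Stand-in for the implicit "infinity" end of the latest version.
INFINITY = float("inf")


class Version(NamedTuple):
    key: str      # hypothetical Riak key of this version
    start: int    # start of the valid period (inclusive)
    end: float    # end of the valid period (exclusive, by assumption)


# Hypothetical sample data: three contiguous versions of one object.
VERSIONS = [
    Version("car1-v1", 0, 10),
    Version("car1-v2", 10, 25),
    Version("car1-v3", 25, INFINITY),
]


def index_range(versions: List[Version], lo: int, hi: int) -> List[str]:
    """Stand-in for a 2i range query on the start time:
    return the keys of all versions with lo <= start <= hi."""
    return [v.key for v in versions if lo <= v.start <= hi]


def version_as_of(versions: List[Version], y: int) -> Optional[Version]:
    """Find the version whose [start, end) interval contains y."""
    by_key = {v.key: v for v in versions}

    # Step 1: the range query narrows the candidates to versions
    # that started at or before y.
    candidate_keys = index_range(versions, 0, y)

    # Step 2: the filter a map phase would apply to each candidate,
    # keeping only the version whose interval covers y.
    for key in candidate_keys:
        v = by_key[key]
        if v.start <= y < v.end:
            return v
    return None
```

With the sample data, version_as_of(VERSIONS, 12) returns the "car1-v2"
entry. The range query alone cannot pick the right version because it
only constrains the start time, which is why the second step checks the
end time as well.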

What kind of response times are you looking for with these queries?
When you say "The number of versions for the map reduce will typically
be in the range of 10s to 1000s at the maximum", do you mean that the
total number of keys you'll be map-reducing over will be in the 10s
to 1000s range? Or that the result set produced by that M/R job will
be on that order?

Hope that helps.

Mark
ricon2012.com

[0] All the way at the bottom of this --->
http://wiki.basho.com/Secondary-Indexes---Configuration-and-Examples.html#Index-Lookups

> Any input would be great!
>
> On Wed, Aug 15, 2012 at 11:54 AM, Patrik Sundberg
> <patrik.sundberg at gmail.com> wrote:
>>
>> Hi,
>>
>> I have a domain where I want to be able to "time travel". I don't have
>> many updates (many more reads), but when there is an update I need to
>> preserve history and create new versions. Setting my local "application
>> time" determines which version of a particular piece of data is fetched, and
>> I can go back in time and recreate how things looked previously. One can't
>> change the past, just create new versions in the "future" relative to the
>> last version. Using a model of "starting point + replaying deltas" to get to
>> a given time is not a good idea: it's an ever-evolving state where snapshots
>> are cheap enough to store and reduce complexity a lot.
>>
>> My domain objects are in the order of a couple of hundred types, each type
>> having some pure data properties (10s, up to hundreds, easily represented as
>> JSON blobs) and in the order of tens, maximum hundreds of has_one and
>> has_many type relationships to other objects (which can be of different
>> type). The relationships only require one direction, always from parent to
>> child (sourced to destination). An object has a given unique ID, and a
>> version of that object has a given unique valid time period (with the latest
>> version having an implicit "infinity" end of period).
>>
>> The queries are mostly to find a data property or a relationship for a
>> given object. A few special cases may be for range queries and exact queries
>> on properties, easily taken care of by 2i queries.
>>
>> I'm trying to work out if and how my domain would fit into a Riak
>> "schema". My hunch for a starting point:
>> - map object types to buckets
>> - make the unique object IDs the keys in the bucket to represent the
>> concept of that object
>> - not sure how to represent the links to versions of that particular
>> object
>> - the versions themselves may be either in the same bucket or in another
>> bucket (think "cars" and "car-versions" or using "cars" for both)
>> - a version has a JSON value with its properties, some 2i for any possible
>> exact and range queries I need
>> - the has_one and has_many links I could do in several ways. The first
>> decision is whether to point them to the object identity or directly to a
>> specific version. Then I can use Links, use 2i, or store IDs in the JSON
>> and do a two-query fetch to get there
>> - 99% of read operations are of the type "given the time of X, give me the
>> property or relation Y of object with ID Z"
>>
>> Has anyone built something similar, with a time snapshot/version angle,
>> and has experience to share? Any input in general is appreciated.
>>
>> Thanks,
>> Patrik
>>



