Schema design - version history and time travel

Patrik Sundberg patrik.sundberg at
Fri Aug 17 12:37:30 EDT 2012


I'll simplify the case to something easier to follow. The typical question
I have is: find a piece of data X as of time Y. Each version of X has a
start time and an end time, which can be thought of as positive integers
(that I could have 2i indices for). I'm trying to find the version of X
whose start-to-end integer range includes the integer Y.
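
To make the containment test concrete, here is a minimal sketch in Python. The
`Version` class, its field names, and the `covers` helper are hypothetical names
used only for illustration. It assumes half-open ranges [start, end) and uses
`None` for the implicit "infinity" end of the latest version; a closed range
would work just as well as long as versions never overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One version of X (hypothetical model, not a Riak type)."""
    start: int                  # first time at which this version is valid
    end: Optional[int] = None   # first time it is no longer valid; None = open-ended


def covers(version: Version, y: int) -> bool:
    """Return True if time y lies in the half-open range [start, end)."""
    if y < version.start:
        return False
    # An open-ended version covers every time from its start onwards.
    return version.end is None or y < version.end
```

With half-open ranges, a version ending at 20 and its successor starting at 20
never both match Y = 20, so at most one version answers the question.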

I don't see how to make that query with 2i, and doing it via Search seems
wrong since I don't need the overhead of converting to text etc. Do I need
to do a two-step procedure where I get all the possible X versions (think
intervals) via a 2i query (I can organize that easily), then run a map
reduce on those results to find the interval covering Y? The number of
versions for the map reduce will typically range from tens to thousands at
most.
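
The second step of that procedure could be sketched as below, in plain Python
rather than as an actual Riak map reduce job. The `Candidate` tuple layout and
the `find_version_as_of` function are assumptions made for illustration; the
candidate list stands in for whatever the 2i query returns, and the ranges are
assumed to be half-open with `None` as the open end of the latest version.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical shape of one candidate from step one: (key, start, end).
# end is None for the latest version, whose range is open-ended.
Candidate = Tuple[str, int, Optional[int]]


def find_version_as_of(candidates: Iterable[Candidate], y: int) -> Optional[str]:
    """Return the key of the version whose [start, end) range contains y."""
    # "Map" phase: keep only the candidates whose range covers y.
    matches = [
        key
        for key, start, end in candidates
        if start <= y and (end is None or y < end)
    ]
    # "Reduce" phase: non-overlapping versions give at most one match.
    if len(matches) > 1:
        raise ValueError(f"overlapping versions cover time {y}: {matches}")
    return matches[0] if matches else None


if __name__ == "__main__":
    versions = [("v1", 0, 10), ("v2", 10, 25), ("v3", 25, None)]
    print(find_version_as_of(versions, 12))
```

With at most a few thousand candidates per object, a linear scan like this is
cheap wherever it runs, so the choice between doing it in a map reduce job and
doing it in the client is mostly about how much data crosses the network.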

Any input would be great!

On Wed, Aug 15, 2012 at 11:54 AM, Patrik Sundberg <patrik.sundberg at
> wrote:

> Hi,
> I have a domain where I want to be able to "time travel". I don't have
> many updates (many more reads), but when there is an update I need to
> preserve history and create new versions. Setting my local "application
> time" determines which version of a particular piece of data is fetched,
> and I can go back in time and recreate how things looked previously. One
> can't change the past, just create new versions in the "future" relative to
> the last version. Using a model of "starting point + replaying deltas" to
> get to a given time is not a good idea: it's an ever-evolving state, and
> snapshots are cheap enough to store and reduce complexity a lot.
> My domain objects are in the order of a couple of hundred types, each type
> having some pure data properties (10s, up to hundreds, easily represented
> as JSON blobs) and in the order of tens, maximum hundreds of has_one and
> has_many type relationships to other objects (which can be of different
> type). The relationships only require one direction, always from parent to
> child (source to destination). An object has a given unique ID, and a
> version of that object has a given unique valid time period (with the
> latest version having an implicit "infinity" end of period).
> The queries are mostly to find a data property or a relationship for a
> given object. A few special cases may be for range queries and exact
> queries on properties, easily taken care of by 2i queries.
> I'm trying to work out if and how my domain would fit into a Riak
> "schema". My hunch for a starting point:
> - map object types to buckets
> - make the unique object IDs the keys in the bucket to represent the
> concept of that object
> - not sure how to represent the links to versions of that particular object
> - the versions themselves may be either in the same bucket or in another
> bucket (think "cars" and "car-versions" or using "cars" for both)
> - a version has a JSON value with its properties, some 2i for any possible
> exact and range queries I need
> - the has_one and has_many links I could do in several ways. the first
> decision is whether to point them to the object identity or directly to a
> specific version. then I can use Link, use 2i, or store IDs in the JSON
> and do a two-query fetch to get there
> - 99% of read operations are of the type "given the time of X, give me the
> property or relation Y of object with ID Z"
> Has anyone built something similar with a time snapshot/version angle and
> has experience to share? Any input in general appreciated.
> Thanks,
> Patrik

More information about the riak-users mailing list