Schema design - version history and time travel
patrik.sundberg at gmail.com
Wed Aug 15 06:54:12 EDT 2012
I have a domain where I want to be able to "time travel". I don't have many
updates (reads are far more common), but when there is an update I need to
preserve history and create a new version. Setting my local "application
time" determines which version of a particular piece of data is fetched,
so I can go back in time and recreate how things looked previously. One
can't change the past, only create new versions in the "future" relative to
the last version. A model of "starting point + replaying deltas" to get
to a given time is not a good fit: the state is ever-evolving, and
snapshots are cheap enough to store and reduce complexity a lot.
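To make the snapshot idea concrete, here is a minimal in-memory sketch (not Riak-specific). The class name VersionedValue and the use of plain integers as timestamps are assumptions for illustration only. It keeps one full snapshot per version, refuses writes into the past, and finds the version for a given application time with a binary search.

```python
from bisect import bisect_right


class VersionedValue:
    """Append-only list of full snapshots, ordered by valid-from time."""

    def __init__(self):
        self._starts = []     # valid-from timestamps, strictly increasing
        self._snapshots = []  # the full snapshot stored for each timestamp

    def append(self, valid_from, snapshot):
        # The past is immutable: a new version must start after the latest one.
        if self._starts and valid_from <= self._starts[-1]:
            raise ValueError("new version must start after the latest version")
        self._starts.append(valid_from)
        self._snapshots.append(snapshot)

    def as_of(self, t):
        # Locate the last version whose valid-from time is <= t.
        i = bisect_right(self._starts, t)
        if i == 0:
            return None  # the object did not exist yet at time t
        return self._snapshots[i - 1]
```

Because every version is a complete snapshot, a read at any application time is a single lookup, with no deltas to replay.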
My domain objects come in on the order of a couple of hundred types, each type
having some pure data properties (tens, up to hundreds, easily represented
as JSON blobs) and on the order of tens, at most hundreds, of has_one and
has_many type relationships to other objects (which can be of different
types). The relationships only require one direction, always from parent to
child (source to destination). An object has a given unique ID, and a
version of that object has a given unique valid time period (with the
latest version having an implicit "infinity" end of period).
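As a rough illustration of the shape of one version, here is a sketch of a record type. The field names (valid_from, valid_to, properties, relations), the integer timestamps, and the half-open interval convention are my own assumptions; None stands in for the implicit "infinity" end of the latest version.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ObjectVersion:
    object_id: str                  # unique ID of the object this version belongs to
    valid_from: int                 # inclusive start of the valid period
    valid_to: Optional[int] = None  # exclusive end; None means "infinity"
    properties: dict = field(default_factory=dict)  # pure data, JSON-friendly
    # relationship name -> list of child object IDs (parent to child only)
    relations: dict = field(default_factory=dict)

    def is_valid_at(self, t: int) -> bool:
        # Half-open interval [valid_from, valid_to), open-ended when valid_to is None.
        return self.valid_from <= t and (self.valid_to is None or t < self.valid_to)
```

With half-open periods, adjacent versions of the same object never overlap, so at most one version is valid at any given time.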
The queries are mostly to find a data property or a relationship for a
given object. A few special cases may be for range queries and exact
queries on properties, easily taken care of by 2i queries.
I'm trying to work out whether, and how, my domain would fit into a Riak
"schema". My hunch for a starting point:
- map object types to buckets
- make the unique object IDs the keys in the bucket to represent the
concept of that object
- not sure how to represent the links to versions of that particular object
- the versions themselves may live either in the same bucket or in another
bucket (think "cars" and "car-versions", or using "cars" for both)
- a version has a JSON value with its properties, some 2i for any possible
exact and range queries I need
- the has_one and has_many links I could do in several ways. The first decision
is whether to point them to the object identity or directly to a specific
version. Then I can use Links, use 2i, or store IDs in the JSON and do a
two-step fetch to get there
- 99% of read operations are of the type "given the time of X, give me the
property or relation Y of object with ID Z"
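One of the options above can be sketched as follows, with plain dicts standing in for buckets so that no Riak client API is assumed. Everything here is hypothetical: the function names put_version and get_as_of, the "<bucket>-versions" bucket naming, the "<id>:<valid_from>" version keys, and the choice to keep an ordered version index inside the identity record and to point relations at object identities rather than specific versions.

```python
import json
from bisect import bisect_right


def put_version(store, bucket, object_id, valid_from, properties, relations):
    """Write a new snapshot and register it in the object's identity record."""
    objects = store.setdefault(bucket, {})
    snapshots = store.setdefault(bucket + "-versions", {})
    raw = objects.get(object_id)
    identity = json.loads(raw) if raw else {"versions": []}
    versions = identity["versions"]  # [[valid_from, version_key], ...] ascending
    if versions and valid_from <= versions[-1][0]:
        raise ValueError("cannot insert a version into the past")
    version_key = f"{object_id}:{valid_from}"
    # Store the full snapshot first, then publish it via the identity record.
    snapshots[version_key] = json.dumps(
        {"properties": properties, "relations": relations})
    versions.append([valid_from, version_key])
    objects[object_id] = json.dumps(identity)


def get_as_of(store, bucket, object_id, t):
    """Two-step fetch: identity record first, then the version valid at time t."""
    raw = store.get(bucket, {}).get(object_id)
    if raw is None:
        return None
    versions = json.loads(raw)["versions"]
    i = bisect_right([start for start, _ in versions], t)
    if i == 0:
        return None  # no version existed yet at time t
    return json.loads(store[bucket + "-versions"][versions[i - 1][1]])
```

A read of "property or relation Y of object Z at time X" then becomes get_as_of(store, "cars", Z, X) followed by a lookup in "properties" or "relations". Child IDs found in "relations" would be resolved with the same application time. The sketch ignores concurrent writers to the identity record, which a real deployment would have to handle.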
Has anyone built something similar with a time snapshot/version angle
and has experience to share? Any input in general is appreciated.