Looking for Riak recommendations for modeling data with N:1 references

elij elij.mx at gmail.com
Sun May 27 17:45:10 EDT 2012


I am evaluating Riak for a project, and am looking for some recommendations on modeling data for optimal performance. The data is a single 'object' (henceforth named 'Widget') that needs to be looked up via N possible attributes, and should have a reference to these keys. There will be many hundreds of millions of such Widgets (the keys will not fit in cluster RAM). Currently this data is housed in an RDBMS, but we are looking at a few alternatives due to single-node scaling issues, the desire for easier operations, and growth to more datacenters.

Consider the following example Widget object:

  vendorAkey: "widget001"
  vendorBkey: "bluewidget6"
  vendorCkey: "sprocket42"
  widgetData: <json blob of data>

My ideas so far are to:

1) Have 'reference lookups' performed application-side, with a 'widget' bucket at the end.

widget001 = 282ec0a1-a842-11e1-83cd-34159e0284ea
bluewidget6 = 282ec0a1-a842-11e1-83cd-34159e0284ea
sprocket42 = 282ec0a1-a842-11e1-83cd-34159e0284ea

and then finally the 'real widget':
282ec0a1-a842-11e1-83cd-34159e0284ea = <Widget json dict>

The idea is that the application code would fetch the uuid1 value by vendor key, and then perform a second fetch of the actual widget data based on the response to the first (if found). The widget json dict would also contain the vendor keys (for any needed cleanup down the road, cross-referencing, etc).

2) Use secondary indexes and have each vendor 'key' be a secondary index. I have heard[1] that secondary indexes are slow, though.

3) Use the layout of the first solution, but with links instead of application-side lookups. I have also heard[2] that links are slow.

I am leaning towards #1, but would like to hear of any better recommendations.
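
To make #1 concrete, here is a rough sketch of the two-step lookup. It is only an illustration: plain Python dicts stand in for the two Riak buckets, and the names refs, widgets, store_widget and fetch_widget are made up for this example rather than taken from any Riak client. In a real deployment each dict access would be a get or put against the cluster.

```python
import json
import uuid

# Hypothetical in-memory stand-ins for two Riak buckets (assumption:
# a real client would replace these dict reads and writes).
refs = {}      # vendor key -> widget uuid
widgets = {}   # widget uuid -> widget JSON string


def store_widget(vendor_keys, widget_data):
    """Write the widget first, then one reference entry per vendor key."""
    widget_id = str(uuid.uuid1())
    # The stored document carries its vendor keys for later cleanup
    # and cross-referencing.
    doc = dict(vendor_keys, widgetData=widget_data)
    widgets[widget_id] = json.dumps(doc)
    # References are written last so that a reference never points
    # at a widget that has not been stored yet.
    for vendor_key in vendor_keys.values():
        refs[vendor_key] = widget_id
    return widget_id


def fetch_widget(vendor_key):
    """First fetch: vendor key -> uuid. Second fetch: uuid -> widget."""
    widget_id = refs.get(vendor_key)
    if widget_id is None:
        return None
    raw = widgets.get(widget_id)
    return json.loads(raw) if raw is not None else None
```

For example, after store_widget({"vendorAkey": "widget001", "vendorBkey": "bluewidget6", "vendorCkey": "sprocket42"}, {...}), calling fetch_widget with any of the three vendor keys returns the same widget dict, at the cost of two reads per lookup. This sketch keeps all vendor keys in one namespace; if two vendors could issue the same key, a separate reference bucket per vendor would be needed.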


[1]: http://basho.com/blog/technical/2012/05/25/Scaling-Riak-At-Kiip/
[2]: http://www.infoq.com/presentations/Case-Study-Riak-on-Drugs
