Using Riak to perform aggregate queries

Paul Barry pbarry at temetra.com
Mon Apr 15 04:58:32 EDT 2013


Chris,

I would agree with Alexander about doing more work up front.

We had a lot of data in SQL we moved to Riak, and like you, our initial 
instinct was to keep the data normalised and use M/R to work out 
aggregate values.

After some time experimenting, I believe the better solution, which we 
now use, is to play to the strengths of Riak and treat it more like an 
infinitely large store with fast lookup on keys. This means denormalising 
your data and maybe storing the same piece of information in several 
different ways to match your later access patterns.
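
To make that concrete, here is a rough sketch in Python of writing one 
reading under several keys, one per access pattern. It is not our 
production code: the dict stands in for a Riak bucket, and the key 
layouts and the names put, get and store_reading are made up for 
illustration.

```python
import json

# Stand-in for a Riak bucket: a plain dict mapping key -> JSON string.
# (Assumption for illustration; a real client would do network puts/gets.)
store = {}

def put(key, value):
    store[key] = json.dumps(value)

def get(key):
    raw = store.get(key)
    return json.loads(raw) if raw is not None else None

def store_reading(meter_id, day, hour, kwh):
    """Write one reading under every key shape we expect to read later."""
    reading = {"meter": meter_id, "day": day, "hour": hour, "kwh": kwh}

    # 1. The reading itself, addressable by meter, day and hour.
    put(f"reading/{meter_id}/{day}/{hour:02d}", reading)

    # 2. All of a meter's readings for a day, fetched with one key lookup.
    day_key = f"meter-day/{meter_id}/{day}"
    by_hour = get(day_key) or {}
    by_hour[f"{hour:02d}"] = kwh
    put(day_key, by_hour)

    # 3. Which meters reported on a given day.
    idx_key = f"day-meters/{day}"
    meters = get(idx_key) or []
    if meter_id not in meters:
        meters.append(meter_id)
    put(idx_key, meters)

store_reading("m1", "2013-04-15", 4, 1.5)
store_reading("m1", "2013-04-15", 5, 2.0)
store_reading("m2", "2013-04-15", 4, 0.5)

# One lookup answers "what did m1 use on the 15th?" with no scan or M/R.
print(get("meter-day/m1/2013-04-15"))
```

The read-modify-write steps are fine against a dict, but against a real 
cluster you would have to think about concurrent writers to the same key.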

This is antithetical to the SQL view of the world, but does allow us to 
scale much better. In our application of smart meter data, we keep SQL 
around for all the low volume data that we like to query in lots of 
different ways, and use Riak for the high volume but slowly changing 
stuff. As the latter arrives, we store it in several data structures, 
pre-computing most of the calculations that were previously done 
in SQL on the fly.
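
As an illustration of pre-computing on arrival, the sketch below folds 
each new reading into running daily and monthly summaries, so a later 
"average for April" is a single key fetch rather than a query over every 
reading. Again the dict is only a stand-in for the store, and the key 
names and the record/average helpers are hypothetical.

```python
# Stand-in for the key/value store holding the summary objects.
# (Assumption for illustration only.)
summaries = {}

def record(meter_id, date, kwh):
    """Fold a new reading into per-day and per-month summaries.

    `date` is an ISO string such as '2013-04-15'; its first seven
    characters ('2013-04') identify the month.
    """
    day_key = f"sum/day/{meter_id}/{date}"
    month_key = f"sum/month/{meter_id}/{date[:7]}"
    for key in (day_key, month_key):
        s = summaries.setdefault(
            key, {"count": 0, "sum_kwh": 0.0, "max_kwh": 0.0}
        )
        s["count"] += 1
        s["sum_kwh"] += kwh
        s["max_kwh"] = max(s["max_kwh"], kwh)

def average(key):
    """Read an average straight from a stored summary; None if absent."""
    s = summaries.get(key)
    if not s or s["count"] == 0:
        return None
    return s["sum_kwh"] / s["count"]

record("m1", "2013-04-14", 2.0)
record("m1", "2013-04-15", 4.0)
record("m1", "2013-04-15", 6.0)

print(average("sum/day/m1/2013-04-15"))
print(average("sum/month/m1/2013-04"))
```

Storing count and sum rather than the average itself is what lets each 
summary be updated incrementally as data arrives.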

M/R as implemented in Riak has applications, but is a poor choice when 
you're starting with 'all the data'. You can help it along by preparing 
your data to be M/R friendly.

Paul



Alexander Sicular wrote, On 15/04/13 05:47:
> by date via a secondary index query or via riak search. Oh, and
> precompute everything. Pick whichever time slice has less keys than the
> number of keys that make your queries go boom. If a month is too big do
> a week or even a day. Persist all computation in materialized keys like




