Using Riak to perform aggregate queries
paul.barry at temetra.com
Mon Apr 15 05:55:43 EDT 2013
I would agree with Alexander about doing more work up front.
We had a lot of data in SQL that we moved to Riak, and like you, our initial
instinct was to keep the data normalised and use M/R to compute the
aggregates at query time.
After some time experimenting, I believe the better solution, which we
now use, is to play to the strengths of riak and treat it more like an
infinite-size store with fast lookup on keys. This means denormalising
your data, and perhaps storing the same piece of information in several
different ways to match your later access patterns.
This is antithetical to the SQL view of the world, but it allows us to
scale much better. In our smart-meter application, we keep SQL
around for all the low-volume data that we like to query in lots of
different ways, and use riak for the high-volume but slowly changing
stuff. As the latter arrives, we store it in several data structures,
pre-computing most of the calculations that were previously done
in SQL on the fly.
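As a rough sketch of that write-time denormalisation idea, here is a toy
Python example using a plain dict to stand in for a Riak bucket. The key
layout and names like "meter42" are my own illustration, not from our
actual schema:

```python
# Simulated key/value bucket: on each write we persist the raw reading
# AND update precomputed aggregates keyed by the expected access patterns.
store = {}

def record_reading(meter_id, date, value):
    # Raw reading, addressable directly by meter + date
    store[f"reading/{meter_id}/{date}"] = value

    # Daily total for this meter, maintained at write time
    day_key = f"daily/{meter_id}/{date}"
    store[day_key] = store.get(day_key, 0) + value

    # Monthly total: the same information stored a second way
    month_key = f"monthly/{meter_id}/{date[:7]}"
    store[month_key] = store.get(month_key, 0) + value

record_reading("meter42", "2013-04-15", 10)
record_reading("meter42", "2013-04-15", 5)
record_reading("meter42", "2013-04-20", 7)

# A "monthly usage" query is now a single key lookup, no M/R needed:
print(store["monthly/meter42/2013-04"])  # 22
```

The point is that the aggregation cost is paid once per write instead of
on every query, which is where a key/value store is strong.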
M/R as implemented in Riak has applications, but is a poor choice when
you're starting with 'all the data'. You can help it along by preparing
your data to be M/R friendly.
Alexander Sicular wrote, On 15/04/13 05:47:
> by date via a secondary index query or via riak search. Oh, and
> precompute everything. Pick whichever time slice has less keys than the
> number of keys that make your queries go boom. If a month is too big do
> a week or even a day. Persist all computation in materialized keys like
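To illustrate Alexander's point about persisting computation in
materialized keys per time slice, here is a hypothetical key-naming
sketch in Python (the formats are mine, not a Riak convention). The same
rolled-up value can be stored under month, week, and day keys, and you
query whichever granularity keeps the key count manageable:

```python
import datetime

def slice_keys(meter_id, ts):
    """Return the materialized-key names a reading at `ts` rolls into."""
    iso_year, iso_week, _ = ts.isocalendar()
    return [
        f"{meter_id}/month/{ts:%Y-%m}",
        f"{meter_id}/week/{iso_year}-W{iso_week:02d}",
        f"{meter_id}/day/{ts:%Y-%m-%d}",
    ]

print(slice_keys("meter7", datetime.date(2013, 4, 15)))
```

If monthly keys grow too large for your queries, you drop down to the
weekly or daily keys, exactly as suggested above.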